[Libre-soc-isa] [Bug 560] big-endian little-endian SV regfile layout idea

bugzilla-daemon at libre-soc.org bugzilla-daemon at libre-soc.org
Tue Jan 5 20:38:20 GMT 2021


https://bugs.libre-soc.org/show_bug.cgi?id=560

--- Comment #90 from Jacob Lifshay <programmerjake at gmail.com> ---
(In reply to Luke Kenneth Casson Leighton from comment #88)
> (In reply to Jacob Lifshay from comment #86)
> 
> > I'll create some illustrations today.

Done, with lots of pretty colors!

https://libre-soc.org/openpower/sv/byteswap/

> this will help.
>  
> > > now i have written them out those are the same thing.  i thought that one of
> > > them might involve moving the dynamic byteswapping to be part of
> > > MultiCompUnit, where the quantity of gates gets multiplied even more than it
> > > already is.
> > 
> > The byte-swapping would be a pipeline stage right after the mux for
> > selecting which FUs to execute,
> 
> there are going to be at least QTY 50 (fifty) 64 bit regfile ports, each
> crossbar being around 2k gates that's 100,000 gates if placed at the regfile
> ports.

As shown in the illustration linked above, it works just fine with 5 gates per
bit * 64 bits = 320 gates per 64-bit byte-swapper, way less than 2k gates.

The latest proposal doesn't add anything to the reg-file ports; we'll just
trap and have SW handle 64-bit byte-swapping of all the int/fp registers when
changing the CPU between BE/LE modes.
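
A rough Python model of that software step, purely for illustration. It
assumes a plain full-width reversal of each 64-bit register; the names
byteswap64 and swap_regfile_on_mode_change are hypothetical and not taken
from any Libre-SOC code, and the real trap handler would of course not be
written in Python.

```python
MASK64 = (1 << 64) - 1

def byteswap64(value):
    """Reverse the byte order of a 64-bit value."""
    return int.from_bytes((value & MASK64).to_bytes(8, "little"), "big")

def swap_regfile_on_mode_change(regs):
    """Hypothetical trap-handler step: byte-swap every 64-bit int/fp register."""
    return [byteswap64(r) for r in regs]

# Example register contents (made up for the sketch).
regs = [0x0102030405060708, 0x00000000000000FF]
swapped = swap_regfile_on_mode_change(regs)
```

Applying the swap twice gives back the original contents, which is what you
would want when switching BE -> LE -> BE.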

Byte-swapping only occurs in ALUs/load/store -- anywhere an instruction
operates on int/fp registers. All other registers (CRs and SPRs mostly) don't
need byte swapping since they are not byte-addressable and are just kept in LE
mode permanently.
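
One way to picture swapping at the ALU boundary is the behavioral sketch
below. It is my simplification, not the actual design: it assumes a 64-bit
mul-add, a single `swap` flag standing in for the BE/LE mode, and full-width
byte reversal on each input and on the output. All function names are
hypothetical.

```python
MASK64 = (1 << 64) - 1

def byteswap64(value):
    """Reverse the byte order of a 64-bit value."""
    return int.from_bytes((value & MASK64).to_bytes(8, "little"), "big")

def maybe_swap(value, swap):
    """Model of one byte-swapper: pass through or reverse bytes."""
    return byteswap64(value) if swap else value & MASK64

def muladd_alu(a, b, c, swap):
    """Hypothetical ALU wrapper: swap the 3 inputs, compute, swap the 1 output."""
    a, b, c = (maybe_swap(x, swap) for x in (a, b, c))
    result = (a * b + c) & MASK64
    return maybe_swap(result, swap)

# With swapping off, operands are used as-is.
plain = muladd_alu(3, 4, 5, swap=False)

# With swapping on, operands arrive byte-reversed and the result leaves
# byte-reversed, so the arithmetic in the middle is unchanged.
reversed_result = muladd_alu(byteswap64(3), byteswap64(4), byteswap64(5), swap=True)
```

The point of the sketch is only that the arithmetic core never sees the
reversed form; the swappers sit on the operand and result paths.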

> there are going to be around... 60 to 80 FUs (most of those "laned") which
> means around 4x60 64 bit src operands plus another 60 dest ports.  5x60 =
> 300 operand ports.

The byte-swap mux goes *not at the FUs* but at the ALU, after the operands are
muxed in from the different FUs. So, just counting the 128-bit SIMD mul-add ALU
(since I'm not 100% sure about the full list of ALUs/load/store/etc. we will
have), it will be 3 inputs and 1 output at 128-bit width, so that's 4 I/O ports
* 5 gates per bit * 128 bits = 2560 more gates for the whole ALU. Way less than
you expected. I expect the number of additional gates required for the whole
core to be on the order of 10-20k.
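
The arithmetic above, written out as a small Python sketch. The 5 gates per
bit figure is the one used in this message; the helper names are made up for
the example, and the whole-core 10-20k figure is an estimate that this sketch
does not attempt to derive.

```python
GATES_PER_BIT = 5  # per-bit cost used in this message

def swapper_gates(width_bits, gates_per_bit=GATES_PER_BIT):
    """Gate estimate for one byte-swapper of the given width."""
    return width_bits * gates_per_bit

def alu_swapper_gates(num_inputs, num_outputs, width_bits):
    """Gate estimate for swappers on every operand and result port of one ALU."""
    return (num_inputs + num_outputs) * swapper_gates(width_bits)

per_64bit_swapper = swapper_gates(64)           # 5 * 64
simd_muladd = alu_swapper_gates(3, 1, 128)      # 4 * 5 * 128
```

Here per_64bit_swapper comes out to 320 and simd_muladd to 2560, matching
the numbers quoted above.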
