[Libre-soc-isa] [Bug 560] big-endian little-endian SV regfile layout idea
bugzilla-daemon at libre-soc.org
Tue Jan 5 20:38:20 GMT 2021
https://bugs.libre-soc.org/show_bug.cgi?id=560
--- Comment #90 from Jacob Lifshay <programmerjake at gmail.com> ---
(In reply to Luke Kenneth Casson Leighton from comment #88)
> (In reply to Jacob Lifshay from comment #86)
>
> > I'll create some illustrations today.
Done, with lots of pretty colors!
https://libre-soc.org/openpower/sv/byteswap/
> this will help.
>
> > > now i have written them out those are the same thing. i thought that one of
> > > them might involve moving the dynamic byteswapping to be part of
> > > MultiCompUnit, where the quantity of gates gets multiplied even more than it
> > > already is.
> >
> > The byte-swapping would be a pipeline stage right after the mux for
> > selecting which FUs to execute,
>
> there are going to be at least QTY 50 (fifty) 64 bit regfile ports, each
> crossbar being around 2k gates that's 100,000 gates if placed at the regfile
> ports.
As shown in the illustration linked above, it works just fine with 5*64=320
gates per 64-bit byte-swapper, waay less than 2k gates.
The latest proposal doesn't have anything added to reg-file ports, we'll just
trap and have SW handle 64-bit byte-swapping all the int/fp registers when
changing the CPU between BE/LE modes.
Byte-swapping only occurs in ALUs/load/store -- anywhere an instruction
operates on int/fp registers. All other registers (CRs and SPRs mostly) don't
need byte swapping since they are not byte-addressable and are just kept in LE
mode permanently.
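For anyone who wants to see what the trap handler would be doing to each int/fp register on a BE/LE mode change, here is a minimal behavioural sketch in Python. It only models a plain 64-bit byte reversal; the names `byteswap64` and `swap_regfile` are made up for illustration and are not part of any Libre-SOC code, and the real hardware byte-swappers in the ALUs may be organised differently.

```python
MASK64 = (1 << 64) - 1


def byteswap64(value):
    """Reverse the order of the 8 bytes of a 64-bit value."""
    # Serialise as little-endian bytes, then reinterpret as big-endian.
    return int.from_bytes((value & MASK64).to_bytes(8, "little"), "big")


def swap_regfile(regs):
    """Hypothetical model of the SW trap handler: byte-swap every
    int/fp register when the CPU switches between BE and LE modes."""
    return [byteswap64(r) for r in regs]


if __name__ == "__main__":
    regs = [0x0102030405060708, 0x00000000000000FF]
    swapped = swap_regfile(regs)
    print([hex(r) for r in swapped])
    # Swapping twice (BE -> LE -> BE) restores the original contents.
    assert swap_regfile(swapped) == regs
```

CRs and SPRs would simply be left out of the list passed to `swap_regfile`, since they stay in LE mode permanently.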
> there are going to be around... 60 to 80 FUs (most of those "laned") which
> means around 4x60 64 bit src operands plus another 60 dest ports. 5x60 =
> 300 operand ports.
The mux goes *not at the FUs* but at the ALU after the operands are muxed in
from the different FUs. So, just counting the 128-bit SIMD mul-add ALU (since
I'm not 100% sure about the full list of ALUs/load/store/etc. we will have), it
will be 3 inputs 1 output at 128-bit width, so that's 4 io * 5 gates * 128 bits
= 2560 more gates for the whole ALU. Waay less than you expected. I expect the
number of additional gates required for the whole core to be on the order of
10-20k.
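To make the gate-count comparison easy to re-check, here is the arithmetic from this comment as a small Python sketch. The 5 gates per bit, the 2k gates per crossbar and the 50 regfile ports are the figures quoted in this thread; the function name `swapper_gates` is just an illustrative helper, and none of this is a synthesis result.

```python
GATES_PER_BIT = 5          # per-bit cost of the byte-swapper, as stated above
CROSSBAR_GATES = 2000      # per-port crossbar estimate from comment #88
REGFILE_PORTS = 50         # 64-bit regfile ports from comment #88


def swapper_gates(width_bits, ports=1):
    """Gate estimate for byte-swappers on `ports` operands of `width_bits`."""
    return GATES_PER_BIT * width_bits * ports


# One 64-bit byte-swapper.
per_64bit = swapper_gates(64)

# Crossbars on every regfile port (the approach being argued against).
regfile_crossbars = REGFILE_PORTS * CROSSBAR_GATES

# 128-bit SIMD mul-add ALU: 3 inputs + 1 output.
mul_add_alu = swapper_gates(128, ports=4)

if __name__ == "__main__":
    print("64-bit byte-swapper:", per_64bit)
    print("regfile-port crossbars:", regfile_crossbars)
    print("128-bit mul-add ALU:", mul_add_alu)
```

The 10-20k figure for the whole core is an expectation rather than something this sketch derives, since the full list of ALUs/load/store units isn't settled.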