[Libre-soc-isa] [Bug 560] big-endian little-endian SV regfile layout idea
bugzilla-daemon at libre-soc.org
Tue Jan 5 07:28:56 GMT 2021
https://bugs.libre-soc.org/show_bug.cgi?id=560
--- Comment #81 from Jacob Lifshay <programmerjake at gmail.com> ---
(In reply to Luke Kenneth Casson Leighton from comment #68)
> (In reply to Jacob Lifshay from comment #62)
>
> > Note that the HW implementation I proposed in comment #55 would require a
> > 5-input mux on ALU pipelines inputs/outputs and a 2-input mux on register
> > R/W ports.
>
> no, this is plain wrong, and is misleading alexandre who is not familiar
> with gate level design and assessment.
>
> you neglected to mention that those muxes are 64 bits wide. consequently he
> believes that the gate count is only 5.
I explicitly mentioned that the muxes take 6*64 gates in comment #63:
> I think the 5-input mux is a far cry from the "full 8-in 8-out crossbar" that
> you were afraid of needing. it would be 6*64 gates (5 2-in NAND and 1 5-in
> NAND) per 64-bit input/output with a 2 gate delay. I'd expect that to be
> small enough to be doable.

> we are designing Dynamic Partitioned SIMD and consequently the 64 bit SIMD
> must now have all possible permutations of byteswapping at the regfile port.
>
> to cover all possibilities we must first enumerate those possibilities.
> they are:
>
> * 8 8 8 8 8 8 8 8
> * 16 8 8 8 8 8
> * 8 16 8 8 8
> * ....
> * 16 16 ...
> * 24 8 8 8 ..
> * 8 8 ... 16 8..
>
> finally at long last you get to 1x 64 bit.
> total: 128 combinations
>
> to cover all of these REQUIRES a full 8x8 crossbar.
Yes, however, what happens if you only issue the same-sized elements to any one
ALU each cycle:
8 8 8 8 8 8 8 8
16 16 16 16
32 32
64
but not:
8 8 16 32
or other non-uniform combinations.
At that point, the 5-input mux is sufficient (actually, only 4 inputs are
needed), since there's one input for not byte-swapped, one for 16-bit
byte-swapped, one for 32-bit byte-swapped, and one for 64-bit byte-swapped.
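As a rough behavioural sketch of that 4-input mux (Python, not HDL, and not
taken from the Libre-SOC code base; the names byteswap_elements and swap_mux
are hypothetical), all four candidates are computed from the same 64-bit value
and the element width selects exactly one of them:

```python
def byteswap_elements(value: int, elwidth: int) -> int:
    """Byte-swap each elwidth-bit element of a 64-bit value independently."""
    nbytes = elwidth // 8
    data = value.to_bytes(8, "little")
    out = bytearray()
    for i in range(0, 8, nbytes):
        # Reverse the bytes inside one element; elements keep their positions.
        out += data[i:i + nbytes][::-1]
    return int.from_bytes(out, "little")


def swap_mux(value: int, elwidth: int) -> int:
    """Model of the mux: four fixed byte permutations, one selected per cycle."""
    # Assumption: elwidth is one of 8, 16, 32, 64 and uniform across the word.
    candidates = {w: byteswap_elements(value, w) for w in (8, 16, 32, 64)}
    return candidates[elwidth]


if __name__ == "__main__":
    v = 0x0102030405060708
    for w in (8, 16, 32, 64):
        print(w, hex(swap_mux(v, w)))
```

The 8-bit case is the "not byte-swapped" input, since a single byte has nothing
to swap. In hardware each candidate is just a fixed wiring permutation, so only
the selection costs gates.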
Since the most common vectors are 64 bits or longer, forcing the ALUs to only
process same-sized elements per cycle is a reasonable tradeoff.
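To put numbers on that restriction, here is a small Python sketch (the
function name partitionings is made up for illustration) that enumerates every
way of splitting 8 contiguous bytes into elements and then keeps only the
uniform layouts. It assumes, as in the quoted list, that an element is any
contiguous run of bytes, so widths such as 24 are included:

```python
from itertools import product


def partitionings(nbytes: int = 8):
    """Yield every split of nbytes contiguous bytes as a tuple of bit widths."""
    # Each of the nbytes-1 byte boundaries is either a cut or not a cut.
    for cuts in product((False, True), repeat=nbytes - 1):
        widths, run = [], 1
        for cut in cuts:
            if cut:
                widths.append(run * 8)
                run = 1
            else:
                run += 1
        widths.append(run * 8)
        yield tuple(widths)


all_layouts = list(partitionings())
# Uniform layouts: every element in the 64-bit word has the same width.
uniform = [p for p in all_layouts if len(set(p)) == 1]

if __name__ == "__main__":
    print(len(all_layouts), "layouts in total")
    for p in sorted(uniform, key=len):
        print(p)
```

Under those assumptions the 7 byte boundaries give 2**7 = 128 layouts, and
only the four uniform ones (8x8, 4x16, 2x32, 1x64) need a mux input.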
Note that this restriction doesn't mean we have to wait for the 32-bit ops to
finish making it through the pipeline before we can issue 8-bit ops: the
element size can be changed every cycle.