[Libre-soc-isa] [Bug 560] big-endian little-endian SV regfile layout idea

bugzilla-daemon at libre-soc.org bugzilla-daemon at libre-soc.org
Tue Jan 5 07:28:56 GMT 2021


https://bugs.libre-soc.org/show_bug.cgi?id=560

--- Comment #81 from Jacob Lifshay <programmerjake at gmail.com> ---
(In reply to Luke Kenneth Casson Leighton from comment #68)
> (In reply to Jacob Lifshay from comment #62)
> 
> > Note that the HW implementation I proposed in comment #55 would require a
> > 5-input mux on ALU pipeline inputs/outputs and a 2-input mux on register
> > R/W ports.
> 
> no, this is plain wrong, and is misleading alexandre who is not familiar
> with gate level design and assessment.
> 
> you neglected to mention that those muxes are 64 bits wide.  consequently he
> believes that the gate count is only 5.

I explicitly mentioned that the muxes take 6*64 gates in comment #63:
> I think the 5-input mux is a far cry from the "full 8-in 8-out crossbar" that
> you were afraid of needing. it would be 6*64 gates (5 2-in NAND and 1 5-in
> NAND) per 64-bit input/output with a 2 gate delay. I'd expect that to be
> small enough to be doable.
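As a rough cross-check of that count, here is a Python bit-level model of a one-hot 5:1 mux built as a NAND-NAND (AND-OR) tree on 64-bit words. It assumes the select lines are already decoded to one-hot form, so any decoder gates are not counted. `nand` and `mux5_nand` are hypothetical names used only for illustration. Per bit, the model uses one 2-input NAND per data input plus one 5-input NAND. That is 6 gates with a 2-gate delay, or 6*64 gates for a 64-bit port.

```python
from __future__ import annotations

MASK64 = (1 << 64) - 1  # all 64 bits set


def nand(*words: int) -> int:
    """Bitwise NAND of any number of 64-bit words (one gate per bit)."""
    acc = MASK64
    for w in words:
        acc &= w
    return ~acc & MASK64


def mux5_nand(inputs: list[int], onehot_sel: list[int]) -> tuple[int, int]:
    """One-hot 5:1 mux as a NAND-NAND tree; returns (output, gate_count)."""
    assert len(inputs) == len(onehot_sel) == 5
    assert sum(onehot_sel) == 1, "select must be one-hot"
    gates = 0
    # Level 1: one 2-input NAND per data input, per bit, gated by its select
    # line (each select line is fanned out to all 64 bits).
    level1 = []
    for data, sel in zip(inputs, onehot_sel):
        level1.append(nand(data, MASK64 if sel else 0))
        gates += 64
    # Level 2: one 5-input NAND per bit merges the five level-1 outputs.
    out = nand(*level1)
    gates += 64
    return out, gates


out, gates = mux5_nand([0x1111, 0x2222, 0x3333, 0x4444, 0x5555], [0, 0, 1, 0, 0])
print(hex(out), gates)  # 0x3333 384
```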


> we are designing Dynamic Partitioned SIMD and consequently the 64 bit SIMD
> must now have all possible permutations of byteswapping at the regfile port.
> 
> to cover all possibilities we must first enumerate those possibilities. 
> they are:
> 
> * 8 8 8 8 8 8 8 8
> * 16 8 8 8 8 8
> * 8 16 8 8 8
> * ....
> * 16 16 ... 
> * 24 8 8 8 ..
> * 8 8 ... 16 8..
> 
> finally at long last you get to 1x 64 bit. 
> total: 128 combinations
> 
> to cover all of these REQUIRES a full 8x8 crossbar.

Yes. However, consider what happens if each ALU is only ever issued same-sized
elements in any one cycle:
8 8 8 8 8 8 8 8
16 16 16 16
32 32
64
but not:
8 8 16 32
or other non-uniform combinations.
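The two cases can be counted with a short Python sketch. It assumes lanes are whole bytes and contiguous. Each of the 7 internal byte boundaries of a 64-bit register either is or is not a lane break, which gives 2**7 = 128 layouts. Only 4 of those layouts have a single uniform element width. `lane_layouts` is a hypothetical helper.

```python
from itertools import product


def lane_layouts(nbytes: int = 8) -> list:
    """Every way to split an nbytes-wide register into contiguous byte lanes.

    Each internal byte boundary is either a lane break or not, so there are
    2 ** (nbytes - 1) layouts. Lane widths are returned in bits.
    """
    layouts = []
    for breaks in product((False, True), repeat=nbytes - 1):
        widths, run = [], 1  # run = bytes in the lane currently being built
        for is_break in breaks:
            if is_break:
                widths.append(run * 8)
                run = 1
            else:
                run += 1
        widths.append(run * 8)  # close the final lane
        layouts.append(widths)
    return layouts


layouts = lane_layouts()
print(len(layouts))  # 128
uniform = [w for w in layouts if len(set(w)) == 1]
print(uniform)  # [[64], [32, 32], [16, 16, 16, 16], [8, 8, 8, 8, 8, 8, 8, 8]]
```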

At that point, the 5-input mux is sufficient (actually, only 4 inputs are
needed), since there is one input for not byte-swapped, one for 16-bit
byte-swapped, one for 32-bit byte-swapped, and one for 64-bit byte-swapped.
Byte-swapping 8-bit elements is a no-op, so 8-bit ops use the not-byte-swapped
input.
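To make the four mux inputs concrete, here is a Python sketch that computes each candidate word for a 64-bit register port. It assumes uniform lanes and byte-granular swaps. The helper names are hypothetical. Reversing the bytes within 8-bit lanes leaves the word unchanged, which is why 8-bit elements share the unswapped input.

```python
def byteswap_lanes(value: int, lane_bits: int) -> int:
    """Reverse the byte order inside every lane_bits-wide lane of a 64-bit word."""
    if lane_bits not in (8, 16, 32, 64):
        raise ValueError("lane_bits must be 8, 16, 32 or 64")
    lane_bytes = lane_bits // 8
    data = value.to_bytes(8, "little")  # data[0] is the least-significant byte
    out = bytearray()
    for start in range(0, 8, lane_bytes):
        out += data[start:start + lane_bytes][::-1]  # reverse one lane
    return int.from_bytes(out, "little")


def swap_mux_inputs(value: int) -> dict:
    """The four words a 4-input mux would choose between at a 64-bit port."""
    return {
        "none": value,  # also serves 8-bit elements: swapping 8-bit lanes is a no-op
        "swap16": byteswap_lanes(value, 16),
        "swap32": byteswap_lanes(value, 32),
        "swap64": byteswap_lanes(value, 64),
    }


v = 0x0011223344556677
for name, word in swap_mux_inputs(v).items():
    print(name, f"{word:#018x}")
# expected words:
#   none   0x0011223344556677
#   swap16 0x1100332255447766
#   swap32 0x3322110077665544
#   swap64 0x7766554433221100
```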

Since the most common vectors are 64 bits or longer, restricting each ALU to
same-sized elements per cycle is a reasonable tradeoff.

Note that this doesn't mean we have to wait for the 32-bit ops to finish making
it through the pipeline before we can issue 8-bit ops; the element width can
change every cycle.
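One way to see why no pipeline drain is needed is to let every op carry its element width down the pipeline as a tag. The output-side mux select then comes from the op that is retiring, not from the op being issued. The toy Python model below works under that assumption, using a fixed-depth, in-order pipeline that accepts one op per cycle. `Op` and `run_pipeline` are hypothetical.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass


@dataclass
class Op:
    name: str
    lane_bits: int  # uniform element width for the whole op: 8, 16, 32 or 64


def run_pipeline(ops: list[Op], depth: int = 3) -> list[tuple]:
    """Toy in-order pipeline that accepts one new op every cycle.

    Each in-flight op keeps its own lane_bits tag, so the output-side swap mux
    is steered by the retiring op. Returns (cycle, issued, retired,
    retire_lane_bits) tuples, with None where a slot is empty.
    """
    stages = deque([None] * depth)  # stages[0] = first stage, stages[-1] = last
    pending = deque(ops)
    log = []
    cycle = 0
    while pending or any(op is not None for op in stages):
        issued = pending.popleft() if pending else None
        retired = stages.pop()      # op leaving the last stage this cycle
        stages.appendleft(issued)   # new op enters the first stage
        log.append((
            cycle,
            issued.name if issued else None,
            retired.name if retired else None,
            retired.lane_bits if retired else None,
        ))
        cycle += 1
    return log


for row in run_pipeline([Op("op_a", 32), Op("op_b", 8), Op("op_c", 16)]):
    print(row)
```

In this model, ops of 32, 8 and 16 bits issue on consecutive cycles 0, 1 and 2. Each one retires `depth` cycles later with its own width still attached, so the output mux never needs to wait for the pipeline to empty.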
