[Libre-soc-isa] [Bug 213] SimpleV Standard writeup needed

bugzilla-daemon at libre-soc.org bugzilla-daemon at libre-soc.org
Tue Oct 20 16:51:08 BST 2020


https://bugs.libre-soc.org/show_bug.cgi?id=213

--- Comment #83 from Luke Kenneth Casson Leighton <lkcl at lkcl.net> ---
jacob, although earlier i said that the concept of lanes doesn't exist, there is
actually a physical way to do "lanes" in the original Cray sense: reduce both
the FU-Regs and FU-FU Dependency Matrices to sparse matrices.  feel free to add
more example layouts to this page btw (so we can visualise proposed designs):

https://libre-soc.org/openpower/sv/example_dep_matrices/

basically how that works is: whilst each ALU is still 64-bit SIMD-capable, the
basic assumption is that vector operations are issued on "aligned" register
boundaries, starting at multiples of some arbitrarily-determined amount (4 in
the example).

thus the following is "valid":

* VL=12
* ADD.64 r0 <- r16, r32

whereas the following is NOT:

* VL=12
* ADD.64 r0 <- r17, r25

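because 17-0 is not a multiple of 4, and neither is 25-0 (r17 and r25 do line
up with each other, since 25-17=8, but neither lines up with the destination r0)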
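a minimal python sketch of that alignment rule, assuming register rN belongs to
lane N mod 4 and that element i of a vector op touches registers dest+i,
src1+i, src2+i.  `lane` and `is_lane_aligned` are hypothetical names for
illustration, not anything from the Libre-SOC source:

```python
LANES = 4  # the "arbitrarily-determined amount" from the example


def lane(reg: int) -> int:
    """Lane a register belongs to (assumption: lane = reg mod LANES)."""
    return reg % LANES


def is_lane_aligned(dest: int, *srcs: int) -> bool:
    """True if every source starts in the same lane as the destination.

    Element i uses registers dest+i, src+i, ...; if all the base registers
    agree mod LANES, every element of the vector stays inside one lane.
    """
    return all((s - dest) % LANES == 0 for s in srcs)


print(is_lane_aligned(0, 16, 32))  # ADD.64 r0 <- r16, r32  -> True
print(is_lane_aligned(0, 17, 25))  # ADD.64 r0 <- r17, r25  -> False
```

with VL=12 starting at r0, r16 and r32, element i lands in lane i mod 4 for
every operand (0,1,2,3,0,1,2,3,...), which is what lets each lane be handled
by its own set of FUs.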

this latter vector instruction would instead be executed by the *scalar* ALUs
(S-ALU1, S-ALU2, S-LOGIC1, S-LOGIC2).
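a rough sketch of that split, purely for illustration: the function name
`route` and the round-robin policy over S-ALU1/S-ALU2 for misaligned elements
are assumptions, not how the actual issue engine schedules work.  only the ALU
FUs are used for an ADD, since the S-LOGIC units presumably handle logical ops:

```python
LANES = 4


def route(vl, dest, *srcs, scalar_fus=("S-ALU1", "S-ALU2")):
    """Hypothetical per-element routing for one vector ADD.

    aligned ops: element i goes to lane FU "L<k>", k = (dest + i) % LANES.
    misaligned ops: elements fall back to the scalar ALUs, round-robin.
    """
    aligned = all((s - dest) % LANES == 0 for s in srcs)
    plan = []
    for i in range(vl):
        if aligned:
            fu = f"L{(dest + i) % LANES}"
        else:
            fu = scalar_fus[i % len(scalar_fus)]
        plan.append((i, fu))
    return plan


print(route(12, 0, 16, 32)[:5])
# [(0, 'L0'), (1, 'L1'), (2, 'L2'), (3, 'L3'), (4, 'L0')]
print(route(12, 0, 17, 25)[:3])
# [(0, 'S-ALU1'), (1, 'S-ALU2'), (2, 'S-ALU1')]
```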

note that all ALUs are *still scalar*; it's just that (and this is the
important bit) the instruction issue engine *never issues illegal combinations
for which there does not exist a Dependency Matrix cell*.

note also the following:

* there is full FU-Regs DM cell coverage for FUs marked "S"
* there is full FU-FU DM cell coverage for FUs marked "S"
* there is full FU-Regs and FU-FU DM cell coverage for FUs marked "L0"
* there is full FU-Regs and FU-FU DM cell coverage for FUs marked "L1"
* there is full FU-Regs and FU-FU DM cell coverage for FUs marked "L2"
* there is full FU-Regs and FU-FU DM cell coverage for FUs marked "L3"
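one way to picture the sparse FU-Regs matrix, as a sketch under loudly stated
assumptions: a 128-entry regfile, two FUs per lane with made-up names
("L0-ALU", "L0-LOGIC", ...), and "full coverage for L0" read as "a cell for
every register in lane 0".  the FU-FU matrix is not modelled here.  `can_issue`
mimics the rule that the issue engine only issues when every needed cell exists:

```python
LANES = 4
N_REGS = 128  # assumption: regfile size, not taken from the text
S_FUS = ["S-ALU1", "S-ALU2", "S-LOGIC1", "S-LOGIC2"]
# hypothetical lane FU names, two per lane
LANE_FUS = {k: [f"L{k}-ALU", f"L{k}-LOGIC"] for k in range(LANES)}


def build_fu_regs_cells():
    """Return the set of (FU, register) pairs that have a DM cell."""
    cells = set()
    # "S" FUs: full coverage, one cell per register
    for fu in S_FUS:
        for reg in range(N_REGS):
            cells.add((fu, reg))
    # lane FUs: cells only for registers in their own lane
    for k, fus in LANE_FUS.items():
        for fu in fus:
            for reg in range(k, N_REGS, LANES):
                cells.add((fu, reg))
    return cells


def can_issue(cells, fu, *regs):
    """Issue only if a DM cell exists for every register the op touches."""
    return all((fu, r) in cells for r in regs)


cells = build_fu_regs_cells()
print(can_issue(cells, "L0-ALU", 0, 16, 32))  # True: all in lane 0
print(can_issue(cells, "L0-ALU", 7, 16, 32))  # False: r7 is in lane 3
print(can_issue(cells, "S-ALU1", 7, 16, 32))  # True: S FUs cover everything
```

under these assumptions "ADD r7 <- r16, r32" has no cell set in any lane FU,
which is exactly the case that ends up in the "S" FUs.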

what i am not sure about is whether to add inter V-S DM cells.  these would
allow at least some operations such as the following to be done:

* ADD.64 r7 <- r16, r32

because whilst src1 and src2 (r16, r32) can be allocated across Lanes 0-3, the
destination would otherwise have to be stored in the regfile: if there is
another operation that needs r7, *ALL* operations would entirely stall until
that result had been written to r7, and only then could the new Read Hazard be
created.

if, however, no follow-up operation that reads r7 is issued, then no stall is
needed.

if, however, we leave those inter V-S cells blank, then operations such as
"ADD r7 <- r16, r32" would need to be done in the "S" FUs, jamming them up by
mixing them in amongst the scalar operations.

the reduction in the number of gates needed in the DMs is... massive.  there is
also a massive reduction in the wiring/routing needed between regfiles, i.e. the
regfiles can also be stratified along similar "lanes" arrangements.
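a back-of-envelope cell count, as a sketch only.  it assumes the inter V-S
cells are left blank, that "S" FUs have cells for every register and every
other "S" FU, and that lane FUs only have cells for their own lane's registers
and their own lane's FUs.  the regfile size and FUs-per-lane figures are
made-up parameters, and `n_regs` is assumed to divide evenly by `n_lanes`:

```python
def dm_cell_counts(n_regs, n_lanes, n_s_fus, fus_per_lane):
    """Compare full vs lane-sparse Dependency Matrix cell counts."""
    n_lane_fus = n_lanes * fus_per_lane
    n_fus = n_s_fus + n_lane_fus
    return {
        # full matrices: every FU against every register / every FU
        "full_fu_regs": n_fus * n_regs,
        "full_fu_fu": n_fus * n_fus,
        # sparse: S FUs see all regs; each lane FU sees only its lane's regs
        "sparse_fu_regs": n_s_fus * n_regs + n_lane_fus * (n_regs // n_lanes),
        # sparse: S block plus one small block per lane, no V-S cells
        "sparse_fu_fu": n_s_fus * n_s_fus + n_lanes * fus_per_lane * fus_per_lane,
    }


print(dm_cell_counts(n_regs=128, n_lanes=4, n_s_fus=4, fus_per_lane=2))
```

the savings grow with the number of lanes and lane FUs, since each added lane
FU only costs cells within its own lane rather than across the whole regfile.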

i know.  it's horrendously complex.  we've literally got to invent terminology
as we go along.  i have no idea what to call this.
