[Libre-soc-isa] [Bug 213] SimpleV Standard writeup needed

Mon Oct 19 12:54:27 BST 2020

https://bugs.libre-soc.org/show_bug.cgi?id=213

--- Comment #61 from Luke Kenneth Casson Leighton <lkcl at lkcl.net> ---
darn it, this might partly be why the traditional cray vector system wastes
an entire vector element as a predicate, throwing away all of the top bits.  by
"wasting" an ALU doing a single bit operation it has the side-effect of
reducing latency.

if the predicate is treated as a single dependency hazard, then yes it becomes
a bottleneck.

no huge surprise there, just took me a while to "get".

all solutions no matter how implemented and no matter the microarchitecture
involve breaking the predicate down into element-sized chunks that whatever
Hazard-tracking is on that microarchitecture can use to get element
interleaving.
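
as a rough illustration of "breaking the predicate down into chunks", here is
a minimal python sketch.  it assumes one predicate bit per element with bit 0
being element 0, and the names split_predicate / join_predicate and the
default chunk size of 8 bits are made up for this example, not taken from any
existing code.

```python
def split_predicate(pred, chunk_bits=8, width=64):
    """Split a width-bit predicate into chunk_bits-wide pieces, LSB chunk first."""
    mask = (1 << chunk_bits) - 1
    return [(pred >> shift) & mask for shift in range(0, width, chunk_bits)]


def join_predicate(chunks, chunk_bits=8):
    """Inverse of split_predicate: reassemble the chunks into one integer."""
    mask = (1 << chunk_bits) - 1
    pred = 0
    for i, chunk in enumerate(chunks):
        pred |= (chunk & mask) << (i * chunk_bits)
    return pred


# elements 0-8 enabled: chunk 0 is fully set, chunk 1 has only its lowest bit set
chunks = split_predicate(0x1FF)
```

each entry of chunks can then be tracked as its own dependency rather than the
whole 64 bit value being one hazard.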

we *might* be able to treat the scalar int as not a scalar at all but as a
"vector of 8 bit integers".

around 18 months ago i came up with a scheme where extra 8 bit ALUs were added
to be able to cope with weird small non-power-of-2 Vector Lengths.

the DMs had extra cascading "tree" logic (as extra rows) where if you used a
reg for these obtuse 8 bit vector operations they marked the *main* 64 bit
FU-REGS DepMatrix Hazard flag *and* marked the relevant 8-bit chunks.

thus you could get overlapping 8 bit operations on *different parts* of a 64
bit register, and because we have byte-level write-enable (8 of them on the 64
bit regfile) there is no RD-MODIFY-WR problem.
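
a small behavioural sketch of why byte-level write-enable avoids the
read-modify-write: each 8 bit op only drives its own byte lane, so two ops on
different lanes cannot clobber each other regardless of order.  write_bytes
and its argument names are hypothetical, and this models the regfile storage
in software only (it is not nmigen / HDL).

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


def write_bytes(reg, data, byte_en):
    """Model a 64-bit register write with 8 byte-level write-enable bits.

    Only byte lanes whose bit is set in byte_en take their value from data;
    all other lanes keep what the register already held.  In hardware this
    selection is done by the write-enable lines, so the issuing unit never
    has to read the old value first.
    """
    lane_mask = 0
    for i in range(8):
        if (byte_en >> i) & 1:
            lane_mask |= 0xFF << (8 * i)
    return (reg & ~lane_mask & MASK64) | (data & lane_mask)


reg = 0x1111111111111111
# two independent 8-bit results landing in lane 0 and lane 3
a_then_b = write_bytes(write_bytes(reg, 0xAA, 0b0001), 0xBB << 24, 0b1000)
b_then_a = write_bytes(write_bytes(reg, 0xBB << 24, 0b1000), 0xAA, 0b0001)
```

both orderings give the same final register value, which is the property the
overlapping 8 bit operations rely on.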

plus the "cascade" means that the 64 bit reg is protected from being corrupted
if it needs to be accessed as a 64 bit reg instead of as an 8x8bit vector.
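
a toy model of the cascade idea, for one register only.  it is a
simplification: the real DMs hold a separate 64 bit hazard flag plus the
8-bit chunk flags, whereas here the 64 bit flag is just derived as the OR of
the chunk flags.  the class and method names are invented for the sketch.

```python
class CascadeHazard:
    """Hazard flags for one 64-bit register, tracked per 8-bit chunk."""

    def __init__(self):
        self.chunk_busy = 0  # bit i set = byte lane i has a pending write

    @property
    def full_busy(self):
        # assumption: the 64-bit hazard flag is the OR of the 8 chunk flags
        return self.chunk_busy != 0

    def can_issue(self, byte_mask):
        """An op may issue only if none of the lanes it touches are pending."""
        return (self.chunk_busy & byte_mask) == 0

    def issue(self, byte_mask):
        if not self.can_issue(byte_mask):
            raise RuntimeError("hazard: lane already pending")
        self.chunk_busy |= byte_mask

    def retire(self, byte_mask):
        self.chunk_busy &= ~byte_mask & 0xFF


h = CascadeHazard()
h.issue(0b00000001)                       # 8-bit op on lane 0
lane1_ok = h.can_issue(0b00000010)        # different lane: may overlap
full_ok_while_busy = h.can_issue(0xFF)    # full 64-bit access must wait
h.retire(0b00000001)
full_ok_after = h.can_issue(0xFF)
```

a full-width access is simply an op whose byte_mask is 0xFF, so it is held
back while any 8 bit chunk is still in flight.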

it's pretty horrendously complicated which is the reason i didn't rave about it
because it's an optimisation.

or so i thought.

well, it still is, but it looks like it'll be a pretty important one.

the only thing is, it's no good doing those predicate calculations at the 8 bit
level if you then go and *read* the damn predicate as a single 64 bit scalar
op.  that defeats the entire exercise!

the instruction issue engine would have to issue 8x 8bit reads, *not* issue 1x
64 bit read.

i think that's doable.  i.e. the predicate system would hook into the exact same
8bit DM logic that the scalar 8bit ops just used.
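
to make the difference between 8x 8bit reads and 1x 64 bit read concrete, a
sketch under the assumption that each predicate byte covers 8 consecutive
elements (bit 0 = element 0).  ready_elements and chunks_ready are hypothetical
names: chunks_ready has bit i set when predicate byte i has been produced.

```python
def ready_elements(chunks_ready, vl, elems_per_chunk=8):
    """Element indices (below vl) whose predicate chunk is already available."""
    return [i for i in range(vl) if (chunks_ready >> (i // elems_per_chunk)) & 1]


def ready_elements_single_read(chunks_ready, vl, elems_per_chunk=8):
    """Same question if the predicate is read as one scalar: all or nothing."""
    needed = (vl + elems_per_chunk - 1) // elems_per_chunk
    all_ready = all((chunks_ready >> c) & 1 for c in range(needed))
    return list(range(vl)) if all_ready else []


# VL=12, only predicate byte 0 has been computed so far
per_chunk = ready_elements(0b01, 12)
single = ready_elements_single_read(0b01, 12)
```

with per-chunk reads elements 0-7 can start as soon as byte 0 is available,
whereas the single 64 bit read lets nothing start until byte 1 is done too.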

argh :)
