[Libre-soc-isa] [Bug 213] SimpleV Standard writeup needed
bugzilla-daemon at libre-soc.org
Mon Oct 19 12:54:27 BST 2020
https://bugs.libre-soc.org/show_bug.cgi?id=213
--- Comment #61 from Luke Kenneth Casson Leighton <lkcl at lkcl.net> ---
darn it, this might partly be why the traditional cray vector system wastes
an entire vector element as a predicate, throwing away all of the top bits. by
"wasting" an ALU doing a single bit operation it has the side-effect of
reducing latency.
if the predicate is treated as a single dependency hazard, then yes it becomes
a bottleneck.
no huge surprise there, just took me a while to "get".
all solutions no matter how implemented and no matter the microarchitecture
involve breaking the predicate down into element-sized chunks that whatever
Hazard-tracking is on that microarchitecture can use to get element
interleaving.
we *might* be able to treat the scalar int as not a scalar at all but as a
"vector of 8 bit integers".
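as a rough illustration of the "vector of 8 bit integers" view, here is a
minimal python sketch. the names split_predicate and join_predicate are made
up for this example, and the ordering (chunk 0 = least significant byte) is an
assumption, not something decided in this thread.

```python
def split_predicate(pred, chunk_bits=8, width=64):
    """View a scalar predicate as a list of small chunks (chunk 0 = lowest bits)."""
    mask = (1 << chunk_bits) - 1
    return [(pred >> shift) & mask for shift in range(0, width, chunk_bits)]


def join_predicate(chunks, chunk_bits=8):
    """Reassemble the scalar predicate from its chunks."""
    mask = (1 << chunk_bits) - 1
    pred = 0
    for index, chunk in enumerate(chunks):
        pred |= (chunk & mask) << (index * chunk_bits)
    return pred


if __name__ == "__main__":
    chunks = split_predicate(0x0123456789ABCDEF)
    print([hex(c) for c in chunks])
    print(hex(join_predicate(chunks)))
```

under this assumed layout, chunk n carries the predicate bits for elements
8n to 8n+7, so each chunk can be tracked as its own dependency.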
around 18 months ago i came up with a scheme where extra 8 bit ALUs were added
to be able to cope with weird small non-power-of-2 Vector Lengths.
the DMs had extra cascading "tree" logic (as extra rows) where if you used a
reg for these obtuse 8 bit vector operations they marked the *main* 64 bit
FU-REGS DepMatrix Hazard flag *and* marked the relevant 8-bit chunks.
thus you could get overlapping 8 bit operations on *different parts* of a 64
bit register, and because we have byte-level write-enable (8 of them on the 64
bit regfile) there is no RD-MODIFY-WR problem.
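a small behavioural sketch of why byte-level write-enable helps, in python.
write_bytes is a hypothetical helper for illustration only: it models the
*effect* of the 8 write-enable lines on a 64 bit register, not the actual
regfile implementation.

```python
def write_bytes(reg, data, byte_en):
    """Return the new 64 bit register value after a byte-enabled write.

    byte_en has one bit per byte lane (bit 0 = lowest byte). Lanes whose
    enable bit is clear keep their old contents, which is what lets the
    hardware skip a read-modify-write cycle.
    """
    for lane in range(8):
        if (byte_en >> lane) & 1:
            lane_mask = 0xFF << (lane * 8)
            reg = (reg & ~lane_mask) | (data & lane_mask)
    return reg & 0xFFFFFFFFFFFFFFFF


if __name__ == "__main__":
    reg = 0x1111111111111111
    # two 8 bit results landing in different lanes of the same register
    a_then_b = write_bytes(write_bytes(reg, 0x22, 0b01), 0x3300, 0b10)
    b_then_a = write_bytes(write_bytes(reg, 0x3300, 0b10), 0x22, 0b01)
    print(hex(a_then_b), a_then_b == b_then_a)
```

because the two writes touch disjoint lanes, the order in which they land
does not matter in this model, which is the property the overlapping 8 bit
operations rely on.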
plus the "cascade" means that 64 bit reg is protected from being corrupted if
needed to be accessed as a 64 bit reg instead of as an 8x8bit vector.
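to make the "cascade" idea concrete, here is a toy python model of hazard
tracking for one register. ChunkedRegHazard and its methods are invented for
this sketch and are far simpler than the real FU-REGS Dependency Matrix: the
only thing it tries to show is that chunk-level flags also raise the main 64
bit flag.

```python
class ChunkedRegHazard:
    """Toy hazard state for one 64 bit register split into 8 byte lanes."""

    def __init__(self):
        self.chunk_busy = 0      # one bit per 8 bit lane with an op in flight
        self.whole_busy = False  # True while a full 64 bit access is in flight

    @property
    def main_flag(self):
        # the cascade: any busy chunk also marks the main 64 bit hazard
        return self.whole_busy or self.chunk_busy != 0

    def can_issue_chunks(self, lanes):
        return not self.whole_busy and (self.chunk_busy & lanes) == 0

    def can_issue_whole(self):
        return not self.main_flag

    def issue_chunks(self, lanes):
        if not self.can_issue_chunks(lanes):
            raise RuntimeError("chunk hazard: lanes busy")
        self.chunk_busy |= lanes

    def issue_whole(self):
        if not self.can_issue_whole():
            raise RuntimeError("64 bit hazard: register busy")
        self.whole_busy = True

    def retire_chunks(self, lanes):
        self.chunk_busy &= ~lanes

    def retire_whole(self):
        self.whole_busy = False


if __name__ == "__main__":
    hz = ChunkedRegHazard()
    hz.issue_chunks(0b0001)               # 8 bit op on lane 0
    print(hz.can_issue_chunks(0b0010))    # a different lane may overlap
    print(hz.can_issue_whole())           # a 64 bit access has to wait
```

in this model 8 bit operations on different lanes overlap freely, while a
full 64 bit access is held off until every lane has retired.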
it's pretty horrendously complicated, which is the reason i didn't rave about
it, because it's an optimisation.
or so i thought.
well, it still is, but it looks like it'll be a pretty important one.
the only thing is, it's no good doing those predicate calculations at the 8 bit
level if you then go and *read* the damn predicate as a single 64 bit scalar
op. that defeats the entire exercise!
the instruction issue engine would have to issue 8x 8bit reads, *not* issue 1x
64 bit read.
i think that's doable, i.e. the predicate system would hook into the exact same
8 bit DM logic that the scalar 8 bit ops described above use.
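a hedged python sketch of the difference between the two read strategies.
both function names are hypothetical, and chunk_pending is an assumed bitmask
with one bit per 8 bit predicate chunk that is still waiting on a result.

```python
def ready_elements_chunked(chunk_pending, vl=64):
    """8x 8 bit reads: an element only waits on its own predicate chunk."""
    return [e for e in range(vl) if not (chunk_pending >> (e // 8)) & 1]


def ready_elements_scalar(chunk_pending, vl=64):
    """1x 64 bit read: every element waits until the whole predicate is done."""
    return list(range(vl)) if chunk_pending == 0 else []


if __name__ == "__main__":
    pending = 0b10  # only chunk 1 (elements 8..15) is still being computed
    print(ready_elements_chunked(pending, vl=24))
    print(ready_elements_scalar(pending, vl=24))
```

with one chunk still pending, the chunked read lets the elements covered by
the other chunks go ahead, whereas the single 64 bit read stalls all of them.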
argh :)