[Libre-soc-isa] [Bug 213] SimpleV Standard writeup needed

bugzilla-daemon at libre-soc.org bugzilla-daemon at libre-soc.org
Tue Oct 20 00:12:13 BST 2020


https://bugs.libre-soc.org/show_bug.cgi?id=213

--- Comment #75 from Luke Kenneth Casson Leighton <lkcl at lkcl.net> ---
(In reply to Jacob Lifshay from comment #72)

> All you need is a bitwise right shifter to send the next 8 bits from the
> vector mask to the ALU,

jacob you're not quite getting it: this is only possible to do ("a simple
shift") if there are no Dependency Matrices involved.

a simple architecture such as microwatt can do such a shift trick.  an in-order
system likewise.
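for reference, the shift trick being discussed looks roughly like the sketch below when the mask value is already in hand, which is the in-order case. this is a software illustration only: the function name `predicate_chunks` and the 8-bit default chunk size are assumptions made for the example, not anything taken from microwatt or libre-soc.

```python
def predicate_chunks(mask, vl, chunk=8):
    """Yield successive `chunk`-bit slices of a predicate mask.

    Assumes the mask VALUE is already known, which is exactly the
    precondition that holds for a simple in-order pipeline.
    """
    mask &= (1 << vl) - 1  # ignore any bits beyond VL
    for start in range(0, vl, chunk):
        width = min(chunk, vl - start)  # last slice may be short
        # plain right shift, then keep only the bits for this slice
        yield (mask >> start) & ((1 << width) - 1)


if __name__ == "__main__":
    for bits in predicate_chunks(0xABCD, vl=16):
        print(f"{bits:08b}")
```

note that the function takes `mask` as an integer argument: it cannot even be called until the register has been read.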

an OoO system MUST track ALL objects regardless of size.

the significance of this had not really sunk in properly for me because i had
not realised the latency problem you highlighted.

we have two choices, one at each end of the spectrum (and some in between):

* bitlevel predicate Dependency Matrices: one bit per element
* "one hit" (one scalar) predicate masks (with associated latency)

when doing bitlevel DMs one optimisation in the VL instruction issue phase is
to notice the following:


* VL=16
* elwidth=16
* SIMD width=64
* therefore 4x ops can be batched to each ALU

*BUT*

to do that, you need 4 bits of predicate i.e. 4 predicate regs to be passed to
those ALUs.
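the arithmetic above can be sketched as follows. the function names `batching` and `predicate_bits_for_batch` are hypothetical, chosen only for this illustration, and the sketch assumes one predicate bit per element as in the bitlevel DM option.

```python
def batching(vl, elwidth, simd_width):
    """Return (ops batched per ALU, number of batches needed to cover VL)."""
    ops_per_alu = simd_width // elwidth      # e.g. 64 // 16 = 4
    num_batches = -(-vl // ops_per_alu)      # ceiling division
    return ops_per_alu, num_batches


def predicate_bits_for_batch(batch, ops_per_alu, vl):
    """Return the predicate bit indices that one batch depends on."""
    start = batch * ops_per_alu
    return list(range(start, min(start + ops_per_alu, vl)))


if __name__ == "__main__":
    ops, batches = batching(vl=16, elwidth=16, simd_width=64)
    print(ops, batches)
    for b in range(batches):
        print(b, predicate_bits_for_batch(b, ops, vl=16))
```

each batch therefore carries a dependency on 4 separate predicate bits, and each of those has to be tracked before its value exists.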

now, if you start having to get those 4 bits (which you can't do with the
shifting you suggest *because they haven't been read yet*) it quickly becomes hell.

note that DMs track regs *before the contents are available*.  we don't *have*
the contents of the predicate mask available at the time in order to be able to
shift it!

consequently you have to do that shadow trick, and only when the reg is read
*then* you can finish off the bitlevel analysis (shifting if necessary) and
send it on to each ALU.
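a minimal software sketch of that ordering constraint is below. it models only the "record the slice at issue time, fill in the bits after the read" sequence, not the Shadow Matrices themselves, and every name in it (`PendingPredicateSlice`, `issue`, `on_predicate_read`) is hypothetical.

```python
class PendingPredicateSlice:
    """Predicate bits for one ALU batch, unknown at issue time."""

    def __init__(self, start, width):
        self.start = start   # first predicate bit this batch needs
        self.width = width   # how many predicate bits it needs
        self.bits = None     # not available until the reg is read

    def resolve(self, mask_value):
        # only now, with the value in hand, can the shift be done
        self.bits = (mask_value >> self.start) & ((1 << self.width) - 1)
        return self.bits


def issue(vl, ops_per_alu):
    """At issue time only positions are known, never mask contents."""
    return [PendingPredicateSlice(s, min(ops_per_alu, vl - s))
            for s in range(0, vl, ops_per_alu)]


def on_predicate_read(slices, mask_value):
    """Once the predicate reg is read, finish the bitlevel analysis."""
    return [s.resolve(mask_value) for s in slices]


if __name__ == "__main__":
    pending = issue(vl=16, ops_per_alu=4)
    print([s.bits for s in pending])           # all None: nothing read yet
    print(on_predicate_read(pending, 0xA5F0))  # bits for each ALU batch
```

in hardware, everything between `issue` and `on_predicate_read` is the part that has to be held under a shadow, which is where the complexity comes from.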

even with an internal PRF/ARF special designation, protection is still needed:
i did once try the idea of making VL a pointer to a reg rather than an
immediate, and hoo-boy was it convoluted.


you need to think through: what is the logic needed to implement an 8-bit vector
mask *when you do not have access to the mask yet*? how will the mask get into
the Shadow Matrices? and how does it work for all possible elwidths and all
possible values of VL?

basically it's *nowhere* near as "simple" as "a shifter".
