[Libre-soc-isa] [Bug 535] setvl/setvli encoding & future reg file expansion
bugzilla-daemon at libre-soc.org
Tue Dec 1 18:37:20 GMT 2020
--- Comment #8 from Luke Kenneth Casson Leighton <lkcl at lkcl.net> ---
(In reply to Jacob Lifshay from comment #7)
> did you notice that setvl writes to an output register?
yes. optionally set MVL, optionally set VL, optionally set RT.
i figured, "why the heck not".
the write to the RT register, when sourced from RA, is how loops are
[at one point i advocated "forget having a separate VL SPR, actually mark one of
the INT regs *as* VL". however the complexity at the backend, with the so-marked
GPR effectively becoming a Read Hazard for EVERY instruction... this was too
much to think through. it would be beautiful, though: setvl would disappear
from the inner loop, at the least].
> I should clarify, I only meant removing the mvl register, not the mvl
> immediate field or the calculation using mvl.
ah ok. see below, i added c.setvl and c.setmvli
> > > mtspr could be entirely equivalent to `setvl 0, ra, 64` or just not
> > > supported (always traps no matter privilege mode) for writing VL.
> > no, use of mtspr is definitely not equivalent because that cuts out the
> > setting of RT.
> I didn't say mtspr was equivalent to all setvl instructions, just that
> specific one which opts out of writing RT by setting the register field to 0.
that simply does not put VL into RT. VL is still set... just not transferred
this is for circumstances where there is no loop, but a (single) sequence of
Vector Ops would save a huge number of instructions.
common circumstances for this include entry points to functions, where a long
contiguous run of registers needs to be stored on the stack.
the history of computing ISAs, as you are no doubt aware, is littered with Bad
Examples Of How Not To Do That. even OpenPOWER has had the good sense to
retire load/store-multi, and ARM retired the same (all but the 2-reg variants).
there will be plenty more like that, including accidental occurrences of
sequential use of registers in uniform-looking structs.
a loop would be inappropriate, a "normal" vector regfile wouldn't work, however
SV by a complete coincidence has what's needed.
and for these one-offs, reading VL or RA, or writing VL to RT, these are not
all you want is:
v.ld ra, rb # load 5 regs from 5 adrs
v.mv ra, rb # copy 5 regs
> > the thing is that setvl/i is actually quite complex. it took me several
> > weeks to understand it fully in RVV, and then even longer to realise and
> > accept that the capabilities of RVV setvl were needed (in full)... *and in
> > addition* the ability to set MVL at the same time was required.
> > the critical, critical part of setvl is this:
> > RT = VL = min(min(VL, MAXVL), RA)
> yup, that's completely retained. what isn't needed is keeping the maxvl
> value around in a spr for later,
i think i get what you are saying. it is: if the only location where MVL is
specified is in the setvli instruction, if the only place it is ever used is in
this instruction, then it effectively becomes local state and need not be
stored in an SPR.
the answer to that is in the form of the two Compressed instructions i added
* setmvli immed
* setvl rt, ra
by a nice coincidence there happened to be a non-register immediate spare slot
with 6 bits free for an immediate, and another spare slot in the 16 bit logical
this covers the majority use-cases: setting MVL outside the loop (16bit),
setting VL inside the loop (16bit).
if MVL has been set to a fixed quantity for several loops (at the start of a
function), 16 bits are saved per loop by splitting MVL setting from VL setting.
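
to illustrate the split between the two Compressed instructions, here is a rough Python sketch. it is not the actual spec pseudocode. it assumes the simpler VL = min(MVL, RA) form that appears in the loop fragment further down, and it assumes that a register field of 0 means "do not write RT", as discussed earlier in this thread. the names SVState, the 32-entry register list and the default MVL of 64 are purely illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class SVState:
    """Hypothetical model state: MVL, VL and a small integer register file."""
    mvl: int = 64
    vl: int = 0
    gpr: list = field(default_factory=lambda: [0] * 32)


def setmvli(state, immed):
    """Model of 'setmvli immed': set MVL only, typically once outside the loop."""
    state.mvl = immed


def setvl(state, rt, ra):
    """Model of 'setvl rt, ra': VL = min(MVL, GPR[ra]).

    VL is always set; it is copied into GPR[rt] unless rt is 0 (assumption).
    """
    state.vl = min(state.mvl, state.gpr[ra])
    if rt != 0:
        state.gpr[rt] = state.vl
    return state.vl


state = SVState()
state.gpr[5] = 20      # 20 elements still to process (illustrative value)
setmvli(state, 8)      # outside the loop
setvl(state, 3, 5)     # inside the loop: VL and r3 both receive min(8, 20)
print(state.vl, state.gpr[3])
```

in this model MVL persists as state between the two calls, which is exactly what allows setmvli to be hoisted out of the loop.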
> since the compiler knows its value as a
> compile-time constant (it has to since it allocated the registers) and can
> just put maxvl in the immediate of all relevant setvl[i] instructions.
> We can even include setting CR0 (if Rc = 1) to allow jumps on VL == 0
> immediately after setvl.
awesome, isn't it? :) i love CRs. see comment #4 i put Rc support in. i
mean, it's part of XO-form so why not.
> For setvli, since the value VL is set to is a constant, having a destination
> register is much less important.
no, it's critical. without it, loops are forced to read VL after the setvl
this increases inner loop overhead by one instruction. given that some vector
loops will only be 5-6 ops this is a whopping 15-20% increase.
setvl VL=min(MVL,r5) # without vl
# into dest
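
as a back-of-envelope check of the overhead figure, here is a small Python sketch that counts instructions in a strip-mined loop. everything in it is an assumption for illustration: the function name, the element count of 64, MVL of 8, and the idea that a setvl without a destination register costs exactly one extra instruction per iteration to read VL back into a GPR.

```python
def loop_instructions(n, mvl, ops_per_iter, setvl_writes_rt):
    """Count instructions executed by a hypothetical strip-mined vector loop.

    ops_per_iter is the total per-iteration instruction count (setvl included)
    when setvl writes VL into a destination register.
    """
    remaining = n
    executed = 0
    while remaining > 0:
        vl = min(mvl, remaining)        # setvl: VL = min(MVL, remaining)
        executed += ops_per_iter
        if not setvl_writes_rt:
            executed += 1               # extra instruction to read VL into a GPR
        remaining -= vl                 # the loop counter update needs VL in a GPR
    return executed


for ops in (5, 6):
    base = loop_instructions(64, 8, ops, True)
    extra = loop_instructions(64, 8, ops, False)
    print(ops, f"{100 * (extra - base) / base:.1f}%")
```

under these assumptions a 5-op loop grows by 20.0% and a 6-op loop by 16.7%, in line with the rough range quoted above.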