[Libre-soc-dev] sv.mv x: the instruction from hell

Sun Jun 5 08:52:02 BST 2022

On Sat, Jun 4, 2022, 01:44 lkcl <luke.leighton at gmail.com> wrote:

> * instead of LD-sequential followed by sv.mv.x just do index remap on the
> LD
>

compilers already do that...(i'd expect) -- they'd use sv.ld.x.

> * the element indices could overlap i.e. be overwritten by a previous
> element indexed-mv
> a rule could be set "undefined behaviour" but it is no different
> from setting the rule "ignore hazards" on indexed-remap
> * the big difference however is that a bad instruction (as scalar)
> is made even worse by that undefined behaviour rule.
>
> to add mv.x it would have to be proposed:
>
> "please accept this 32 bit scalar instruction that nobody in their
> right mind would ever use because it creates catastrophic
> read/write hazards. oh and even when vectorised we have
> to propose undefined behaviour"
>

undefined behavior isn't needed, all you need is for the input/output data
vectors not to overlap in normal operation. when they do overlap, you can
always run in slow mode or trap and emulate (overlap is trivial to detect
as soon as you know VL, and trapping can be done in the SV decoder).
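
a rough sketch of that idea, as a software model only (not the actual
hardware or simulator code). it assumes mv.x behaves as
regs[dst+i] = regs[src+idx[i]] for i in 0..VL-1, and that every index is in
the range 0..VL-1, so the source vector spans exactly VL registers. the
function names are made up for illustration:

```python
def vectors_overlap(src_base, dst_base, vl):
    """True if [src_base, src_base+vl) and [dst_base, dst_base+vl) share a register."""
    if vl <= 0:
        return False
    return src_base < dst_base + vl and dst_base < src_base + vl


def indexed_mv_fast(regs, dst_base, src_base, indices):
    """Naive in-order element loop: only correct when the vectors do not overlap."""
    for i, idx in enumerate(indices):
        regs[dst_base + i] = regs[src_base + idx]


def indexed_mv_safe(regs, dst_base, src_base, indices):
    """Assumed 'slow mode': snapshot the source first if the vectors overlap."""
    vl = len(indices)
    if vectors_overlap(src_base, dst_base, vl):
        # slow path (or trap-and-emulate): read all sources before any write
        src = regs[src_base:src_base + vl]
        for i, idx in enumerate(indices):
            regs[dst_base + i] = src[idx]
    else:
        indexed_mv_fast(regs, dst_base, src_base, indices)


if __name__ == "__main__":
    fast = [10, 11, 12, 13, 0, 0, 0, 0]
    safe = list(fast)
    # destination r1..r4 overlaps source r0..r3
    indexed_mv_fast(fast, 1, 0, [0, 1, 2, 3])
    indexed_mv_safe(safe, 1, 0, [0, 1, 2, 3])
    print(fast)  # earlier writes are re-read by later elements
    print(safe)  # each element sees the original source value
```

the overlap check needs only the two base register numbers and VL, which is
why it could plausibly sit in the decoder; the non-overlapping case never
takes the slow path.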

>
> vs:
>
> "please accept this slightly less optimal solution which can
> save register resources, hides the undefined behaviour
> behind an index-remap instruction, and doesn't need a
> 32-bit instruction from hell (mv.x) to accompany it"
>
> the latter is clearly a much cleaner proposal.
>
> possibilities:
>
> svremap.indexed
> sv.extsw
>
> or
>
> svremap.indexed
> sv.fmv (or any other mv instruction including converters)
>
>
> or
>
> svremap.indexed RA
> sv.ld
>
> or:
>
> svremap.indexed RB
> svremap.indexed RS
> sv.ld
>
> or:
>
> svremap.indexed RA
> sv.addi RT.v RA.v, 5
>
> the use of *double* indexing is err where it gets fun/hilarious/obtuse.
> it's even technically possible to do this:
>
> svremap.indexed RA
> svremap.indexed RB
> svremap.indexed RT
> sv.add RT.v RA.v, RB.v
>

you realize that means we need *3* (or 4 for sv.fmadd) sets of the hw
implementation of mv.x?! (assuming you don't just split that into 4
microops, 3 mv.x and an add) imho that's terrible!

>
> it becomes a "get out of jail free" card for any types of operations which
> are more complex than the current hardware-for-loop remaps (matrix, DCT,
> FFT, which took 8 weeks to do)
>
> need a triangular remap? no problem, pre-create the indices and use them
> as offsets with remap.indexed.
>
> bottom line is, remap.indexed fits much better with the SV paradigm,
> because it abstracts out the *concept* of indexing.
>
> oh. i just realised. we would also need to propose fmv.x not just mv.x and
> it would suffer the exact same flaws.

fmv.x would be just as easy to define and implement as mv.x.

> whereas remap.indexed can apply to *all* the GPR<->FPR interchangers.
>
> there's no way i would be comfortable proposing a faulty unusable suite of
> scalar fmv.x and associated GPR-FPR{.x} instructions.
>

gpr-fpr.x instructions are unnecessary, since int/fp moves are quite
uncommon unless you're also doing int-fp conversion, which is a separate op
from mv.x.

Jacob