[Libre-soc-dev] sv.mv x: the instruction from hell

Sat Jun 4 09:44:16 BST 2022

* instead of a sequential LD followed by sv.mv.x, just apply index remap to the LD
* the element indices could overlap, i.e. be overwritten by a previous element's
  indexed-mv.  a rule could be set, "undefined behaviour", but that is no different
  from setting the rule "ignore hazards" on indexed-remap
* the big difference, however, is that an instruction which is already bad
  as a scalar is made even worse by that undefined behaviour rule.
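
to make the overlap hazard concrete, here is a rough python model.  it is only a sketch: it assumes mv.x means "RT = regs[regs[RA]]" and that the vectorised form simply walks elements in order.  the function names and register layout are made up for illustration and are not taken from any spec.

```python
def sv_mv_x_in_order(regs, rt, ra, vl):
    """Assumed semantics: element i does regs[rt+i] = regs[regs[ra+i]],
    one element after another, each seeing earlier elements' writes."""
    for i in range(vl):
        regs[rt + i] = regs[regs[ra + i]]


def sv_mv_x_snapshot(regs, rt, ra, vl):
    """What the programmer probably intended: every index and every
    source value is read before any destination is written."""
    src = list(regs)
    for i in range(vl):
        regs[rt + i] = src[src[ra + i]]


# r0..r2 hold the indices (8, 9, 10); r8..r13 hold data.
# the destination r1..r3 overlaps the index registers r1 and r2.
initial = [8, 9, 10, 0, 0, 0, 0, 0, 10, 11, 12, 13, 14, 15, 0, 0]

in_order = list(initial)
sv_mv_x_in_order(in_order, rt=1, ra=0, vl=3)

snapshot = list(initial)
sv_mv_x_snapshot(snapshot, rt=1, ra=0, vl=3)

print(in_order[1:4])   # [10, 12, 14]: elements 1 and 2 used clobbered indices
print(snapshot[1:4])   # [10, 11, 12]
```

element 0 overwrites r1, which element 1 then reads as its index, and so on down the vector.  the two models disagree, which is exactly the case that would need an "undefined behaviour" rule.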

to add mv.x, the proposal would have to read:

    "please accept this 32 bit scalar instruction that nobody in their
      right mind would ever use because it creates catastrophic
      read/write hazards.  oh and even when vectorised we have
      to propose undefined behaviour"

vs:

     "please accept this slightly less optimal solution which can
       save register resources, hides the undefined behaviour
       behind an index-remap instruction, and doesn't need a
       32-bit instruction from hell (mv.x) to accompany it"

the latter is clearly a much cleaner proposal.

possibilities:

    svremap.indexed
    sv.extsw

or

    svremap.indexed
    sv.fmv (or any other mv instruction including converters)

or

     svremap.indexed RA
     sv.ld

or:

     svremap.indexed RB
     svremap.indexed RS
     sv.ld

or:

      svremap.indexed RA
      sv.addi RT.v RA.v, 5

the use of *double* indexing is err where it gets fun/hilarious/obtuse.  it's even technically possible to do this:

      svremap.indexed RA
      svremap.indexed RB
      svremap.indexed RT
      sv.add RT.v RA.v, RB.v
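
as a rough illustration of the triple-indexed case, here is a small python model.  it assumes that an indexed remap on an operand just replaces the element offset i with idx[i] for that operand; the function name, the idx_* parameters and the register numbers are hypothetical and only there to show the idea.

```python
def sv_add_indexed(regs, rt, ra, rb, vl, idx_rt=None, idx_ra=None, idx_rb=None):
    """Assumed model of sv.add under indexed remap: each operand's
    element offset is looked up in its own index vector, if it has one."""
    def remap(idx, i):
        # no remap on this operand: plain sequential element offset
        return i if idx is None else idx[i]

    for i in range(vl):
        a = regs[ra + remap(idx_ra, i)]
        b = regs[rb + remap(idx_rb, i)]
        regs[rt + remap(idx_rt, i)] = a + b


regs = list(range(32))          # r[n] == n, to make the result easy to read
sv_add_indexed(regs, rt=16, ra=0, rb=8, vl=4,
               idx_rt=[1, 0, 3, 2],
               idx_ra=[3, 2, 1, 0],
               idx_rb=[0, 0, 1, 1])

print(regs[16:20])              # [10, 11, 9, 10]
```

the add itself is untouched; all of the indexing lives in the remap step, which is why the same trick would carry over to any other vectorised instruction.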

it becomes a "get out of jail free" card for any type of operation which is more complex than the current hardware-for-loop remaps (matrix, DCT, FFT, which took 8 weeks to do).

need a triangular remap? no problem, pre-create the indices and use them as offsets with remap.indexed.
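
a minimal python sketch of what "pre-create the indices" could look like for a triangular walk.  it assumes a row-major n x n matrix and a lower-triangle traversal; the helper names are made up, and gather_indexed only stands in for whatever vectorised instruction would consume the offsets.

```python
def lower_triangle_indices(n):
    """Offsets of the lower triangle (diagonal included) of a
    row-major n x n matrix, in row order."""
    return [r * n + c for r in range(n) for c in range(r + 1)]


def gather_indexed(src, indices):
    """Stand-in for an indexed-remapped operation: element i reads
    src[indices[i]] instead of src[i]."""
    return [src[i] for i in indices]


matrix = [1, 2, 3,
          4, 5, 6,
          7, 8, 9]

tri = lower_triangle_indices(3)
print(tri)                          # [0, 3, 4, 6, 7, 8]
print(gather_indexed(matrix, tri))  # [1, 4, 5, 7, 8, 9]
```

the index vector is computed once up front, so no dedicated triangular hardware schedule is needed.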

bottom line is, remap.indexed fits much better with the SV paradigm, because it abstracts out the *concept* of indexing.

oh.  i just realised.  we would also need to propose fmv.x not just mv.x and it would suffer the exact same flaws.  whereas remap.indexed can apply to *all* the GPR<->FPR interchangers.

there's no way i would be comfortable proposing a faulty unusable suite of scalar fmv.x and associated GPR-FPR{.x} instructions.

does that help explain?

l.