[Libre-soc-dev] mv.zip and mv.unzip (vector pack and unpack)

Fri Jun 10 13:48:02 BST 2022

https://libre-soc.org/openpower/sv/mv.vec/

as you've probably appreciated i've been diving into the spec pages, particularly (as with mv.x) with a view to the impact on opcode space, the amount of time needed from the OPF ISA WG, and how much "goodwill" we can afford to lose by "dumping instructions at them and expecting them to like it".

mv.vec, on closer inspection since it was originally written, is extremely costly: 4-operand, and, frankly, basically a mess.  major opcodes are already under huge pressure in Power ISA: we want 25% of EXT001 and are recommending taking *two* major opcodes with large-immediate bitmanip.

just as mv.x was eliminated and reimplemented, with exactly the same functionality, as a REMAP instead, in looking at mv.zip/unzip i am finding that Matrix REMAP can actually do the job with a hell of a lot more flexibility.

two examples:

Example 1:

* RA set to linear
* RT set to YX, ydim=2, xdim=4
* VL=MAXVL=8

The indices match up as follows:

    | RA | (0 1) (2 3) (4 5) (6 7) |
    | RT |   0 2 4 6     1 3 5 7   |

This results in a 2-element "unpack"
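
The unpack in Example 1 can be modelled in a few lines of Python. This is only a sketch, not the SVP64 REMAP pseudocode: the helper names `yx_dest` and `remap_copy` are hypothetical, and the formula `(i % ydim) * xdim + i // ydim` is an assumption, picked because it sends even-numbered source elements to the first half of RT and odd-numbered ones to the second half.

```python
def yx_dest(i, xdim, ydim):
    """Destination slot for source element i (assumed model of YX Matrix REMAP)."""
    return (i % ydim) * xdim + i // ydim


def remap_copy(src, xdim, ydim, vl):
    """Copy src[0..vl) into xdim*ydim destination slots; None marks an untouched slot."""
    dst = [None] * (xdim * ydim)
    for i in range(vl):
        dst[yx_dest(i, xdim, ydim)] = src[i]
    return dst


# Example 1: RA linear, RT YX, ydim=2, xdim=4, VL=MAXVL=8
ra = list(range(8))                    # pairs (0 1) (2 3) (4 5) (6 7)
rt = remap_copy(ra, xdim=4, ydim=2, vl=8)
print(rt)                              # [0, 2, 4, 6, 1, 3, 5, 7]
```

Each slot of `rt` holds the RA index that landed there, which is how the RT row of the table is read.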

Example 2:

* RA set to linear
* RT set to YX, ydim=3, xdim=3
* VL=MAXVL=9

The indices match up as follows:

    | RA |  0 1 2   3 4 5   6 7 8  |
    | RT | (0 3 6) (1 4 7) (2 5 8) |

This results in a 3-element "pack"
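
To see why this counts as a "pack", here is a small Python sketch of Example 2 using labelled values instead of bare indices. The names (`yx_dest`, the `a`/`b`/`c` vectors) are made up for illustration, and the index formula is an assumed model of the YX ordering rather than anything taken from the spec.

```python
def yx_dest(i, xdim, ydim):
    # assumed model: destination slot for the i-th linear source element
    return (i % ydim) * xdim + i // ydim


# three 3-element sources sitting next to each other in the regfile
a = ["a0", "a1", "a2"]
b = ["b0", "b1", "b2"]
c = ["c0", "c1", "c2"]
ra = a + b + c                      # RA read linearly, VL=MAXVL=9

rt = [None] * 9                     # RT written in YX order, ydim=3, xdim=3
for i in range(9):
    rt[yx_dest(i, xdim=3, ydim=3)] = ra[i]

print(rt)
# ['a0', 'b0', 'c0', 'a1', 'b1', 'c1', 'a2', 'b2', 'c2']
```

The three sources come out interleaved as triples, i.e. RT slots 0, 1, 2 hold RA elements 0, 3, 6.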

the downside of this approach is that the two "sources" (or three or more sources) or "destinations" (for unpacking) must be next to each other.  in *theory* this is possible:

* RA set to linear
* RT set to YX, ydim=3, xdim=3
* VL=MAXVL=6

The indices match up as follows:

    | RA |  0 1 2   3 4 5           |
    | RT | (0 3 .) (1 4 .) (2 5 .) |

and with appropriate futzing about, MAXVL and xdim can be "arranged" to step over areas of the regfile that need to be untouched:

* RA set to linear
* RT set to YX, ydim=3, xdim=5
* VL=MAXVL=6

The indices match up as follows:

    | RA |  0 1 2   3 4 5                       |
    | RT | (0 3 . . .) (1 4 . . .) (2 5 . . .) |

admittedly, when ydim=2 this makes much more sense, because there are only two groups being packed.  with ydim=3 there are *two* gaps between the results, each gap being uniform in size.
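
The two gapped layouts can be reproduced with a short Python sketch. Again this is an assumed model, not the real svshape/REMAP behaviour: `yx_slots` and `show` are hypothetical helpers, and "." marks a destination slot that is never written because VL stops short of xdim*ydim.

```python
def yx_slots(xdim, ydim, vl):
    """Return xdim*ydim destination slots; each holds the source index written there."""
    slots = ["."] * (xdim * ydim)
    for i in range(vl):
        # assumed YX ordering: same formula for every case in this sketch
        slots[(i % ydim) * xdim + i // ydim] = i
    return slots


def show(slots, xdim):
    """Format the slots in groups of xdim, like the RT rows in the tables."""
    groups = [slots[k:k + xdim] for k in range(0, len(slots), xdim)]
    return " ".join("(" + " ".join(str(s) for s in g) + ")" for g in groups)


print(show(yx_slots(xdim=3, ydim=3, vl=6), 3))
# (0 3 .) (1 4 .) (2 5 .)
print(show(yx_slots(xdim=5, ydim=3, vl=6), 5))
# (0 3 . . .) (1 4 . . .) (2 5 . . .)
```

Increasing xdim widens every group by the same amount, which is why the untouched areas are uniform in size.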

there are 4 bits spare in the "svshape" instruction which can be used to specify alternative modes of operation here.  a couple of options to organise/overload/abuse Matrix Mode for pack/unpack are, i believe, essential to avoid adding more instructions when it is clear that the ISA WG may start to reject what we are doing.

l.