[Libre-soc-dev] [SVP64] feedback needed - Pack/Unpack (vpack/vunpack)

Sat Jul 30 19:43:39 BST 2022

https://bugs.libre-soc.org/show_bug.cgi?id=871#c0

as you know SVP64 applies "schedules" in the abstract, not really
knowing or caring exactly what operation the for-loop is being applied
to.

Pack/Unpack operates on vec2/3/4, performing a transpose. assume vec3
elements numbered v0 v1 v2, QTY 2 (two such vectors, prefixed 0. and 1.)

  0.v0 0.v1 0.v2
  1.v0 1.v1 1.v2

when "packed" will result in

  0.v0 1.v0
  0.v1 1.v1
  0.v2 1.v2

and thus elements in this order:

  0.v0 0.v1 0.v2  1.v0 1.v1 1.v2

will be "repacked" to:

  0.v0 1.v0 0.v1 1.v1 0.v2 1.v2
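
the reordering above can be sketched in a few lines of python. this is only
an illustration of the index arithmetic, not the SVP64 implementation: the
function names pack/unpack, the flat-list representation and the subvl
parameter name are all assumptions made for this example, and which
direction ends up being called "pack" in the spec is not settled here.

```python
def pack(elements, subvl):
    """Transpose a flat list of vec<subvl> elements (hypothetical helper).

    Input order is vector-major: 0.v0 0.v1 0.v2 1.v0 ...
    Output order is subelement-major: 0.v0 1.v0 0.v1 1.v1 ...
    """
    if len(elements) % subvl != 0:
        raise ValueError("element count must be a multiple of subvl")
    qty = len(elements) // subvl  # number of vec<subvl> groups
    return [elements[i * subvl + j] for j in range(subvl) for i in range(qty)]


def unpack(elements, subvl):
    """Inverse of pack(): back to vector-major order (hypothetical helper)."""
    if len(elements) % subvl != 0:
        raise ValueError("element count must be a multiple of subvl")
    qty = len(elements) // subvl
    return [elements[j * qty + i] for i in range(qty) for j in range(subvl)]


# the vec3, QTY 2 example from the text
src = ["0.v0", "0.v1", "0.v2", "1.v0", "1.v1", "1.v2"]
packed = pack(src, 3)
print(" ".join(packed))             # 0.v0 1.v0 0.v1 1.v1 0.v2 1.v2
print(unpack(packed, 3) == src)     # True
```

the point of the sketch is that the transpose only changes which element
index each step of the for-loop visits; it says nothing about the operation
applied to each element.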

whilst it is blindingly obvious to let repacking be applied to LD/ST immediate,
other operations are not so obvious. i've created a candidate list in the top
comment of the bug report.

also obvious is fmv (and ori which is an alias for GPR mv)

my intuition seems to be steering clear of mulli, anything with shift, popcnt,
cmpi, frsqrt, fre, anything involving comprehensive changes to the result.

but things like subfeo, neg, nego, fneg, fcvt and so on, i seem to feel
these are okay, because the output is directly related to input in some
way.

bear in mind, the caveat here is that these would change from EXTRA3
to EXTRA2, losing exactly 50% of starting vector points!  they can only
start 0 2 4 6 8..... where previously they could start from 0 1 2 3 4....
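
a rough way to see the cost, as a sketch only: the 128-register file size
and the helper name vector_starts are assumptions for illustration, and the
step-of-1 versus step-of-2 granularity is taken directly from the paragraph
above rather than from the encoding tables.

```python
NUM_REGS = 128  # assumption: size of the register file being addressed


def vector_starts(step, num_regs=NUM_REGS):
    """Register numbers a vector may start at, for a given granularity."""
    return list(range(0, num_regs, step))


extra3_starts = vector_starts(1)  # 0 1 2 3 4 ... as described for EXTRA3
extra2_starts = vector_starts(2)  # 0 2 4 6 8 ... as described for EXTRA2

# exactly half of the starting points remain
print(len(extra2_starts) / len(extra3_starts))  # 0.5

# odd starting registers such as 5 or 7 are no longer expressible
for reg in (4, 5, 7, 8):
    print(reg, reg in extra3_starts, reg in extra2_starts)
```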

this in turn means fixing a lot of unit tests, which could no longer do
sv.extsw *5, *7: instead it would have to be sv.extsw *4, *8

i have no problem restricting to a much smaller subset, ori, fmv, extsw
etc. it is just such a big strategic change.

l.
