[Libre-soc-dev] cray-style vector of 40 years setting VL=0 at runtime

Sun Oct 2 17:41:27 BST 2022

On Sun, Oct 2, 2022 at 4:49 PM Jacob Lifshay <programmerjake at gmail.com> wrote:
> I completely understand the need for VL=0 for *vector* operations (defined as
> those operations where at least one operand potentially accesses multiple
> elements/subvectors), I'm complaining about the need to set VL!=0
> for completely *scalar* operations

yes - this is what SVP64Single is for.  (if you recall, we had to drop
putting VL directly into the SVP64-Vector Prefix as it took up far
too many bits.)
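To make the scalar/vector distinction concrete, here is a deliberately simplified Python model of how many element operations one prefixed instruction performs. It is an illustration under stated assumptions, not the SVP64 specification's pseudocode. Under the standard SVP64 prefix the element loop is gated by VL, so VL=0 executes nothing even when every operand is scalar. The model assumes that an all-scalar operation stops after its first element, and that SVP64Single never loops at all. The function and the prefix labels are hypothetical.

```python
def elements_executed(prefix: str, vl: int, all_scalar: bool) -> int:
    """Count element operations for one prefixed instruction (simplified model).

    prefix: "svp64" (standard vector prefix) or "svp64single" (hypothetical labels).
    vl: current Vector Length; all_scalar: True if no operand is marked as a vector.
    """
    if prefix == "svp64single":
        # SVP64Single has no element loop: exactly one operation, VL is never consulted.
        return 1
    if prefix != "svp64":
        raise ValueError(f"unknown prefix {prefix!r}")
    count = 0
    for _ in range(vl):
        # The loop is gated by VL: with VL=0 this body never runs.
        count += 1
        if all_scalar:
            # Assumption of this model: all-scalar operations stop after one element.
            break
    return count


print(elements_executed("svp64", vl=0, all_scalar=True))        # 0: VL=0 suppresses even scalar ops
print(elements_executed("svp64", vl=8, all_scalar=True))        # 1
print(elements_executed("svp64", vl=8, all_scalar=False))       # 8
print(elements_executed("svp64single", vl=0, all_scalar=True))  # 1
```

The first line of output is the situation being complained about: a purely scalar operation carrying the standard prefix does nothing when VL=0. The last line shows why SVP64Single sidesteps the issue.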

> (defined as those operations that won't ever access multiple subvectors
> because all operands are scalars/subvectors) -- these were never
> part of Cray's *vector* instructions

yes - this comes down to the fact that we're sitting on top of
the scalar regfile: all other Cray-style Vector ISAs have completely
separate (duplicate) opcodes for Scalar.  NEC SX Aurora has about
200 instructions, approx. 110 of which are Scalar and 80
of which are Vector.  RVV is more comprehensive (96 RV64GC,
192 RVV), but either way it shows the cost: separate Vector regfiles
require separate (duplicate) Vector instructions, plus transfer
instructions between the scalar and vector regfiles.
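The approximate opcode counts quoted above can be tallied as follows. The figures are the rough ones from this email, not an authoritative census of either ISA.

```python
# Approximate instruction counts as quoted in the email: (scalar, vector).
ISAS = {
    "NEC SX Aurora": (110, 80),
    "RV64GC + RVV": (96, 192),
}


def summarise(scalar: int, vector: int) -> tuple[int, float]:
    """Return the total opcode count and the fraction spent on Vector opcodes."""
    total = scalar + vector
    return total, vector / total


for name, (scalar, vector) in ISAS.items():
    total, share = summarise(scalar, vector)
    print(f"{name}: {total} opcodes, {share:.0%} of them Vector")
```

With the quoted figures this gives about 42% Vector opcodes for SX Aurora and about 67% for RV64GC+RVV. In both cases those opcodes sit on top of the Scalar set, which is the duplication cost being described.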

> and therefore complaining about how they don't match Cray is pointless
> because Cray didn't set any precedent for how they should behave.

you still cannot put ambiguous comments into high-profile examples:
they could be *interpreted* by readers looking for excuses to deliberately
misjudge our work, concluding that we have no idea what we are talking
about, or that we have designed something that's "complete incompetent
rubbish".

> This was committed before SVP64 Scalar prefixes existed, so at that
> point in time scalar operations that access high registers *were* poorly
> thought out, as I have pointed out multiple times in the past (SVP64
> Scalar prefix mostly fixed that -- imho we still need something for
> subvectors, having all arguments be scalar (even if subvl!=1)
> for the standard SVP64 prefix should ignore VL and only execute 1 subvector).

that's a tricky one, which would need some strong justification as to
why simply using setvl with VL=2/3/4 instead is insufficient, or whether there
is enough usage to warrant a 2-bit budget for subvl in SVP64Single.

from this:
   https://bugs.libre-soc.org/show_bug.cgi?id=905#c1
it looks like there are 2 bits spare: the only question is, would even a
small loop fly in the lower Compliancy Levels for Embedded?
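As a rough illustration of what those 2 spare bits might carry, the sketch below makes two hypothetical assumptions. First, the bits hold SUBVL encoded as value minus one, covering the same 1 to 4 range as SVP64's SUBVL. Second, an all-scalar SVP64Single instruction then performs exactly one subvector of SUBVL elements regardless of VL, which is the behaviour proposed in the quoted message. It is not an agreed encoding, and the helper names and the register layout (element j in register base + j) are assumptions.

```python
MASK64 = (1 << 64) - 1  # wrap results to 64 bits like a GPR


def encode_subvl(subvl: int) -> int:
    """Pack a subvector length of 1..4 into a hypothetical 2-bit field (subvl - 1)."""
    if not 1 <= subvl <= 4:
        raise ValueError("SUBVL must be 1, 2, 3 or 4")
    return subvl - 1


def decode_subvl(field: int) -> int:
    """Unpack the hypothetical 2-bit field back into a subvector length."""
    return (field & 0b11) + 1


def single_subvector_add(regs: list[int], rt: int, ra: int, rb: int, field: int) -> None:
    """Hypothetical all-scalar SVP64Single add: one subvector, VL never consulted.

    Element j of each operand lives in register base + j (an assumption).
    """
    for j in range(decode_subvl(field)):  # at most 4 iterations: a tiny, fixed-bound loop
        regs[rt + j] = (regs[ra + j] + regs[rb + j]) & MASK64


regs = list(range(16))
single_subvector_add(regs, rt=8, ra=0, rb=4, field=encode_subvl(3))
print(regs[8:12])  # [4, 6, 8, 11]: three elements written, r11 untouched
```

The point of the sketch is the fixed upper bound: even with SUBVL=4 the loop runs at most four times. That is the "small loop" whose cost in the lower Embedded Compliancy Levels is in question.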

one of the advantages of SVP64Single (with no loops at all) is that
it brings predication and elwidth overrides to the entire Scalar Power
ISA as well as extending the regfile sizes, which is quite attractive
on its own merit.  BF16 and FP16 are introduced right across the board
with no need to design any new opcodes at all.
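A minimal sketch of what a non-looping prefix adds to one ordinary scalar operation makes the following assumptions. A single predicate bit can skip the operation. An integer elwidth override performs the add at 8, 16 or 32 bits instead of 64. The register file has 128 entries. The narrow result is simply zero-extended into the destination, which is a simplifying assumption rather than SVP64's exact rule. All names are hypothetical, and FP16/BF16 overrides on floating-point operations follow the same idea but are not modelled.

```python
NUM_GPRS = 128  # assumed extended register-file size for illustration


def prefixed_scalar_add(regs: list[int], rt: int, ra: int, rb: int,
                        pred: bool = True, elwidth: int = 64) -> None:
    """Model one scalar add under a hypothetical non-looping prefix.

    pred: predicate bit; if False the operation is skipped entirely.
    elwidth: operation width override (8, 16, 32 or 64 bits).
    The narrow result is zero-extended into rt (a simplifying assumption).
    """
    if elwidth not in (8, 16, 32, 64):
        raise ValueError("unsupported elwidth")
    for r in (rt, ra, rb):
        if not 0 <= r < NUM_GPRS:
            raise IndexError(f"register {r} outside the {NUM_GPRS}-entry file")
    if not pred:
        return  # predicated off: destination left untouched
    mask = (1 << elwidth) - 1
    regs[rt] = ((regs[ra] & mask) + (regs[rb] & mask)) & mask


regs = [0] * NUM_GPRS
regs[100], regs[101] = 0xFF, 0x01                      # registers beyond the usual 32
prefixed_scalar_add(regs, 102, 100, 101, elwidth=8)    # 8-bit add wraps
print(hex(regs[102]))  # 0x0
prefixed_scalar_add(regs, 103, 100, 101)               # default 64-bit add
print(hex(regs[103]))  # 0x100
prefixed_scalar_add(regs, 104, 100, 101, pred=False)   # predicated off
print(regs[104])       # 0
```

The same scalar `add` opcode is reused throughout. Only the prefix fields change, which is why no new opcodes are needed.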

adding any kind of looping in there at all? i'm ambivalent, but concerned
about the cost of looping in an Embedded SVP64Single environment.

l.