[Libre-soc-dev] SVP64 LD/ST element-strided

Luke Kenneth Casson Leighton lkcl at lkcl.net
Sun May 30 12:54:58 BST 2021


On Sun, May 30, 2021 at 8:17 AM Lauri Kasanen <cand at gmx.com> wrote:

>
> The standard is for load/store offsets to be in bytes.


... and we made a promise / contract in the design of SVP64 not to deviate
from the (base, v3.0B) standard behaviour.

> For unaligned
> access, the spec can just say that's forbidden.
>

it's fine: again, the behaviour should be "as if the actual v3.0B opcodes
themselves were inline in the source code, in place of the SVP64 for-loop".
as in, the v3.0B opcodes should, in simple implementations, literally
be inserted directly into an *unmodified, unaltered* v3.0B scalar
execution engine.

thus if v3.0B supports unaligned, then so should SVP64-v3.0B

On Sun, May 30, 2021 at 7:44 AM Jacob Lifshay <programmerjake at gmail.com>
wrote:

> >  EA = GPR(RA) + D * i
>
> (assuming D is an immediate field) afaict the below should be good for >99%
> of uses, if a programmer really needs unaligned element ops, imho they can
> just create their own address vector and use load-gather.
>

agreed.

added an element-strided LD/ST example / support
https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=6e1ff2a5c6dcfa2911b64e9e5b1c90dcdd476475

that's "barely functional", Lauri - no error-checking on the syntax, and no
support for anything other than "normal" SVP64 RM mode.  in other words, no
saturation mode, etc.

    "sv.stw/els 5.v, 24(1)",  # scalar r1 + 16 + 24*offs
    "sv.lwz/els 9.v, 24(1)"]) # scalar r1 + 16 + 24*offs

remember two crucial things:

1) the immediate *must* be non-zero
2) RA-as-src *must* be a scalar (1.v will be detected as an INDEXED
   operation)
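
a minimal sketch of the element-strided address calculation discussed above,
EA = GPR(RA) + D * i, with the non-zero-immediate rule checked.  the function
name, the register value and the VL used here are made up for illustration:
this is not the openpower-isa simulator code.

```python
def els_addresses(ra_value, imm, vl):
    """Return the effective addresses EA = GPR(RA) + D * i for i in 0..VL-1.

    ra_value: contents of the *scalar* RA register (hypothetical parameter)
    imm:      the D immediate, in bytes, which must be non-zero
    vl:       vector length (number of elements)
    """
    if imm == 0:
        # rule 1 above: a zero immediate is not an element-strided operation
        raise ValueError("element-strided LD/ST needs a non-zero immediate")
    return [ra_value + imm * i for i in range(vl)]


# assumed values: r1 = 0x1000, D = 24 (as in "24(1)"), VL = 4
for i, ea in enumerate(els_addresses(0x1000, 24, 4)):
    print(f"element {i}: EA = {ea:#x}")
```

each address is a plain byte offset from the scalar base, so nothing here
depends on alignment: whatever v3.0B does with that EA is what happens.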

it *will* be possible in future - with ld-with-update - to make RA-as-dest
a vector!  so, the calculated offsets 0 64 128 192 256... if you were to
need them, would be stored in a Vector of RA destinations.

the only thing with that is: the RA-as-src and RA-as-dest use the *same*
5-bit RA field.  there is *only* room to extend RA-as-src and RA-as-dest by
2 extra bits each.

thus, you have a limited range.  if in the prefix 5-bit field RA is set to,
say, r3, then the RA-as-dest can be... i can't remember the exact
calculation... (r3, r35, r67 and r99) something like that.
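
purely as an illustration of that tentative list: if the 2 extra bits are
*assumed* to sit above the 5-bit RA field, the reachable registers come out
as below.  the bit placement is an assumption made only to reproduce the
"something like that" numbers, not the confirmed SVP64 encoding, and the
function name is made up.

```python
def reachable_regs(ra_5bit, extra_bits=2):
    """List the register numbers reachable from one 5-bit RA field.

    ASSUMPTION: the extra bits are placed above the 5-bit field,
    i.e. reg = (extra << 5) | RA.  the real calculation may differ.
    """
    if not 0 <= ra_5bit < 32:
        raise ValueError("RA must fit in 5 bits")
    return [(extra << 5) | ra_5bit for extra in range(1 << extra_bits)]


print(reachable_regs(3))  # [3, 35, 67, 99]
```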

l.

