[Libre-soc-dev] [RFC] SVP64 Vertical-First Mode, batch processing

lkcl luke.leighton at gmail.com
Fri Aug 13 21:48:23 BST 2021

On August 13, 2021 3:14:43 PM UTC, lkcl <luke.leighton at gmail.com> wrote:

>In Horizontal-First mode, you perform loops of this type:
>    setmaxvli 8
>loop:
>    setvl  r5, r3 # VL=r5=MIN(MVL, r3)
>    sv.ld r20.v, r4(0) # load VL elements (max 8)
>    sv.addi r20.v, r20.v, 55 # add 55 to every element of the vector
>    sv.st r20.v, r4(0) # store VL elements
>    add r4, r4, r5 # move r4 pointer forward
>    sub. r3, r3, r5 # decrement total count by VL
>    bnz loop

oo, oo, i just had an idea.

     setvlc r5  # VL=r5=MIN(MVL, CTR)
     add r4, r4, r5
     sv.bnz/VLCTR   # subtracts VL from CTR

SVSTATE is *already* going into sv.bc so it is not a hardship to subtract VL from CTR.

this reduces critical inner loops by one instruction and frees up a GPR.  using CTR for loops is normal in Power ISA anyway.
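
to sanity-check the idea, here is a rough Python model of the two loop shapes. it is only a sketch: the function names (loop_gpr, loop_ctr) are made up for illustration, r4 is treated as an element index rather than a byte address, and it assumes the ld/addi/st body stays the same in the CTR version. both loops are modelled as do-while, since the branch sits at the bottom.

```python
MVL = 8  # maximum vector length, as set by "setmaxvli 8"


def loop_gpr(data, mvl=MVL):
    """Original form: remaining count kept in a GPR (r3)."""
    out = list(data)
    r3 = len(out)  # remaining element count
    r4 = 0         # pointer, modelled as an element index
    iterations = 0
    while True:
        r5 = min(mvl, r3)        # setvl r5, r3
        for i in range(r5):      # sv.ld / sv.addi / sv.st
            out[r4 + i] += 55
        r4 += r5                 # add r4, r4, r5
        r3 -= r5                 # sub. r3, r3, r5
        iterations += 1
        if r3 == 0:              # bnz loop
            break
    return out, iterations


def loop_ctr(data, mvl=MVL):
    """Proposed form: remaining count kept in CTR, no explicit sub."""
    out = list(data)
    ctr = len(out)  # remaining element count lives in CTR
    r4 = 0
    iterations = 0
    while True:
        r5 = min(mvl, ctr)       # setvlc r5
        for i in range(r5):      # same vector body (assumed)
            out[r4 + i] += 55
        r4 += r5                 # add r4, r4, r5
        ctr -= r5                # sv.bnz/VLCTR subtracts VL from CTR...
        iterations += 1
        if ctr == 0:             # ...and branches while CTR is nonzero
            break
    return out, iterations
```

the two functions differ only in where the count lives and which instruction does the subtract, which is the point: the sub. goes away and r3 is free for something else.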

doesn't help with VFHint though.


