[Libre-soc-dev] [RFC] SVP64 Vertical-First Mode, batch processing
luke.leighton at gmail.com
Fri Aug 13 21:48:23 BST 2021
On August 13, 2021 3:14:43 PM UTC, lkcl <luke.leighton at gmail.com> wrote:
>Horizontal-First, you perform these types of loops:
> setmaxvli 8
> setvl r5, r3 # VL=r5=MIN(MVL, r3)
> sv.ld r20.v, r4(0) # load VL elements (max 8)
> sv.addi r20.v, r20.v, 55 # add 55 to every vector element
> sv.st r20.v, r4(0) # store VL elements
> add r4, r4, r5 # move r4 pointer forward
> sub. r3, r3, r5 # decrement total count by VL
> bnz loop
oo, oo, i just had an idea.
setvlc r5 # VL=r5=MIN(MVL, CTR)
add r4, r4, r5
sv.bnz/VLCTR # subtracts VL from CTR
SVSTATE is *already* going into sv.bc so it is not a hardship to subtract VL from CTR.
this reduces critical inner loops by one instruction (the sub. is folded into the branch) and frees up a GPR (r3, which held the remaining count). using CTR for loops is normal in Power ISA anyway.
doesn't help with VFHint though.
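
to sanity-check the idea, here is a rough Python model of both loops. it is only a sketch, not SVP64 reference code: the function names are made up for illustration, r4 is treated as an element index rather than a byte address, setvlc is assumed to set VL=MIN(MVL, CTR), and sv.bnz/VLCTR is assumed to subtract VL from CTR and branch back while CTR is nonzero.

```python
# Hypothetical model of the two strip-mined loops discussed above.
# Assumptions (not confirmed by any spec): r4 is an element index,
# setvlc sets VL = min(MVL, CTR), and sv.bnz/VLCTR does CTR -= VL
# followed by "branch back if CTR != 0".

MVL = 8  # mirrors "setmaxvli 8" in the quoted loop


def loop_gpr_count(data, n, mvl=MVL):
    """Quoted loop: the remaining count lives in a GPR (r3)."""
    r3, r4 = n, 0
    iterations = 0
    while True:
        r5 = min(mvl, r3)            # setvl r5, r3
        for i in range(r5):          # sv.ld / sv.addi / sv.st on VL elements
            data[r4 + i] += 55
        r4 += r5                     # add r4, r4, r5
        r3 -= r5                     # sub. r3, r3, r5
        iterations += 1
        if r3 == 0:                  # bnz loop (falls through on zero)
            return iterations


def loop_ctr_count(data, n, mvl=MVL):
    """Proposed loop: the remaining count lives in CTR, no r3 needed."""
    ctr, r4 = n, 0
    iterations = 0
    while True:
        r5 = min(mvl, ctr)           # setvlc r5
        for i in range(r5):          # sv.ld / sv.addi / sv.st on VL elements
            data[r4 + i] += 55
        r4 += r5                     # add r4, r4, r5
        ctr -= r5                    # sv.bnz/VLCTR: subtract VL from CTR ...
        iterations += 1
        if ctr == 0:                 # ... and branch back only if CTR != 0
            return iterations


if __name__ == "__main__":
    a = list(range(20))
    b = list(range(20))
    print(loop_gpr_count(a, 20), loop_ctr_count(b, 20), a == b)
```

under those assumptions both versions touch the same elements in the same number of passes. the only difference is that the CTR version has no separate decrement step and no r3.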