[Libre-soc-dev] [RFC] SVP64 Vertical-First Mode, batch processing

lkcl luke.leighton at gmail.com
Fri Aug 13 16:14:43 BST 2021



On August 12, 2021 10:14:48 PM UTC, lkcl <luke.leighton at gmail.com> wrote:
>
>
>On August 12, 2021 9:37:16 PM UTC, Richard Wilbur
><richard.wilbur at gmail.com> wrote:

>>> i propose this change to:
>>> 
>>>     if HorizontalFirst
>>>          if srcstep < VL
>>>              srcstep increments
>>>     else if VerticalFirst
>>>          if srcstep < *MAXVL*
>>>               srcstep increments
>>> 
>>> questions, comments?
>>
>>Sounds like a good thing.
>
>my only concern is, should MVL be restricted to an immediate (for
>VFirst mode) or should it be allowed to be set via a register (RA).
>
>whilst the logic behind making MVL compile-time static for Horizontal
>Mode is obvious, i haven't got my head round Vertical Mode yet.
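
as a minimal sketch of the proposed rule quoted above (python, purely illustrative: the function name and arguments are made up here, and what happens once the limit is reached is not stated in the proposal, so the sketch simply leaves srcstep unchanged):

```python
def next_srcstep(srcstep, vl, maxvl, vertical_first):
    """Apply the proposed increment rule once and return the new srcstep.

    Hypothetical helper, not part of any spec: it only restates the
    pseudocode from the proposal.
    """
    # Horizontal-First is bounded by VL, Vertical-First by MAXVL.
    limit = maxvl if vertical_first else vl
    if srcstep < limit:
        return srcstep + 1
    # Assumption: at the limit srcstep is left alone (not specified above).
    return srcstep
```

with VL=3 and MAXVL=8, a srcstep of 3 stays at 3 in Horizontal-First but moves on to 4 in Vertical-First, which is the whole difference the proposal introduces.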

Horizontal-First, you perform these types of loops:

    setmaxvli 8
loop:
    setvl  r5, r3 # VL=r5=MIN(MVL, r3)
    sv.ld r20.v, r4(0) # load VL elements (max 8)
    sv.addi r20.v, r20.v, 55 # add 55 to all vector
    sv.st r20.v, r4(0) # store VL elements
    add r4, r4, r5 # move r4 pointer forward
    sub. r3, r3, r5 # decrement total count by VL
    bnz loop

this will always do 8 elements at a time until r3 drops below 8.
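
to make the batching concrete, here is a rough python model of that loop. it is only a sketch under assumptions: memory is a plain list, r4 is treated as an element index rather than a byte address, and the function name is invented for illustration.

```python
def horizontal_first_add(mem, imm=55, mvl=8):
    """Model the Horizontal-First loop: each instruction covers all VL elements."""
    mem = list(mem)
    ptr = 0                  # r4: element index (assumed, not a byte address)
    remaining = len(mem)     # r3: total elements still to process
    batches = []             # VL chosen on each pass, recorded for inspection
    while True:
        vl = min(mvl, remaining)          # setvl r5, r3
        regs = mem[ptr:ptr + vl]          # sv.ld: load VL elements
        regs = [x + imm for x in regs]    # sv.addi: add imm to every element
        mem[ptr:ptr + vl] = regs          # sv.st: store VL elements
        batches.append(vl)
        ptr += vl                         # add r4, r4, r5
        remaining -= vl                   # sub. r3, r3, r5
        if remaining == 0:                # bnz loop
            break
    return mem, batches


result, batches = horizontal_first_add(range(19))
print(batches)  # [8, 8, 3]
```

19 elements come out as two full batches of 8 followed by a final batch of 3, which is the "8 at a time until r3 drops below 8" behaviour.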


VerticalFirst, you insert a *second inner loop* with an svstep instruction just before the bnz. also, at the moment, rather than just setmaxvli 8, it is:

    setmaxvvlandvfhint  8, 2 # MVL=8, VFHint=2

if the hardware *chooses* to set VFHint=2, then we will always have 2 elements at a time in the inner loop, until srcstep reaches VL

    setmaxvvlandvfhint  8, 2 # MVL=8, VFHint=2
loop:
    setvl  r5, r3 # VL=r5=MIN(MVL, r3)
loopinner:
    sv.ld r20.v, r4(0) # load VFHint elements (max 2)
    sv.addi r20.v, r20.v, 55 # add 55 to 2 elements
    sv.st r20.v, r4(0) # store VFHint elements
    svstep.                 # srcstep += VFHint
    bnz loopinner     # repeat until srcstep=VL
    # now done VL elements, move to next batch
    add r4, r4, r5 # move r4 pointer forward
    sub. r3, r3, r5 # decrement total count by VL
    bnz loop
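
the same kind of rough python model for the two-level loop, again only a sketch under assumptions: memory is a list, r4 is an element index, the hardware is assumed to accept the hint of 2, and the last inner step is assumed to be clamped when VL is not a multiple of the hint (the assembler above does not say what happens in that case). the function name is invented.

```python
def vertical_first_add(mem, imm=55, mvl=8, vfhint=2):
    """Model the Vertical-First loop: an inner loop walks srcstep up to VL."""
    mem = list(mem)
    ptr = 0                  # r4: element index (assumed, not a byte address)
    remaining = len(mem)     # r3: total elements still to process
    inner_steps = []         # elements handled per inner iteration
    while True:
        vl = min(mvl, remaining)              # setvl r5, r3
        srcstep = 0
        while srcstep < vl:                   # loopinner ... bnz loopinner
            # Assumption: the tail step is clamped so it never passes VL.
            n = min(vfhint, vl - srcstep)
            lo, hi = ptr + srcstep, ptr + srcstep + n
            regs = mem[lo:hi]                 # sv.ld: load up to vfhint elements
            regs = [x + imm for x in regs]    # sv.addi
            mem[lo:hi] = regs                 # sv.st
            inner_steps.append(n)
            srcstep += n                      # svstep.
        ptr += vl                             # add r4, r4, r5
        remaining -= vl                       # sub. r3, r3, r5
        if remaining == 0:                    # bnz loop
            break
    return mem, inner_steps


result, steps = vertical_first_add(range(19))
print(steps)  # [2, 2, 2, 2, 2, 2, 2, 2, 2, 1]
```

the end result is identical to the Horizontal-First version; only the granularity changes, with each group of three instructions covering the hinted number of elements before svstep moves on.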

the question is, then: can we get rid of the inner loop? and if we do, can anything useful be done?

i have a feeling, looking at this assembler, that VFHint genuinely serves a different purpose *in addition* to VL and MAXVL.

(btw aside: svstep+bnz was why i wanted a step-and-test branch conditional instruction but it's too CISC)

l.


