[Libre-soc-dev] memcpy optimization
richard.wilbur at gmail.com
Sun Dec 13 16:48:18 GMT 2020
> On Dec 12, 2020, at 06:41, Luke Kenneth Casson Leighton <lkcl at lkcl.net> wrote:
> memcpy is therefore pretty much exactly the same with the predicate
> mask detection and zero detection stripped out.
> c.mv a3, a0 # Copy dst
> setvli x0, a2, vint8 # Vectors of bytes.
> vlbff.v v1, (a1) # Get src bytes
> vseq.vi v0, v1, 0 # Flag zero bytes
> vsb.v v1, (a3) # Write out bytes
> csrr t1, vl # Get number of bytes fetched
> c.bgez t1, exit # Done
> c.add a1, a1, t1 # Bump src pointer
> c.sub a2, a2, t1 # Decrement count.
> c.add a3, a3, t1 # Bump dst pointer
> c.bnez a2, loop # Anymore?
> the vmfirst and vmsif have gone, the ST has the predicate mask gone,
> and the CSR load of VL has a bgez t1 after it instead of a bgez a3.
> those are the *only* modifications.
Why is memcpy still flagging zero bytes (vseq.vi)? Nothing reads v0 once the vmfirst, the vmsif, and the store's predicate mask are gone, so that seems like a waste of time here.
I get your point about not needing vmfirst, vmsif, or direct manipulation of VL.
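
To make the question concrete, here is a rough scalar model in C of what I would expect the strip-mined memcpy loop to do, with no zero-byte test at all. This is only a sketch: VLMAX and memcpy_stripmine are names I made up for illustration, and 16 bytes is an assumed maximum vector length, not anything from the thread or from real hardware.

```c
#include <stddef.h>

/* Assumed maximum vector length in bytes (hypothetical value). */
#define VLMAX 16

/* Scalar model of a strip-mined copy: each pass handles min(n, VLMAX)
 * bytes, then bumps both pointers and decrements the count.
 * No comparison against zero is made anywhere. */
static void *memcpy_stripmine(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (n != 0) {
        /* Models the setvli step: VL = min(remaining count, VLMAX). */
        size_t vl = n < VLMAX ? n : VLMAX;

        /* Models the vector load followed by the unmasked vector store. */
        for (size_t i = 0; i < vl; i++)
            d[i] = s[i];

        s += vl;   /* Bump src pointer */
        d += vl;   /* Bump dst pointer */
        n -= vl;   /* Decrement count  */
    }
    return dst;
}
```

The only loop exit in this model is the count reaching zero, and zero bytes in the source are copied like any other byte.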