[Libre-soc-dev] memcpy optimization

Fri Dec 11 19:44:04 GMT 2020

On Fri, Dec 11, 2020, 11:20 Luke Kenneth Casson Leighton <lkcl at lkcl.net>
wrote:

> the general dynamic case [of memcpy], when either the count or alignment
> is *not* known, is however flat-out impossible to use 64 bit granularity:
> that's the seductive SIMD way.

agreed.

> the only way to make dynamic general
> memcpy efficient is to use fail-on-first.

no, fail-on-first is for loops whose trip count is data-dependent. memcpy is
data-independent: it copies the same number of bytes no matter what byte
values it sees.
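
to illustrate the distinction, here is a rough C sketch (the function names
copy_until_nul and copy_counted are made up for this example, not taken from
any library). the first loop cannot know its trip count until it has read the
terminating byte, which is the kind of loop fail-on-first is aimed at. the
second loop's trip count is fixed by `count` before any byte is read, which is
the memcpy situation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Data-dependent: the trip count is only known once the NUL byte is read.
 * (Hypothetical helper, shaped like strcpy.) */
size_t copy_until_nul(char *dest, const char *src)
{
    size_t n = 0;
    while ((dest[n] = src[n]) != '\0')
        n++;
    return n; /* bytes copied, excluding the terminator */
}

/* Data-independent: the trip count is fixed by `count` up front.
 * (Hypothetical helper, shaped like memcpy without the return value.) */
void copy_counted(unsigned char *dest, const unsigned char *src, size_t count)
{
    for (size_t i = 0; i < count; i++)
        dest[i] = src[i];
}
```

note that copy_counted copies zero bytes like any other value, so nothing it
reads can change how far it goes.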

if there's a page fault (even when not using vector instructions at all),
it is either a SIGSEGV or handled invisibly to user code, so memcpy doesn't
use fail-on-first.

code (ignoring memcpy's return value):
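memcpy: # r3=dest, r4=src, r5=count
    setvl r6, r5, maxvl=64               # r6 = VL = min(r5, 64)
    ld <vec>r64, (<scalar>r4), elwidth=1 # load VL bytes from src
    st <vec>r64, (<scalar>r3), elwidth=1 # store VL bytes to dest
    sub. r5, r5, r6                      # count -= VL, sets CR0
    add r3, r3, r6                       # dest += VL
    add r4, r4, r6                       # src += VL
    bne memcpy                           # loop while count != 0
    blr                                  # return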
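
for anyone who wants to check the loop structure on an ordinary machine, the
following is a plain C model of the assembly above. it is only a sketch:
model_setvl and model_memcpy are names invented here, and it assumes setvl
yields min(count, MAXVL) and that the vector ld/st pair moves one byte per
element. the model copies byte by byte rather than loading the whole vector
before storing it, which gives the same result for the non-overlapping
buffers memcpy requires.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAXVL 64 /* matches maxvl=64 in the assembly */

/* Hypothetical stand-in for setvl: vl = min(count, MAXVL). */
size_t model_setvl(size_t count)
{
    return count < MAXVL ? count : MAXVL;
}

/* Strip-mined copy: each iteration moves up to MAXVL bytes. */
void model_memcpy(unsigned char *dest, const unsigned char *src, size_t count)
{
    size_t vl;
    do {
        vl = model_setvl(count);        /* setvl r6, r5, maxvl=64 */
        for (size_t i = 0; i < vl; i++) /* vector ld + st, byte elements */
            dest[i] = src[i];
        count -= vl;                    /* sub. r5, r5, r6 */
        dest += vl;                     /* add r3, r3, r6 */
        src += vl;                      /* add r4, r4, r6 */
    } while (count != 0);               /* bne memcpy */
}
```

as in the assembly, a count of zero still runs the loop body once, but with
vl = 0 nothing is copied and the loop exits straight away.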

Jacob