[Libre-soc-dev] memcpy optimization
Jacob Lifshay
programmerjake at gmail.com
Fri Dec 11 19:44:04 GMT 2020
On Fri, Dec 11, 2020, 11:20 Luke Kenneth Casson Leighton <lkcl at lkcl.net>
wrote:
> the general dynamic case [of memcpy], when either the count or alignment
> is *not* known, is however flat-out impossible to use 64 bit granularity:
> that's the seductive SIMD way.
agreed.
> the only way to make dynamic general
> memcpy efficient is to use fail-on-first.
no, fail-on-first is for loops with a data-dependent loop count. memcpy is
data-independent: it copies the same number of bytes no matter what byte
values it sees.
if a page fault occurs (even when no vector instructions are used at all),
it is either a SIGSEGV or invisible to user code, so memcpy has no use for
fail-on-first.
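to make the distinction concrete, here is a minimal C sketch of the two kinds of loop. the function names copy_n and copy_until_nul are made up for illustration and do not come from any library. in the first, the loop count is fixed by n before a single byte is read; in the second, it depends on where the zero byte happens to be, which is the situation where a vectorised version might read past the end of the data and where fail-on-first style loads are normally considered.

```c
#include <stddef.h>

/* hypothetical helper: data-independent copy.
 * the loop count is known from n alone, before any byte is loaded. */
size_t copy_n(unsigned char *dest, const unsigned char *src, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        dest[i] = src[i];
    return i; /* always equal to n */
}

/* hypothetical helper: data-dependent copy.
 * the loop stops at the first zero byte, so the loop count is not
 * known until the data itself has been loaded and inspected. */
size_t copy_until_nul(char *dest, const char *src)
{
    size_t i = 0;
    while ((dest[i] = src[i]) != '\0')
        i++;
    return i; /* bytes copied, not counting the terminator */
}
```

memcpy belongs to the first kind, which is why the loop count can be handed straight to setvl.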
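code (ignoring memcpy's return value):
memcpy: # r3=dest, r4=src, r5=count
    setvl r6, r5, maxvl=64                # r6 = VL = min(r5, 64)
    ld <vec>r64, (<scalar>r4), elwidth=1  # load VL byte elements from src
    st <vec>r64, (<scalar>r3), elwidth=1  # store VL byte elements to dest
    sub. r5, r5, r6                       # count -= VL, sets CR0
    add r3, r3, r6                        # dest += VL
    add r4, r4, r6                        # src += VL
    bne memcpy                            # loop while count != 0
    blr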
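for readers who don't know the draft syntax, the loop above can be modelled in plain C roughly as follows. this is only a sketch: model_setvl and model_memcpy are hypothetical names, the vec array stands in for the vector register group, and elwidth=1 is assumed to mean one-byte elements (consistent with VL being subtracted directly from the byte count).

```c
#include <stddef.h>

#define MAXVL 64 /* mirrors maxvl=64 in the assembly */

/* hypothetical stand-in for setvl: VL = min(remaining, MAXVL) */
size_t model_setvl(size_t remaining)
{
    return remaining < MAXVL ? remaining : MAXVL;
}

/* hypothetical C model of the loop. unlike the assembly, it returns
 * dest, as the real memcpy does. */
void *model_memcpy(void *dest, const void *src, size_t count)
{
    unsigned char *d = dest;
    const unsigned char *s = src;
    unsigned char vec[MAXVL]; /* stands in for the vector registers */
    size_t vl, i;

    do {
        vl = model_setvl(count);    /* setvl r6, r5, maxvl=64 */
        for (i = 0; i < vl; i++)    /* ld: vl byte elements from src */
            vec[i] = s[i];
        for (i = 0; i < vl; i++)    /* st: vl byte elements to dest */
            d[i] = vec[i];
        count -= vl;                /* sub. r5, r5, r6 */
        d += vl;                    /* add r3, r3, r6 */
        s += vl;                    /* add r4, r4, r6 */
    } while (count != 0);           /* bne memcpy */

    return dest;
}
```

the last pass simply runs with a shorter VL, so there is no separate scalar tail loop and no alignment special-casing, and a count of zero falls through after one pass that moves nothing.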
Jacob