[Libre-soc-dev] memcpy optimization
colepoirier at gmail.com
Fri Dec 11 22:33:28 GMT 2020
On Fri, Dec 11, 2020 at 2:25 PM Luke Kenneth Casson Leighton
<lkcl at lkcl.net> wrote:
> here however what is the max that VL can be... ah, up to 64.
> so there will be up to 8x 64 bit LDs in one hit.
> that means that the 8 LDs are very likely to fault.
> that in turn, because there are so many, results in an average of 4 64
> bit LDs being chucked out of the LDST Buffer (cancelled) due to a page
> fault and associated trap handling.
> throwing page faults like that is SERIOUSLY suboptimal, and if they are all
> misaligned the resource utilisation is absolutely dreadful.
> so i repeat again: strncpy zero detection is *not* the driver behind
> the use of ffirst. getting the parallel LDs to exclude misalignments
> (and other faulting) is the key driving factor behind why ffirst is
> used in strncpy.
> those exact same characteristics *also apply to memcpy and memset*.
> if that's really not clear can i recommend finding and reading the
> paper written by ARM's SVE team?
This one https://alastairreid.github.io/papers/sve-ieee-micro-2017.pdf?
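
For anyone following along, here is a minimal Python sketch of the fail-first (ffirst) behaviour described in the quoted text, applied to memcpy. It is a toy model under stated assumptions, not the SVE or SV semantics verbatim: elements are single bytes, stores never fault, the "trap handler" just maps the missing page, and every name (ffirst_load, memcpy_ffirst, mapped_pages, PAGE_SIZE) is hypothetical. The point it tries to show is that a fault on any element other than the first truncates VL instead of trapping, so no already-issued loads need to be cancelled.

```python
PAGE_SIZE = 4096  # assumed page size for this toy model


class PageFault(Exception):
    """Raised only when the FIRST element of a load touches an unmapped page."""


def ffirst_load(memory, mapped_pages, addr, vl):
    """Load up to vl bytes from addr, fail-first style.

    If element 0 is on an unmapped page, trap (raise PageFault).
    If a later element is on an unmapped page, stop there and return
    the shorter result: the effective VL is truncated, no trap is taken.
    """
    loaded = []
    for i in range(vl):
        if (addr + i) // PAGE_SIZE not in mapped_pages:
            if i == 0:
                raise PageFault(addr)
            break  # truncate VL at the first faulting element
        loaded.append(memory[addr + i])
    return loaded


def memcpy_ffirst(memory, mapped_pages, dst, src, n, max_vl=64):
    """Copy n bytes using fail-first loads.

    Returns (chunks, faults): the effective VL of each iteration and the
    number of traps taken. Stores are assumed never to fault (simplification).
    """
    chunks, faults, copied = [], 0, 0
    while copied < n:
        vl = min(max_vl, n - copied)
        try:
            data = ffirst_load(memory, mapped_pages, src + copied, vl)
        except PageFault as fault:
            # Stand-in for the trap handler: map the page, then retry.
            mapped_pages.add(fault.args[0] // PAGE_SIZE)
            faults += 1
            continue
        memory[dst + copied:dst + copied + len(data)] = bytes(data)
        copied += len(data)
        chunks.append(len(data))
    return chunks, faults


if __name__ == "__main__":
    mem = bytearray(3 * PAGE_SIZE)
    mem[4090:4110] = bytes(range(1, 21))   # source straddles pages 0 and 1
    pages = {0, 2}                         # page 1 starts out unmapped
    print(memcpy_ffirst(mem, pages, 2 * PAGE_SIZE, 4090, 20))
```

In this model the copy that straddles the unmapped page is split at the page boundary: the first iteration gets a truncated VL, the next one starts exactly on the faulting address and takes a single trap at element 0, and nothing that was already loaded is thrown away.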