[Libre-soc-dev] harmful SIMD again
Luke Kenneth Casson Leighton
lkcl at lkcl.net
Fri Aug 20 12:23:29 BST 2021
https://www.reddit.com/r/programming/comments/p0yn45/three_fundamental_flaws_of_simd/h9ncoft/?utm_source=reddit&utm_medium=web2x&context=3
FUZxxl is advocating the case for SIMD by implementing this algorithm
in AVX512, SSE and NEON:
https://github.com/clausecker/pospop/blob/master/safe.go
i initially made the mistake of thinking it was a straight vectorised popcount.
it's not: it is a positional popcount. for count8safe there are 8 accumulators,
one per bit position:
* accumulator 0 receives the number of elements in the ENTIRE input vector
  that have bit 0 set.
* accumulator 1 receives the number of elements in the entire input vector
  that have bit 1 set.
* and so on, up to accumulator 7 for bit 7.
so the loop is this:
    for i := range buf {
        for j := 0; j < 8; j++ {
            counts[j] += int(buf[i] >> j & 1)
        }
    }
NOT
    for i := range buf {
        for j := 0; j < 8; j++ {
            // counts is NOT the same length as buf.
            counts[>>>>I<<<<] += int(buf[i] >> j & 1)
        }
    }
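to make the difference from a plain popcount concrete, here is a minimal
self-contained sketch in Go. the names count8 and popcount are made up for
illustration and are not taken from the pospop repository; count8 just wraps
the scalar loop quoted above, and popcount is the "straight" version i
mistakenly had in mind at first.

```go
package main

import (
	"fmt"
	"math/bits"
)

// count8 is a hypothetical wrapper around the scalar loop quoted above:
// counts[j] ends up holding how many bytes of buf have bit j set.
func count8(buf []byte) [8]int {
	var counts [8]int
	for _, b := range buf {
		for j := 0; j < 8; j++ {
			counts[j] += int(b >> j & 1)
		}
	}
	return counts
}

// popcount is the plain version for comparison: one total over all bits,
// with no record of which bit position each set bit came from.
func popcount(buf []byte) int {
	total := 0
	for _, b := range buf {
		total += bits.OnesCount8(b)
	}
	return total
}

func main() {
	buf := []byte{0x01, 0x03, 0x80}
	fmt.Println(count8(buf))   // [2 1 0 0 0 0 0 1]
	fmt.Println(popcount(buf)) // 4
}
```

the 8 accumulators always sum to the plain popcount, but the plain popcount
cannot be split back into the 8 per-position counts.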
turns out that SVP64 can do this algorithm in about 12 instructions,
which is 35x fewer than the instruction count needed for AVX512 or NEON.
l.