[Libre-soc-dev] DCT/FFT augmentations
Luke Kenneth Casson Leighton
lkcl at lkcl.net
Sat Jul 3 14:22:53 BST 2021
crowd-funded eco-conscious hardware: https://www.crowdsupply.com/eoma68
On Sat, Jul 3, 2021 at 1:56 PM Hendrik Boom <hendrik at topoi.pooq.com> wrote:
> for k in range(len(Y)): # ydim2
>     for i in range(len(X)): # ydim1
>         if X[i][k]:
>             for j in range(len(Y)): # xdim2
>                 result[i][j] += Y[k][j]
> we arrive at a formulation that allows collapsing the boolean
> array into 64-bit words in the j-direction. I suspect this
> is also a speed-up, but one that doesn't mesh well with
> collapsing multiple loops into a single instruction.
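for reference, a minimal sketch of the word-packing idea described in the quoted text. it assumes a purely boolean product (OR of ANDs), so the `+=` becomes `|=`. the helper names `pack_row`, `unpack_row` and `bool_matmul_packed` are made up for illustration. python ints are arbitrary-width, so one int stands in for what would be one or more 64-bit words in hardware.

```python
def pack_row(bits):
    """Pack a list of 0/1 values into one int, with bit j holding bits[j]."""
    word = 0
    for j, b in enumerate(bits):
        if b:
            word |= 1 << j
    return word


def unpack_row(word, width):
    """Inverse of pack_row: return the low `width` bits as a list of 0/1."""
    return [(word >> j) & 1 for j in range(width)]


def bool_matmul_packed(X, Y):
    """Boolean matrix product, returning one packed int per result row.

    Assumption: the result is boolean, so accumulation is OR, not +=.
    """
    Yp = [pack_row(row) for row in Y]   # pack each row of Y along j
    result = [0] * len(X)
    for k in range(len(Y)):
        for i in range(len(X)):
            if X[i][k]:
                # the whole inner j-loop collapses into one wide OR
                result[i] |= Yp[k]
    return result


X = [[1, 0],
     [1, 1]]
Y = [[0, 1, 0],
     [1, 0, 0]]
packed = bool_matmul_packed(X, Y)
rows = [unpack_row(w, len(Y[0])) for w in packed]
```

the j-loop disappears entirely, which is where the speed-up would come from, and also why it no longer looks like a three-deep loop nest.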
correct. applying SIMD (or multi-issue execution).
actually, multi-issue would be fine. with SIMD you could AND the
X[i][k] test into the predicate bits.
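a rough model of that, in plain python rather than any real ISA syntax: the scalar test on X[i][k] is broadcast to every lane and ANDed with an existing predicate mask, and the row update is then done under the combined mask. `predicated_add` and `row_update` are hypothetical names, and this is only one reading of the suggestion.

```python
def predicated_add(dest, src, mask):
    """Return dest with dest[j] + src[j] in lanes where bit j of mask is set."""
    return [d + s if (mask >> j) & 1 else d
            for j, (d, s) in enumerate(zip(dest, src))]


def row_update(result_row, y_row, x_ik, predicate):
    """Do result[i][j] += Y[k][j] across all j, under a combined mask."""
    lanes = len(result_row)
    all_lanes = (1 << lanes) - 1
    # broadcast "X[i][k] is nonzero" to all lanes, then AND with the predicate
    mask = predicate & (all_lanes if x_ik else 0)
    return predicated_add(result_row, y_row, mask)


updated = row_update([1, 1, 1, 1], [10, 20, 30, 40], 1, 0b0101)
skipped = row_update([1, 1, 1, 1], [10, 20, 30, 40], 0, 0b0101)
```

when X[i][k] is zero the mask is all zeros, so the operation is issued but changes nothing: no branch is needed, but no work is saved either.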
however, to be honest, if we're talking multi-bit patterns (64-bit)
then the probability of any given 64-bit pattern being zero is relatively
small for the saving involved.
if these were single-bit values it'd be a different matter.
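to put a rough number on that, assume (purely for illustration) that each bit of a value is set independently with probability p. a w-bit value is then zero with probability (1 - p)^w. `p_value_zero` is a made-up helper.

```python
def p_value_zero(p_bit_set, width):
    """Probability that a `width`-bit value is all zeros.

    Assumption: each bit is set independently with probability p_bit_set.
    """
    return (1.0 - p_bit_set) ** width


single_bit = p_value_zero(0.5, 1)    # the skip test passes half the time
wide_word = p_value_zero(0.5, 64)    # 2**-64: the skip almost never fires
```

real data is unlikely to have independent bits, so this only shows the trend: the wider the value, the less often the zero test skips anything.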
hmmm, what's the difference between this and the bitmatrix operations?