[Libre-soc-dev] DCT/FFT augmentations

Luke Kenneth Casson Leighton lkcl at lkcl.net
Sat Jul 3 14:22:53 BST 2021

On Sat, Jul 3, 2021 at 1:56 PM Hendrik Boom <hendrik at topoi.pooq.com> wrote:

>     for k in range(len(Y)):                  # ydim2
>         for i in range(len(X)):              # ydim1
>             if X[i][k]:
>                 for j in range(len(Y[0])):   # xdim2
>                     result[i][j] += Y[k][j]
> we arrive at a formulation that allows collapsing the boolean
> array into 64-bit words in the j-direction.  I suspect this
> is also a speed-up, but one that doesn't mesh well with
> collapsing multiple loops into a single instruction.

correct.  that is what applying SIMD (or multi-issue execution) amounts to.

actually, multi-issue would be fine.  with SIMD you could AND the
X[i][k] test into the predicate bits.
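
a rough sketch of one possible reading of that, in plain python rather
than any real ISA: the scalar X[i][k] test is broadcast across the lanes
and ANDed with an existing per-lane predicate, and only the lanes left
enabled do the add.  the function name, the integer-bitmask predicate and
the lane layout are all assumptions made for illustration, not anything
defined in the thread.

```python
def predicated_accumulate(result_row, y_row, predicate, x_bit):
    """Add y_row into result_row, lane by lane, under a combined mask.

    predicate is a hypothetical per-lane mask held in an int (bit j = lane j).
    x_bit stands in for the scalar test X[i][k].
    """
    lanes = len(y_row)
    all_lanes = (1 << lanes) - 1
    # Broadcast the scalar test to every lane, then AND with the predicate.
    mask = predicate & (all_lanes if x_bit else 0)
    for j in range(lanes):
        if (mask >> j) & 1:
            result_row[j] += y_row[j]
    return result_row


# lanes 0, 1 and 3 are enabled by the predicate, lane 2 is masked out
row = predicated_accumulate([0, 0, 0, 0], [1, 2, 3, 4], 0b1011, 1)
# a zero X[i][k] clears the whole mask, so nothing is added
skipped = predicated_accumulate([0, 0, 0, 0], [1, 2, 3, 4], 0b1011, 0)
```

the point of folding the test into the mask is that the inner loop needs
no separate branch on X[i][k].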

however, to be honest, if we're talking multi-bit patterns (64-bit)
then the probability of any given 64-bit pattern being zero is relatively
small for the saving involved.

if these were single-bit values it'd be a different matter.
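
to put rough numbers on that, here is a small sketch under a loudly
stated assumption: every bit is set independently with the same
probability (bit_density below).  real data is unlikely to be that
uniform, so treat this as an order-of-magnitude illustration only.  the
function name is made up for the example.

```python
def prob_all_zero(bit_density, width):
    """Chance that a width-bit value is entirely zero, assuming each bit
    is independently set with probability bit_density."""
    return (1.0 - bit_density) ** width


single_bit = prob_all_zero(0.5, 1)    # one-bit element
full_word = prob_all_zero(0.5, 64)    # 64-bit element

print(single_bit, full_word)
```

with half the bits set on average, a single-bit element is skipped half
the time, whereas a 64-bit element is all-zero with probability 2**-64,
so the skip test would almost never pay off.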

hmmm, what's the difference between this and the bitmatrix operations?

