[Libre-soc-dev] DCT/FFT augmentations
    Luke Kenneth Casson Leighton 
    lkcl at lkcl.net
       
    Sat Jul  3 14:22:53 BST 2021
    
    
  
On Sat, Jul 3, 2021 at 1:56 PM Hendrik Boom <hendrik at topoi.pooq.com> wrote:
>     for k in range(len(Y)):       # ydim2
>         for i in range(len(X)):              # ydim1
>            if X[i][k]:
>               for j in range(len(Y[0])):        # xdim2
>                  result[i][j] += Y[k][j]
>
> we arrive at a formulation that allows collapsing the boolean
> array into 64-bit words in the j-direction.  I suspect this
> is also a speed-up, but one that doesn't mesh well with
> collapsing multiple loops into a single instruction.
correct, at least when applying SIMD (or multi-issue execution).
actually, multi-issue would be fine.  with SIMD you could AND the
X[i][k] test with the predicate bits.
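
a minimal sketch of the quoted loop with each row of Y collapsed into
one word in the j-direction.  assumptions: the matrices are boolean
(0/1), so "+=" is replaced by a bitwise OR, and a python int stands in
for the 64-bit word (real hardware would split rows wider than 64 bits
into several words).  the helper names pack_row, unpack_row and
bool_matmul_packed are made up for illustration.

```python
def pack_row(bits):
    """Pack a list of 0/1 values into one integer; bit j holds bits[j]."""
    word = 0
    for j, b in enumerate(bits):
        if b:
            word |= 1 << j
    return word


def unpack_row(word, width):
    """Inverse of pack_row: expand an integer back into a 0/1 list."""
    return [(word >> j) & 1 for j in range(width)]


def bool_matmul_packed(X, Y):
    """Boolean matrix product, with the whole j loop done as one OR per (i, k)."""
    Yp = [pack_row(row) for row in Y]      # pack each Y[k][*] once
    result = [0] * len(X)                  # one packed word per output row
    for k in range(len(Y)):
        for i in range(len(X)):
            if X[i][k]:                    # acts as the per-element predicate
                result[i] |= Yp[k]         # replaces the inner j loop
    return result


X = [[1, 0], [0, 1], [1, 1]]
Y = [[1, 0, 1], [0, 1, 1]]
packed = bool_matmul_packed(X, Y)
rows = [unpack_row(w, len(Y[0])) for w in packed]
```

the "if X[i][k]" test is the part that would map onto a predicate
mask bit: where it is false the OR is simply not performed.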
however, to be honest, if we're talking multi-bit patterns (64-bit)
then the probability of any given 64-bit pattern being zero is relatively
small for the saving involved.
if these were single-bit values it'd be a different matter.
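
a rough way to put numbers on that, assuming (purely for illustration)
that each bit is set independently with probability p: a w-bit pattern
is all-zero with probability (1 - p)**w.  the function name
prob_all_zero is hypothetical.

```python
def prob_all_zero(p, width=64):
    """Chance that a width-bit pattern is entirely zero, assuming each
    bit is set independently with probability p (an assumption made
    here for illustration, not something stated in the thread)."""
    return (1.0 - p) ** width


single_bit = prob_all_zero(0.5, width=1)    # a lone bit is zero half the time
full_word = prob_all_zero(0.5, width=64)    # a 64-bit word: 2**-64
```

under that assumption a half-dense single bit is zero half the time,
whereas a half-dense 64-bit word is zero with probability 2**-64, so a
skip-on-zero test on whole words would almost never fire.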
hmmm, what's the difference between this and the bitmatrix operations?
https://libre-soc.org/openpower/sv/bitmanip/
l.
    
    