[Libre-soc-dev] DCT/FFT augmentations
hendrik at topoi.pooq.com
Sat Jul 3 14:37:36 BST 2021
On Sat, Jul 03, 2021 at 02:22:53PM +0100, Luke Kenneth Casson Leighton wrote:
> On Sat, Jul 3, 2021 at 1:56 PM Hendrik Boom <hendrik at topoi.pooq.com> wrote:
> > for k in range(len(Y)): # ydim2
> >     for i in range(len(X)): # ydim1
> >         if X[i][k]:
> >             for j in range(len(Y)): # xdim2
> >                 result[i][j] += Y[k][j]
> > we arrive at a formulation that allows collapsing the boolean
> > array into 64-bit words in the j-direction. I suspect this
> > is also a speed-up, but one that doesn't mesh well with
> > collapsing multiple loops into a single instruction.
> correct. applying SIMD (or multi-issue execution).
> actually, multi-issue would be fine. SIMD you could put the X[i][k]
> ANDed with the predicate bits.
> however, to be honest, if we're talking multi-bit patterns (64-bit)
> then the probability of any given 64-bit pattern being zero is relatively
> small for the saving involved.
I cheated above. Once the bits are compacted into 64-bit words,
I'm using bit-subscripts instead of word-subscripts. So the inner
loop gets optimised to performing bit operations on entire words
instead of bit-by-bit.
The if X[i][k] still tests individual bits.
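
To make that concrete, here is a rough Python sketch of what I mean, not anything tied to actual instructions. It assumes boolean semantics, so the += becomes an OR, and it assumes the rows of Y and of the result are packed 64 columns to a word along j. The names WORD, pack_row and bool_matmul_packed are made up for illustration.

```python
WORD = 64  # assumed word width, in bits


def pack_row(bits):
    """Pack a row of truthy/falsy values into 64-bit words.

    Column j lands in bit (j % WORD) of word (j // WORD).
    """
    words = [0] * ((len(bits) + WORD - 1) // WORD)
    for j, b in enumerate(bits):
        if b:
            words[j // WORD] |= 1 << (j % WORD)
    return words


def bool_matmul_packed(X, Y):
    """Boolean product of X and Y, with the j-direction packed into words."""
    ncols = len(Y[0])
    Yp = [pack_row(row) for row in Y]
    nwords = len(Yp[0])
    Rp = [[0] * nwords for _ in X]
    for k in range(len(Y)):
        for i in range(len(X)):
            if X[i][k]:  # still a test of one individual bit
                for w in range(nwords):
                    # one OR covers up to 64 values of j at once
                    Rp[i][w] |= Yp[k][w]
    # unpack to a list of 0/1 rows so the result is easy to inspect
    return [
        [(Rp[i][j // WORD] >> (j % WORD)) & 1 for j in range(ncols)]
        for i in range(len(X))
    ]
```

The inner loop now runs over words rather than bits, which is where the saving comes from; the X[i][k] test in front of it is unchanged.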
> if these were single-bit values it'd be a different matter.
> hmmm, what's the difference between this and the bitmatrix operations?
Those are likely the instructions we'd use.