[Libre-soc-dev] [RFC] merging parallel reduction into REMAP

Luke Kenneth Casson Leighton lkcl at lkcl.net
Sun Aug 1 10:59:37 BST 2021


i'm looking at the parallel reduction algorithm and note that it is
remarkably similar to the REMAP schedule for DCT COS table generation.

   8 4 2 1

which is exactly the kind of thing i was looking for, to make REMAP more general.
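to illustrate where the 8 4 2 1 pattern comes from, here is a minimal sketch of a halving-stride tree reduction over a 16-element vector. at each step, element i is combined with element i + stride, and the stride halves each time. the names reduction_schedule and tree_reduce are hypothetical, and this is not the libre-soc REMAP implementation. it assumes the vector length is a power of two.

```python
import operator


def reduction_schedule(vl):
    """Yield (step, dest, src) triples for a halving-stride tree reduction.

    Hypothetical helper: assumes vl is a power of two.
    """
    if vl < 1 or vl & (vl - 1):
        raise ValueError("vl must be a power of two")
    stride = vl // 2
    step = 0
    while stride >= 1:
        for i in range(stride):
            # reg[i] = op(reg[i], reg[i + stride]); dests and srcs never overlap
            yield step, i, i + stride
        stride //= 2
        step += 1


def tree_reduce(regs, op):
    """Apply the schedule to a copy of regs; the result ends up in element 0."""
    regs = list(regs)
    for _, dest, src in reduction_schedule(len(regs)):
        regs[dest] = op(regs[dest], regs[src])
    return regs[0]


if __name__ == "__main__":
    sched = list(reduction_schedule(16))
    strides = sorted({src - dest for _, dest, src in sched}, reverse=True)
    print(strides)                                 # [8, 4, 2, 1]
    print(tree_reduce(range(16), operator.add))    # 120
```

note that within a single step the destinations (0..stride-1) and sources (stride..2*stride-1) are disjoint, so the inner loop could in principle all issue in parallel.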

the first issue is, however, that it is not ok to have two separate
and distinct operations.

the parallel reduction pseudocode has two operations:

1) the operation requested
2) a MV operation

the MV has to go.

a trick i have been using in the simulator's "yield" iterators is to
create redirection lookup indices.  i am reasonably confident that
these can be blatted down to O(1) at gate level, however they give an

instead of MVing the data, use the predicate bits to sequentially
"step over" the data:

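predicate_bits = [1, 0, 1, 1, 0, 0, 1, 0]   # example mask
lookup = [0] * len(predicate_bits)           # preallocated redirection table
j = 0
for i, pbit in enumerate(predicate_bits):
    if pbit == 1:
        lookup[j] = i    # j-th active element lives at register index i
        j += 1
# lookup[:j] now holds the indices of the active elements, in order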

then use lookup[index] for every register access, so that the data itself never has to move.
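a minimal sketch of how the lookup might be combined with a halving-stride reduction so that no MV is ever issued: every read and write goes through lookup[], and masked-out elements are simply never touched. build_lookup and predicated_reduce are hypothetical names. the handling of an odd number of active elements (the unpaired element passes through to the next step) is an assumption of this sketch, not something the pseudocode above specifies.

```python
import operator


def build_lookup(predicate_bits):
    """Map the j-th active element to its real register index (no data moved)."""
    lookup = []
    for i, pbit in enumerate(predicate_bits):
        if pbit == 1:
            lookup.append(i)
    return lookup


def predicated_reduce(regs, predicate_bits, op):
    """Tree-reduce only the active elements, redirecting every access via lookup.

    Hypothetical sketch: with 16 active elements the strides are 8 4 2 1;
    with an odd count, the element left without a partner passes through.
    """
    regs = list(regs)                  # copy so the caller's registers are untouched
    lookup = build_lookup(predicate_bits)
    active = len(lookup)
    if active == 0:
        raise ValueError("no active elements")
    while active > 1:
        half = (active + 1) // 2       # stride between paired active elements
        for j in range(active // 2):
            dest, src = lookup[j], lookup[j + half]
            regs[dest] = op(regs[dest], regs[src])
        active = half
    return regs[lookup[0]]


if __name__ == "__main__":
    regs = [10, 20, 30, 40, 50, 60, 70, 80]
    pred = [1, 0, 1, 1, 0, 0, 1, 0]
    print(build_lookup(pred))                          # [0, 2, 3, 6]
    print(predicated_reduce(regs, pred, operator.add)) # 150 (10+30+40+70)
```

the design point is that the predicate is consumed once, up front, to build the redirection table; after that the reduction loop is identical to the unpredicated one except that every register index is passed through lookup[].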

i will update the pseudocode with this idea, to see what it looks like.

