[Libre-soc-dev] [RFC] merging parallel reduction into REMAP
Luke Kenneth Casson Leighton
lkcl at lkcl.net
Sun Aug 1 10:59:37 BST 2021
i'm looking at the parallel reduction algorithm and note that it is
remarkably similar to the REMAP schedule for DCT COS table generation.
8 4 2 1
which is exactly the kind of thing i was looking for, to make general
the first issue is, however, that it is not ok to have two separate
and distinct operations.
the parallel reduction pseudocode has two operations:
1) the operation requested
2) a MV operation
the MV has to go.
a trick i have been using in the simulator "yield" iterators is to
create redirection lookup indices. i am reasonably confident that
these can be blatted down to O(1) at gate level, however they give an
instead of MVing the data, use the predicate bits to sequentially
"step over" the data:
    j = 0
    for i, pbit in enumerate(predicate_bits):
        if pbit == 1:
            lookup[j] = i
            j += 1
then use lookup[index] in all register accessing.
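
as a rough illustration of the idea above (a sketch only, not the actual SVP64 pseudocode): the lookup table is built once from the predicate bits, and a pairwise reduction then indexes registers through lookup[] so that no MV is needed. the names build_lookup and tree_reduce, the list-of-ints register file, and the 1, 2, 4, ... step order are all assumptions made for the example.

```python
from operator import add


def build_lookup(predicate_bits):
    """Return the indices of the set predicate bits, in ascending order."""
    lookup = []
    for i, pbit in enumerate(predicate_bits):
        if pbit == 1:
            lookup.append(i)
    return lookup


def tree_reduce(regs, predicate_bits, op=add):
    """Pairwise reduction over the predicated elements only.

    Hypothetical helper: masked-out elements are never read or moved,
    every register access goes through lookup[] instead.
    """
    regs = list(regs)            # work on a copy of the "register file"
    lookup = build_lookup(predicate_bits)
    n = len(lookup)
    if n == 0:
        return None              # nothing selected by the predicate
    step = 1
    while step < n:
        # combine element j with element j+step, in compacted index space
        for j in range(0, n - step, step * 2):
            lo, hi = lookup[j], lookup[j + step]
            regs[lo] = op(regs[lo], regs[hi])
        step *= 2
    return regs[lookup[0]]       # result lands in the first selected register


if __name__ == "__main__":
    regs = [1, 2, 3, 4, 5, 6, 7, 8]
    pred = [1, 0, 1, 1, 0, 0, 1, 1]
    print(build_lookup(pred))
    print(tree_reduce(regs, pred))
```

operand order is preserved at each step (the lower index is always the left operand), so the sketch should also behave sensibly for non-commutative operations, although that has only been checked here on small cases.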
i will update the pseudocode with this idea, to see what it looks like.