[Libre-soc-dev] parallel reduction
luke.leighton at gmail.com
Wed Sep 7 18:53:52 BST 2022
predication on REMAP schedules is much more complex
than it first appears.
it is not just parallel-prefix: *all* REMAPs are operation-based,
yet the predicate masks are only useful as bit-lookups *after*
the remapped index has been calculated:

    predicatebit = mask[remap(srcstep)]
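
a minimal sketch of that lookup order, in python. the names
predicated_steps and reverse4 are hypothetical and only there for
illustration, the mask is assumed to be a plain integer bitfield,
and none of this is the actual SVP64 implementation:

```python
def predicated_steps(mask, remap, vl):
    """Return (srcstep, index) pairs whose predicate bit is set.

    The mask is tested with the *remapped* index, never with
    srcstep itself: predicatebit = mask[remap(srcstep)].
    """
    active = []
    for srcstep in range(vl):
        index = remap(srcstep)       # the schedule is computed first
        if (mask >> index) & 1:      # only then is the mask bit looked up
            active.append((srcstep, index))
    return active


def reverse4(srcstep):
    """Hypothetical remap: walk four elements in reverse order."""
    return 3 - srcstep


print(predicated_steps(0b0101, reverse4, 4))
```

testing mask[srcstep] directly would select steps 0 and 2 here,
whereas the remapped lookup selects steps 1 and 3: the mask cannot be
interpreted until the schedule has produced its index.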
even more complex is that matrix multiply needs *three*
separate and distinct predicates! one for xdim, one for ydim,
one for zdim.
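
to make the three-predicate point concrete, here is a rough python
sketch. it is purely an assumption for illustration: the function
matmul_3pred, the integer bitfield masks, and the choice of xdim as
result column, ydim as result row and zdim as the inner (summed)
dimension are all hypothetical, and say nothing about how this would
or should be encoded:

```python
def matmul_3pred(a, b, xmask, ymask, zmask):
    """Matrix multiply with one predicate mask per dimension.

    Assumed (hypothetical) layout: a is ydim x zdim, b is zdim x xdim.
    A masked-out row or column leaves zeros in the result; a
    masked-out z index drops that term from every dot product.
    """
    ydim, zdim, xdim = len(a), len(b), len(b[0])
    result = [[0] * xdim for _ in range(ydim)]
    for y in range(ydim):
        if not (ymask >> y) & 1:         # ydim predicate
            continue
        for x in range(xdim):
            if not (xmask >> x) & 1:     # xdim predicate
                continue
            for z in range(zdim):
                if (zmask >> z) & 1:     # zdim predicate
                    result[y][x] += a[y][z] * b[z][x]
    return result


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
print(matmul_3pred(a, b, 0b11, 0b11, 0b11))
print(matmul_3pred(a, b, 0b11, 0b01, 0b11))
```

a single flat mask indexed by element number cannot express "skip
row 1" and "skip inner term 1" independently, which is why the three
masks are kept separate in this sketch.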
my feeling is therefore that the parallel reduction issue should
be closed as completed, and that a lot of thought needs to be put
into this after mid-october.
altering the current predication system is off the table: it has
been a lot of work, and there is a case for keeping it, as it controls
individual operations, which is useful for remote deterministic
processing (Snitch, ETH Zurich).