[Libre-soc-dev] SVP64 parallel map-reduce idea

Fri Jun 11 19:05:59 BST 2021

after implementing the scalar map-reduce in ISACaller, it occurred to me
that a fixed (predictable, useful, non-ambiguous, clear) algorithm for the
Program Order in which the reductions have to take place might be a good
idea.

assuming that the "base" instruction is: add r1.v, r10.v, r14.v and that
VL=4, the Program Order i am considering would be:

* add r1, r10, r14
* add r2, r11, r15
* add r3, r12, r16
* add r4, r13, r17
* add r3, r3, r4
* add r1, r1, r2
* add r1, r1, r3

an in-place parallel map-reduce add would be: add r1.v, r1.v, r14.v which
would produce:

* add r1, r1, r14
* add r2, r2, r15
* add r3, r3, r16
* add r4, r4, r17
* add r3, r3, r4
* add r1, r1, r2
* add r1, r1, r3

in other words, the *initial* run is just "a straight normal vector
operation", with a follow-up of a sequence of (VL-1) scalar map-reductions
on the result Vector (which could also be tree-reduced in optimised
hardware)
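
as a rough sketch of how that schedule could be generated for any VL
(not the ISACaller code: the names schedule and run, and the register
base arguments rt, ra, rb, are made up for illustration). the pairing
rule for VL other than 4, and the choice to do the highest pair first
within each level, are assumptions read off the VL=4 listing:

```python
def schedule(vl, rt=1, ra=10, rb=14):
    """Return the scalar ops, in Program Order, as (name, dest, src1, src2)."""
    ops = []
    # initial run: a straight normal vector operation, element by element
    for i in range(vl):
        ops.append(("add", rt + i, ra + i, rb + i))
    # follow-up: VL-1 scalar reductions on the result vector, pairwise.
    # ASSUMPTION: within each level the highest pair goes first, which is
    # what the VL=4 listing does (r3/r4 before r1/r2).
    step = 1
    while step < vl:
        for i in reversed(range(0, vl - step, 2 * step)):
            ops.append(("add", rt + i, rt + i, rt + i + step))
        step *= 2
    return ops


def run(ops, regs):
    """Execute the ops sequentially on a dict of register number -> value."""
    for _name, rd, rs1, rs2 in ops:
        regs[rd] = regs[rs1] + regs[rs2]
    return regs


if __name__ == "__main__":
    # prints the seven-instruction sequence for the VL=4 example
    for name, rd, rs1, rs2 in schedule(4):
        print(f"{name} r{rd}, r{rs1}, r{rs2}")
```

calling schedule(4, ra=1) gives the in-place variant. the two reductions
at each level are independent of each other, which is what would let
hardware issue them as a tree.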

question is: are there any *commutative* map-reduce "base" operations for
which this reduction pattern is inappropriate?
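
one case that seems worth checking (a sketch, not an answer): IEEE 754
floating-point add is commutative but not associative, so a pairwise
order and a straight left-to-right order can give different results on
the same inputs. the helper names and the test values below are made up
for illustration, and the pairwise order assumes the same "highest pair
first" rule as the VL=4 listing:

```python
def linear_reduce(vals):
    """Left-to-right reduction: ((v0 + v1) + v2) + v3 ..."""
    acc = vals[0]
    for v in vals[1:]:
        acc = acc + v
    return acc


def pairwise_reduce(vals):
    """Pairwise reduction in the order of the VL=4 listing (assumed rule)."""
    vals = list(vals)
    step = 1
    while step < len(vals):
        for i in reversed(range(0, len(vals) - step, 2 * step)):
            vals[i] = vals[i] + vals[i + step]
        step *= 2
    return vals[0]


# hypothetical inputs chosen so that rounding differs between the two orders
vals = [1e16, 1.0, -1e16, 1.0]

if __name__ == "__main__":
    print("commutative:", vals[0] + vals[1] == vals[1] + vals[0])
    print("linear  :", linear_reduce(vals))
    print("pairwise:", pairwise_reduce(vals))
```

neither order is "wrong" here, but the two disagree, which is an
argument for the order being fixed by the specification rather than
varying between implementations.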

are there any other algorithms that should be considered?

should it be left up to the implementor?

l.
