[Libre-soc-dev] SVP64 parallel map-reduce idea
Luke Kenneth Casson Leighton
lkcl at lkcl.net
Fri Jun 11 19:05:59 BST 2021
after implementing the scalar map-reduce in ISACaller, it occurred to me
that a fixed (predictable, useful, non-ambiguous, clear) algorithm for the
Program Order in which the reductions have to take place might be a good
idea.
assuming that the "base" instruction is: add r1.v, r10.v, r14.v and that
VL=4, the Program Order i would consider is:
* add r1 r10 r14
* add r2 r11 r15
* add r3 r12 r16
* add r4 r13 r17
* add r3 r3 r4
* add r1 r1 r2
* add r1 r1 r3
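as a rough sketch of how that listing could be generated: the function name
program_order and its parameters (vl, rt, ra, rb) are made up for
illustration and are not part of ISACaller. it reproduces the VL=4 order
above; how the pattern extends to other values of VL is purely an assumption
on my part, nothing here is decided.

```python
def program_order(vl, rt, ra, rb):
    """List the scalar adds for a vector add of length vl, followed by a
    pairwise reduction of the result vector into register rt.

    rt, ra, rb are the starting register numbers of the destination and the
    two source vectors (hypothetical parameter names).
    """
    # initial run: a straight element-by-element vector operation
    ops = [f"add r{rt + i} r{ra + i} r{rb + i}" for i in range(vl)]

    # follow-up: pairwise reductions on the result vector, highest pair
    # first, doubling the distance between partners on each pass
    # (assumed generalisation of the VL=4 listing)
    step = 1
    while step < vl:
        for i in reversed(range(0, vl - step, 2 * step)):
            ops.append(f"add r{rt + i} r{rt + i} r{rt + i + step}")
        step *= 2
    return ops


if __name__ == "__main__":
    for op in program_order(4, 1, 10, 14):
        print("*", op)
```

calling program_order(4, 1, 1, 14) instead gives the in-place variant, where
the destination vector is also the first source.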
an in-place parallel map-reduce add would be: add r1.v, r1.v, r14.v which
would produce:
* add r1 r1 r14
* add r2 r2 r15
* add r3 r3 r16
* add r4 r4 r17
* add r3 r3 r4
* add r1 r1 r2
* add r1 r1 r3
in other words, the *initial* run is just "a straight normal vector
operation", with a follow-up of a sequence of (VL-1) scalar map-reductions
on the result Vector (which could also be tree-reduced in optimised
hardware)
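to make the "vector op, then (VL-1) scalar reductions" point concrete, here
is a minimal simulation on a toy register file. it is only a sketch:
map_reduce_add and the 32-entry list standing in for the regfile are
invented for this example, and the reduction order for VL other than 4 is
again an assumption.

```python
def map_reduce_add(regs, vl, rt, ra, rb):
    """Run the map step then the pairwise reductions on a list of ints
    standing in for the register file. Returns the number of reduction
    steps that were performed."""
    # map: ordinary element-wise vector add
    for i in range(vl):
        regs[rt + i] = regs[ra + i] + regs[rb + i]

    # reduce: same order as the listing (r3+=r4, r1+=r2, r1+=r3 for VL=4)
    reductions = 0
    step = 1
    while step < vl:
        for i in reversed(range(0, vl - step, 2 * step)):
            regs[rt + i] += regs[rt + i + step]
            reductions += 1
        step *= 2
    return reductions


regs = [0] * 32
regs[1:5] = [1, 2, 3, 4]          # r1..r4
regs[14:18] = [10, 20, 30, 40]    # r14..r17

# in-place form: add r1.v, r1.v, r14.v with VL=4
count = map_reduce_add(regs, 4, 1, 1, 14)
print(count, regs[1])
```

the reduction count comes out as VL-1 and the total lands in the first
element of the result vector (r1 here), while r2-r4 are left holding
intermediate values.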
question is: are there any *commutative* map-reduce "base" operations for
which this reduction pattern is inappropriate?
are there any other algorithms that should be considered?
should it be left up to the implementor?
l.
---
crowd-funded eco-conscious hardware: https://www.crowdsupply.com/eoma68