[Libre-soc-bugs] [Bug 230] Video opcode development and discussion
bugzilla-daemon at libre-soc.org
Fri Dec 11 17:10:38 GMT 2020
https://bugs.libre-soc.org/show_bug.cgi?id=230
--- Comment #15 from Jacob Lifshay <programmerjake at gmail.com> ---
If you're trying to do a giant sum-reduction and don't care that much about the
exact order of add ops, the code I've seen that should be most efficient is:
const size_t HW_PAR = 32; // the number of add ops per inner loop that the hw
                          // needs to keep the pipeline full
using vec = sv_vec<float, HW_PAR>;
float reduce_add(float *in, size_t in_size) {
    // optionally do a single shorter horizontal_add for in_size < HW_PAR
    vec accumulator = splat(0.0f, VL=HW_PAR);
    while(in_size != 0) {
        size_t vl = min(HW_PAR, in_size);
        vec v = load(in, VL=vl);
        // elements with index >= vl are unmodified
        accumulator = add(accumulator, v, VL=vl);
        in += vl;
        in_size -= vl;
    }
    return horizontal_add(accumulator, VL=HW_PAR);
}
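
For anyone who wants to try the idea on ordinary hardware, here is a minimal scalar sketch of the same loop in plain C++. It is only an emulation under stated assumptions: sv_vec, splat, load, add and horizontal_add from the pseudocode above are replaced by a std::array and explicit loops, and the final horizontal add is done as a pairwise tree, which is one possible order and not necessarily the one real hardware would use. HW_PAR = 32 is taken from the pseudocode and must be a power of two for the tree step as written.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>

// Assumed lane count, copied from the pseudocode; must be a power of two
// for the pairwise tree below.
constexpr std::size_t HW_PAR = 32;

float reduce_add(const float *in, std::size_t in_size) {
    // Stand-in for splat(0.0f, VL=HW_PAR): every lane starts at zero.
    std::array<float, HW_PAR> accumulator{};

    while (in_size != 0) {
        std::size_t vl = std::min(HW_PAR, in_size);
        // Stand-in for load + add with VL=vl: only the first vl lanes are
        // updated, lanes with index >= vl keep their previous value.
        for (std::size_t i = 0; i < vl; ++i)
            accumulator[i] += in[i];
        in += vl;
        in_size -= vl;
    }

    // Stand-in for horizontal_add(accumulator, VL=HW_PAR): a pairwise tree
    // that halves the number of live lanes each step.
    for (std::size_t stride = HW_PAR / 2; stride != 0; stride /= 2)
        for (std::size_t i = 0; i < stride; ++i)
            accumulator[i] += accumulator[i + stride];

    return accumulator[0];
}
```

Each lane accumulates every HW_PAR-th element independently, so consecutive vector adds have no dependency on one another within a lane group and the single cross-lane reduction is deferred to the end. The result can differ from a strict left-to-right sum by floating-point rounding, which is the "don't care about the exact order" trade-off mentioned above.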
--
You are receiving this mail because:
You are on the CC list for the bug.