[Libre-soc-bugs] [Bug 230] Video opcode development and discussion
bugzilla-daemon at libre-soc.org
Fri Dec 11 21:13:00 GMT 2020
https://bugs.libre-soc.org/show_bug.cgi?id=230
--- Comment #16 from Luke Kenneth Casson Leighton <lkcl at lkcl.net> ---
(In reply to Jacob Lifshay from comment #15)
> If you're trying to do a giant sum-reduction and don't care that much about
> the exact order of add ops, the code I've seen that should be most efficient
> is:
an algorithm that performs a fixed set of parallel laned ADDs. duh. so
simple.
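
For readers without comment #15 to hand, the idea of "a fixed set of parallel laned ADDs" can be sketched roughly as below. This is only an illustration, not the code from comment #15: the function name `laned_sum` and the default of 4 lanes are assumptions made for the example.

```python
def laned_sum(values, lanes=4):
    """Sum `values` using `lanes` independent accumulators.

    Hypothetical sketch: each pass of the outer loop stands in for one
    vector ADD of width `lanes`; the order of additions differs from a
    plain left-to-right sum, which is fine when exact order is not needed.
    """
    acc = [0] * lanes
    # One "vector ADD" per pass: lane i accumulates element base + i.
    for base in range(0, len(values), lanes):
        chunk = values[base:base + lanes]
        for i, v in enumerate(chunk):
            acc[i] += v
    # Final horizontal step: combine the per-lane partial sums.
    total = 0
    for a in acc:
        total += a
    return total
```

The point of the layout is that the lanes never depend on each other until the last step, so the bulk of the work is ordinary element-wise ADDs.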
(In reply to cand from comment #14)
> Consider how long it took for you to think it through and how long the text
> describing it is.
don't tell no-one, i'm not actually very good at coming up with algorithms
on-the-fly. usually it takes several days to weeks of thought :)
> Now consider it is a one-line operation in SIMD ISAs.
SIMD ISAs have the "advantage" of being able to hard-code the mapreduce tree
(or parallel accumulator algorithm) because the SIMD size is fixed and known.
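
To illustrate the difference, here is a rough sketch (names `tree_sum4` and `tree_sum` are made up for the example): with a fixed width of 4 the reduction tree can be written out in full, whereas for a variable length the number of levels depends on the length and has to be worked out at run time.

```python
def tree_sum4(v):
    """Fixed-width case: the tree for exactly 4 elements is hard-coded."""
    assert len(v) == 4
    s0 = v[0] + v[1]   # level 1, left pair
    s1 = v[2] + v[3]   # level 1, right pair
    return s0 + s1     # level 2


def tree_sum(v):
    """Variable-length case: the number of levels depends on len(v)."""
    v = list(v)
    while len(v) > 1:
        if len(v) % 2:
            v.append(0)  # pad odd lengths with the additive identity
        # One tree level: add adjacent pairs, halving the length.
        v = [v[i] + v[i + 1] for i in range(0, len(v), 2)]
    return v[0] if v else 0
```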
> This
> means nobody will bother coding it in SV, handicapping it.
due to the underlying hardware complexity i am still inclined to defer it
unless it can be shown to be a significant performance hindrance by not having
it.
keep it in mind: if you find an algorithm that's severely compromised,
performance-wise, we can look at it.
that's the whole development ethos here: focus first on optimising the low
hanging fruit.
> Now, I'm not saying we have to have it fully in hw. A mid-step that just
> does it for 4 or 8 elements, in one clock, would be close enough.
ok so there *is* one possibility, there: a special vec4 only operation that
takes 4x elements and multiplies them all together, targeting a non-vec4 dest.
however these would be very specialist operations, that i would like to defer
until "Phase 2" i.e. after the initial implementation of SV looping.
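
As a purely behavioural sketch of what such a vec4-only operation might do (nothing here is a defined SV opcode; `vec4_reduce` and its `op` parameter are hypothetical names for illustration): each group of 4 source elements is combined into a single scalar destination element.

```python
from functools import reduce
import operator


def vec4_reduce(src, op=operator.mul):
    """Combine each group of 4 source elements into one scalar result.

    Hypothetical model only: `src` is a flat list of vec4s, so its length
    must be a multiple of 4; the result has one element per vec4.
    """
    if len(src) % 4 != 0:
        raise ValueError("source length must be a multiple of 4")
    return [reduce(op, src[i:i + 4]) for i in range(0, len(src), 4)]
```

Passing `operator.add` instead of the default gives the 4-element sum discussed earlier in the thread, which could then serve as the "mid-step" for a longer reduction.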