[Libre-soc-bugs] [Bug 230] Video opcode development and discussion

Sat Dec 12 07:49:55 GMT 2020

https://bugs.libre-soc.org/show_bug.cgi?id=230

--- Comment #24 from cand at gmx.com ---
Yeah, the full arbitrary-width is not viable. And I know you find fixed-width
ops ugly, but in this case it may be necessary.

> ok so there *is* one possibility, there: a special vec4 only operation that takes 4x elements and multiplies them all together, targetting a non-vec4 dest.
>
> however these would be very specialist operations, that i would like to defer until "Phase 2" i.e. after the initial implementation of SV looping.

I realized a horizontal 4-element add would also be useful for the generic
pixel pack case, since | and + are the same op when no bits overlap. It would
replace the last three ORs, speeding it up by 1-2 clocks per pixel, plus the
avoided stall (I'm assuming the horizontal 4-op can avoid the normal
non-4-offset reg stall).
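
A rough sketch of that equivalence, in plain Python rather than any SV
encoding. The 8-bit-per-channel RGBA layout, the shift amounts, and the
function names are assumptions made up for illustration; nothing here is a
proposed opcode.

```python
# Hypothetical 4-channel layout: 8 bits each, packed high to low.
SHIFTS = (24, 16, 8, 0)
MASK = 0xFF


def pack_or(channels):
    """Generic pixel pack: mask and shift each channel, then OR them together."""
    pixel = 0
    for value, shift in zip(channels, SHIFTS):
        pixel |= (value & MASK) << shift
    return pixel


def pack_hadd(channels):
    """Same pack, but the final combine is a horizontal 4-element add.

    Because every masked, shifted field occupies its own bit range,
    no carries occur and the sum equals the OR.
    """
    shifted = [(value & MASK) << shift for value, shift in zip(channels, SHIFTS)]
    return sum(shifted)


rgba = (0x12, 0x34, 0x56, 0x78)
assert pack_or(rgba) == pack_hadd(rgba)
```

The equality only holds while the fields do not overlap, which the masking
guarantees here; with overlapping fields the add would carry and the two
results would differ.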

This then led to the opposite operation too: a 1-to-4 bit scatter with shift
and AND. For pixels this is mainly used in video encoding, but being a generic
bit op it should find use in decompression too.
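
As a sketch of what such a scatter would compute, again in plain Python: one
packed word goes in, four elements come out, each produced by a shift and an
AND. The function name, the default 8-bit layout, and the 5-5-5-1 example are
assumptions for illustration only.

```python
def scatter4(word, shifts=(24, 16, 8, 0), masks=(0xFF, 0xFF, 0xFF, 0xFF)):
    """Split one packed word into four fields using shift-right then AND."""
    return [(word >> shift) & mask for shift, mask in zip(shifts, masks)]


# Default layout: four 8-bit fields.
fields = scatter4(0x12345678)
assert fields == [0x12, 0x34, 0x56, 0x78]

# Per-element shifts and masks also cover uneven layouts, e.g. 5-5-5-1.
fields_5551 = scatter4(0xFFFF, shifts=(11, 6, 1, 0), masks=(0x1F, 0x1F, 0x1F, 0x1))
assert fields_5551 == [31, 31, 31, 1]
```

Taking the shifts and masks per element is what keeps it a generic bit op
rather than something tied to one pixel format.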

If these are too special for phase 1, that's fine, since they don't require any
changes at the SV loop level. However, the speed implications are significant,
and I do think we need some acceleration for this case.
