[Libre-soc-bugs] [Bug 230] Video opcode development and discussion

Fri Dec 11 07:53:46 GMT 2020

https://bugs.libre-soc.org/show_bug.cgi?id=230

--- Comment #11 from cand at gmx.com ---
> my concern is: how many? if that's 4 instructions in an inner loop then that 
> could well be 25% of the pixel/clock ratio that could be achieved if there was 
> a dedicated op that did the same.

Speaking from ffmpeg VSX experience: if everything else is accelerated but the
final conversion is not, the conversion does take 30-40% of the runtime. But
that's the scalar case. Once SIMDed, it scales almost perfectly, say 7.8x where
perfect is 8x. So in parallel it drops to ~5% with just 8 units.
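
A quick back-of-the-envelope check of those figures, as a sketch only. The
function names are made up for illustration, and it assumes that only the
conversion step gets faster while the rest of the pipeline stays the same:

```python
def share_of_original(scalar_share, speedup):
    """Time of the sped-up step, as a fraction of the ORIGINAL total runtime."""
    return scalar_share / speedup


def share_of_new_total(scalar_share, speedup):
    """Time of the sped-up step, as a fraction of the NEW (shorter) runtime."""
    step = scalar_share / speedup   # conversion after SIMD
    rest = 1.0 - scalar_share       # everything else, assumed unchanged
    return step / (rest + step)


if __name__ == "__main__":
    for share in (0.30, 0.40):
        print(
            f"{share:.0%} scalar: "
            f"{share_of_original(share, 7.8):.1%} of the original runtime, "
            f"{share_of_new_total(share, 7.8):.1%} of the new runtime"
        )
```

With 7.8x on 8 units, a 30-40% scalar cost comes out at roughly 4-5% of the
original runtime, or roughly 5-8% of the new, shorter runtime.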

The point: hardcoding the conversion may bring a constant speedup of, say, 4x.
Parallelization brings a speedup of N, where N is how many ops the CPU can do
in parallel. So for most formats, just have many units. (For specific heavy and
very frequently used conversions, like yuv <-> rgb, we do want both.)
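
To put rough numbers on that trade-off, here is a minimal sketch. The model is
an idealization, not a measurement: it assumes the dedicated-op speedup and the
parallel speedup simply multiply, and the 4x and 0.975 (7.8 out of 8) values
are the illustrative figures from this comment, not hardware data.

```python
def conversion_time(base_time, dedicated_speedup=1.0, units=1, efficiency=1.0):
    """Idealized time for the conversion step.

    dedicated_speedup: constant factor from a hardcoded conversion op
    units:             number of ops the CPU can do in parallel (N)
    efficiency:        fraction of perfect scaling actually achieved
    """
    return base_time / (dedicated_speedup * units * efficiency)


if __name__ == "__main__":
    base = 1.0  # scalar, generic-instruction baseline
    print("dedicated op only :", conversion_time(base, dedicated_speedup=4.0))
    print("8 units only      :", conversion_time(base, units=8, efficiency=0.975))
    print("both              :", conversion_time(base, dedicated_speedup=4.0,
                                                 units=8, efficiency=0.975))
```

Under these assumptions, 8 units alone already beat the 4x dedicated op, and
combining both only pays off where the conversion is heavy and common enough.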

> Did you go through just the specifically VSX instructions or also the vector 
> instructions? IIRC there are some important instructions that aren't listed 
> under VSX -- mostly ones that use the full 128-bit register as a single value
> -- like AES encrypt step.

I went through the full 3.0/POWER9 vector instruction set (AltiVec + VSX +
extensions), but ignored crypto since it's not relevant to AV. AES/SHA/etc.
acceleration belongs to other parts. The POWER8 and POWER9 additions are mostly
just new types for existing functions.
