[Libre-soc-bugs] [Bug 229] AV1 optimizations

bugzilla-daemon at libre-soc.org bugzilla-daemon at libre-soc.org
Fri Oct 14 11:44:09 BST 2022


https://bugs.libre-soc.org/show_bug.cgi?id=229

Konstantinos Margaritis <konstantinos at vectorcamp.gr> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
       The table of|markos={amount=3200}        |
  payments (in EUR)|lkcl={amount=800}           |
     for this task;|                            |
        TOML format|                            |

--- Comment #3 from Konstantinos Margaritis <konstantinos at vectorcamp.gr> ---
Using a method similar to the one in the VP9 investigation, we wrote an SVP64
implementation of dav1d's cdef_find_dir function (found in src/cdef_tmpl.c).

The SVP64 function demonstrates using all the available registers to minimize
loads (unfortunately we cannot do zero-loads at the moment, but we will be able
to once elwidth/subvl are fully operational). The function loads an 8x8 array
of pixels and processes it in several ways, along horizontal, vertical and
diagonal (normal and slanted) lines, producing a "cost" array of 8 elements.
The results from the C reference function and from SVP64 are exactly the same:

C ref:
04858917 05cf5742 021c7323 01c68c56 05931132 03de109a 02f8e489 00f02d4b
SVP64 (register dump):
reg 24 04858917 05cf5742 021c7323 01c68c56 05931132 03de109a 02f8e489 00f02d4b
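
For readers unfamiliar with the algorithm, here is a minimal C sketch of the
kind of processing described above: sums along horizontal, vertical, diagonal
and slanted lines of an 8x8 block, reduced to a "cost" array of 8 elements.
It is loosely modeled on the CDEF direction search and is not a copy of
dav1d's code. The name find_dir_sketch, the 8-bit-only pixel type, the index
expressions and the weights are assumptions for illustration and have not been
checked against src/cdef_tmpl.c or against the values quoted above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: 8-bit pixels only. Fills cost[0..7] and returns the
 * index of the largest cost. */
static int find_dir_sketch(const uint8_t *img, ptrdiff_t stride,
                           unsigned cost[8])
{
    int hv[2][8] = {{0}};    /* sums along rows [0] and columns [1] */
    int diag[2][15] = {{0}}; /* sums along the two 45-degree diagonals */
    int alt[4][11] = {{0}};  /* sums along the four slanted lines */

    for (int y = 0; y < 8; y++) {
        for (int x = 0; x < 8; x++) {
            const int px = img[y * stride + x] - 128; /* centre around 0 */
            diag[0][y + x] += px;
            alt[0][y + (x >> 1)] += px;
            hv[0][y] += px;
            alt[1][3 + y - (x >> 1)] += px;
            diag[1][7 + y - x] += px;
            alt[2][3 - (y >> 1) + x] += px;
            hv[1][x] += px;
            alt[3][(y >> 1) + x] += px;
        }
    }

    for (int n = 0; n < 8; n++)
        cost[n] = 0;

    /* Rows and columns: every line holds 8 pixels, weight 840 / 8 = 105. */
    for (int n = 0; n < 8; n++) {
        cost[2] += hv[0][n] * hv[0][n];
        cost[6] += hv[1][n] * hv[1][n];
    }
    cost[2] *= 105;
    cost[6] *= 105;

    /* Weights 840 / length, so lines of different length are comparable. */
    static const int div_table[7] = { 840, 420, 280, 210, 168, 140, 120 };

    /* Diagonals: line n and line 14 - n both hold n + 1 pixels. */
    for (int n = 0; n < 7; n++) {
        const int d = div_table[n];
        cost[0] += (diag[0][n] * diag[0][n] +
                    diag[0][14 - n] * diag[0][14 - n]) * d;
        cost[4] += (diag[1][n] * diag[1][n] +
                    diag[1][14 - n] * diag[1][14 - n]) * d;
    }
    cost[0] += diag[0][7] * diag[0][7] * 105;
    cost[4] += diag[1][7] * diag[1][7] * 105;

    /* Slanted lines: five full lines of 8 pixels, plus shorter lines of
     * 2, 4 and 6 pixels at each end. */
    for (int n = 0; n < 4; n++) {
        unsigned *const c = &cost[n * 2 + 1];
        for (int m = 0; m < 5; m++)
            *c += alt[n][3 + m] * alt[n][3 + m];
        *c *= 105;
        for (int m = 0; m < 3; m++) {
            const int d = div_table[2 * m + 1];
            *c += (alt[n][m] * alt[n][m] +
                   alt[n][10 - m] * alt[n][10 - m]) * d;
        }
    }

    /* The direction with the largest cost wins; ties keep the lower index. */
    int best = 0;
    for (int n = 1; n < 8; n++)
        if (cost[n] > cost[best])
            best = n;
    return best;
}
```

In this sketch a block made of constant rows gives the largest value in
cost[2], and a block made of constant columns gives the largest value in
cost[6]. All eight partial-sum arrays are updated from a single pass over the
64 pixels, which is why keeping the whole block in registers pays off.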

As a future improvement we could adopt elwidth=16 packed loads to reduce the
number of registers used even further, and do the whole processing without a
single memory access apart from the initial buffer load!
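
To make the register arithmetic concrete, here is a rough C model of what
16-bit packing buys: an 8x8 block widened to 16-bit elements is 64 x 2 = 128
bytes, which fits in 16 64-bit words at four elements per word. This is only
plain C modelling the layout, not SVP64 code. The helper names and the element
ordering (element 0 in the low bits) are assumptions for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Pack four 16-bit elements into one 64-bit word, element 0 in the low
 * bits (assumed ordering). */
static uint64_t pack4_u16(const uint16_t e[4])
{
    uint64_t w = 0;
    for (int i = 0; i < 4; i++)
        w |= (uint64_t)e[i] << (16 * i);
    return w;
}

/* Extract element i (0..3) from a packed word. */
static uint16_t unpack_u16(uint64_t w, int i)
{
    return (uint16_t)(w >> (16 * i));
}

/* Widen an 8x8 block of 8-bit pixels to 16 bits and pack it into 16 words,
 * two words per row. */
static void pack_block_8x8(const uint8_t *img, ptrdiff_t stride,
                           uint64_t regs[16])
{
    for (int y = 0; y < 8; y++) {
        for (int half = 0; half < 2; half++) {
            uint16_t e[4];
            for (int i = 0; i < 4; i++)
                e[i] = img[y * stride + half * 4 + i];
            regs[y * 2 + half] = pack4_u16(e);
        }
    }
}
```

If each element instead occupied a full 64-bit register, the same block would
need 64 registers, so packing cuts the footprint by a factor of four in this
model.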

This implementation demonstrates how complicated algorithms can be optimized
with SVP64 and how the abundance of registers can almost eliminate memory
access.
