[Libre-soc-bugs] [Bug 229] AV1 optimizations

bugzilla-daemon at libre-soc.org bugzilla-daemon at libre-soc.org
Fri Oct 14 11:44:09 BST 2022


Konstantinos Margaritis <konstantinos at vectorcamp.gr> changed:

           What    |Removed                     |Added
       The table of|markos={amount=3200}        |
  payments (in EUR)|lkcl={amount=800}           |
     for this task;|                            |
        TOML format|                            |

--- Comment #3 from Konstantinos Margaritis <konstantinos at vectorcamp.gr> ---
Using a method similar to the one from the VP9 investigation, we wrote an SVP64
implementation of dav1d's cdef_find_dir function, which is included in
src/cdef_tmpl.c.

The SVP64 function demonstrates using all the available registers to minimize
loads (unfortunately we cannot get down to zero loads at the moment, but we will
be able to once elwidth/subvl are fully operational). The function loads an 8x8
array of pixels and processes it in several ways: horizontally, vertically and
along the diagonals (normal and slanted), producing a "cost" array of 8
elements. The results of the C reference function and the SVP64 version are
exactly the same:

C ref:
04858917 05cf5742 021c7323 01c68c56 05931132 03de109a 02f8e489 00f02d4b
SVP64 (register dump):
reg 24 04858917 05cf5742 021c7323 01c68c56 05931132 03de109a 02f8e489 00f02d4b
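
For readers unfamiliar with the direction search, here is a minimal C sketch of
the idea. It is not dav1d's cdef_find_dir: it covers only 4 of the 8 directions
(the slanted ones are left out), assumes 8-bit pixels, and the name
sketch_find_dir and the ordering of cost[] are made up for illustration. Each
line's squared pixel sum is weighted by 840/length (840 = lcm(1..8)) so that
short diagonal lines and full 8-pixel lines are comparable.

```c
#include <stdint.h>
#include <stddef.h>

/* Simplified sketch, NOT dav1d's cdef_find_dir: 4 directions only,
 * 8-bit pixels assumed. Hypothetical cost[] layout:
 *   cost[0]: lines along rows        cost[1]: lines along columns
 *   cost[2]: lines with y + x const  cost[3]: lines with y - x const
 * Returns the index of the largest cost. */
static int sketch_find_dir(const uint8_t *img, ptrdiff_t stride,
                           uint32_t cost[4])
{
    int32_t row[8] = { 0 }, col[8] = { 0 };
    int32_t diag[2][15] = { { 0 } };

    /* One pass over the 8x8 block: every pixel feeds four partial sums. */
    for (int y = 0; y < 8; y++) {
        for (int x = 0; x < 8; x++) {
            const int32_t px = (int32_t)img[y * stride + x] - 128;
            row[y] += px;
            col[x] += px;
            diag[0][y + x] += px;
            diag[1][7 + y - x] += px;
        }
    }

    cost[0] = cost[1] = cost[2] = cost[3] = 0;

    /* Rows and columns always have 8 pixels: weight 840 / 8 = 105. */
    for (int n = 0; n < 8; n++) {
        cost[0] += (uint32_t)(row[n] * row[n]) * 105u;
        cost[1] += (uint32_t)(col[n] * col[n]) * 105u;
    }

    /* Diagonal lines have 1..8 pixels: weight 840 / length. */
    for (int n = 0; n < 15; n++) {
        const uint32_t len = (uint32_t)(n < 8 ? n + 1 : 15 - n);
        const uint32_t w = 840u / len;
        cost[2] += (uint32_t)(diag[0][n] * diag[0][n]) * w;
        cost[3] += (uint32_t)(diag[1][n] * diag[1][n]) * w;
    }

    /* The direction whose lines are most uniform has the largest cost. */
    int best = 0;
    for (int n = 1; n < 4; n++)
        if (cost[n] > cost[best])
            best = n;
    return best;
}
```

In this sketch all partial sums are accumulated in one pass over the block, so
once the pixels are in registers no further loads are needed, which is the
property the SVP64 version exploits.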

As a future improvement we could adopt elwidth=16 packed loads so that we can
reduce the number of registers used even further and do the whole processing
without a single memory access, apart from the initial buffer load!
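
To illustrate why packing helps, here is a small C sketch of four 16-bit
elements sharing one 64-bit register. It only models the storage layout in
plain C; it is not SVP64 code, the helper names are hypothetical, and placing
lane 0 in the least significant bits is an assumption made for the example.
Under that layout, 64 values of 16 bits each fit in 16 registers of 64 bits
instead of 64.

```c
#include <stdint.h>

/* Hypothetical helper: pack four 16-bit lanes into one 64-bit word.
 * Assumption for this sketch: lane 0 sits in the least significant bits. */
static uint64_t pack4_u16(const uint16_t v[4])
{
    uint64_t w = 0;
    for (int i = 0; i < 4; i++)
        w |= (uint64_t)v[i] << (16 * i);
    return w;
}

/* Hypothetical helper: read lane i (0..3) back out of a packed word. */
static uint16_t lane_u16(uint64_t w, int i)
{
    return (uint16_t)(w >> (16 * i));
}
```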

This implementation demonstrates how complicated algorithms can be optimized
with SVP64 and how the abundance of registers can almost eliminate memory
accesses.

