[Libre-soc-bugs] [Bug 229] AV1 optimizations
bugzilla-daemon at libre-soc.org
Fri Oct 14 11:44:09 BST 2022
https://bugs.libre-soc.org/show_bug.cgi?id=229
Konstantinos Margaritis <konstantinos at vectorcamp.gr> changed:
                   What|Removed                 |Added
----------------------------------------------------------------------------
           The table of|markos={amount=3200}    |
      payments (in EUR)|lkcl={amount=800}       |
         for this task;|                        |
            TOML format|                        |
--- Comment #3 from Konstantinos Margaritis <konstantinos at vectorcamp.gr> ---
Using a method similar to the VP9 investigation, we wrote an SVP64
implementation of dav1d's cdef_find_dir function, which is included in
src/cdef_tmpl.c.
The SVP64 function demonstrates using all the available registers to minimize
loads (unfortunately we cannot do zero-loads at the moment, but we will be able
to once elwidth/subvl are fully operational). The function loads an 8x8 array
of pixels and processes it in multiple ways: horizontally, vertically and along
the diagonals (normal and slanted), producing a "cost" array of 8 elements. The
results of the C reference function and the SVP64 implementation are exactly
the same:
C ref:
04858917 05cf5742 021c7323 01c68c56 05931132 03de109a 02f8e489 00f02d4b
SVP64 (register dump):
reg 24 04858917 05cf5742 021c7323 01c68c56 05931132 03de109a 02f8e489 00f02d4b
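
For readers unfamiliar with the direction search, the processing described
above can be sketched in plain C. This is a simplified illustration written
for this comment, not dav1d's actual code and not the SVP64 implementation:
the helper name find_dir_costs, the fixed 8-bit pixel type and the array
names are assumptions. The weights (105 and the 840/k table) are the usual
normalization for lines of 8 and k pixels, so that every direction is
compared on the same scale.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Square of a partial sum, as unsigned (|v| <= 1024 here, so no overflow). */
static unsigned sq(int v) { return (unsigned)(v * v); }

/*
 * Hypothetical helper (not dav1d's API): computes the 8 direction costs of
 * one 8x8 block of 8-bit pixels and returns the index of the largest cost.
 */
static int find_dir_costs(const uint8_t *img, ptrdiff_t stride, unsigned cost[8])
{
    /* 840/1 .. 840/7: weights for lines that are 1..7 pixels long. */
    static const unsigned div_table[7] = { 840, 420, 280, 210, 168, 140, 120 };
    int hv[2][8]    = { { 0 } };  /* per-row and per-column sums   */
    int diag[2][15] = { { 0 } };  /* the two 45-degree diagonals   */
    int alt[4][11]  = { { 0 } };  /* the four "slanted" directions */

    assert(img != NULL);

    /* Single pass over the block: every pixel feeds all 8 partial sums. */
    for (int y = 0; y < 8; y++) {
        for (int x = 0; x < 8; x++) {
            const int px = (int)img[y * stride + x] - 128;
            hv[0][y]                  += px;
            hv[1][x]                  += px;
            diag[0][y + x]            += px;
            diag[1][7 + y - x]        += px;
            alt[0][y + (x >> 1)]      += px;
            alt[1][3 + y - (x >> 1)]  += px;
            alt[2][3 - (y >> 1) + x]  += px;
            alt[3][(y >> 1) + x]      += px;
        }
    }

    for (int n = 0; n < 8; n++)
        cost[n] = 0;

    /* Rows and columns: eight lines of 8 pixels each. */
    for (int n = 0; n < 8; n++) {
        cost[2] += sq(hv[0][n]);
        cost[6] += sq(hv[1][n]);
    }
    cost[2] *= 105;
    cost[6] *= 105;

    /* 45-degree diagonals: line lengths 1..7, 8, 7..1. */
    for (int n = 0; n < 7; n++) {
        cost[0] += (sq(diag[0][n]) + sq(diag[0][14 - n])) * div_table[n];
        cost[4] += (sq(diag[1][n]) + sq(diag[1][14 - n])) * div_table[n];
    }
    cost[0] += sq(diag[0][7]) * 105;
    cost[4] += sq(diag[1][7]) * 105;

    /* Slanted directions: line lengths 2, 4, 6, five lines of 8, 6, 4, 2. */
    for (int n = 0; n < 4; n++) {
        unsigned c = 0;
        for (int m = 0; m < 5; m++)
            c += sq(alt[n][3 + m]);
        c *= 105;
        for (int m = 0; m < 3; m++)
            c += (sq(alt[n][m]) + sq(alt[n][10 - m])) * div_table[2 * m + 1];
        cost[2 * n + 1] = c;
    }

    /* The direction with the largest cost wins (first one on ties). */
    int best = 0;
    for (int n = 1; n < 8; n++)
        if (cost[n] > cost[best])
            best = n;
    return best;
}
```

In this sketch index 2 accumulates per-row sums and index 6 per-column sums,
so a block whose rows are each constant yields the largest cost at index 2.
All eight partial sums are derived from the same 64 pixels, which is why
keeping the whole block in registers avoids reloading it for each direction.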
As a future improvement we could adopt elwidth=16 packed loads to reduce the
number of registers used even further and do the whole processing without a
single memory access, apart from the initial buffer load!
This implementation demonstrates how complicated algorithms can be optimized
with SVP64 and how the abundance of registers can almost eliminate memory
access.
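
To give a rough feel for the register savings from packed 16-bit elements,
here is a plain C model of the packing only. It is not SVP64 code and does not
claim to reflect SVP64's exact element ordering: the helper names pack4x16,
lane16 and pack_block, and the choice of putting lane 0 in the low 16 bits,
are assumptions made for illustration. With four 16-bit lanes per 64-bit
register, an 8x8 block fits in 16 registers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pack four 16-bit values into one 64-bit word, lane 0 in the low bits
 * (lane order is an assumption of this sketch). */
static uint64_t pack4x16(const uint16_t v[4])
{
    uint64_t r = 0;
    for (int i = 0; i < 4; i++)
        r |= (uint64_t)v[i] << (16 * i);
    return r;
}

/* Extract lane i (0..3) from a packed 64-bit word. */
static uint16_t lane16(uint64_t r, int i)
{
    assert(i >= 0 && i < 4);
    return (uint16_t)(r >> (16 * i));
}

/* Widen an 8x8 block of 8-bit pixels to 16 bits and pack it, row-major,
 * into 16 64-bit words (two words per row). */
static void pack_block(const uint8_t *img, ptrdiff_t stride, uint64_t regs[16])
{
    for (int y = 0; y < 8; y++) {
        for (int h = 0; h < 2; h++) {
            uint16_t v[4];
            for (int i = 0; i < 4; i++)
                v[i] = img[y * stride + 4 * h + i];
            regs[2 * y + h] = pack4x16(v);
        }
    }
}
```

Pixel (x, y) then sits in lane x % 4 of word 2 * y + x / 4, so any of the 64
values can be reached without going back to memory.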
--
You are receiving this mail because:
You are on the CC list for the bug.