[Libre-soc-dev] [RFC] Matrix and DCT/FFT SVP64 REMAP

Jacob Lifshay programmerjake at gmail.com
Mon Jul 5 02:24:57 BST 2021


On Sun, Jul 4, 2021, 18:10 Luke Kenneth Casson Leighton <lkcl at lkcl.net>
wrote:

> On Mon, Jul 5, 2021 at 12:59 AM Jacob Lifshay <programmerjake at gmail.com>
> wrote:
> > that only works if the destination isn't any of the source registers,
>
> yes.  and that can be arranged by reordering the loops.  instead of:
>   for y in y_r:
>     for x in x_r:
>       for z in z_r:
>         result[y][x] += a[y][z] * b[z][x]
>
> it can be done:
>
>   for z in z_r:
>     for y in y_r:
>       for x in x_r:
>         result[y][x] += a[y][z] * b[z][x]
>

that's even worse, since now you need a whole temporary matrix instead of
just a temporary row/column. look at the first z iteration: the for y /
for x loops write to every element of the destination matrix, yet both
input matrices are still needed for the next z iteration, so neither input
matrix can share storage with the output matrix.
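
a rough sketch of the point, in plain Python rather than anything
SVP64-specific. the helper names (matmul_z_outer, matmul_row_temp) and the
small square matrices are assumptions made up for illustration only. the
z-outermost version writes its first partial sum straight into the
destination, so aliasing the destination with an input clobbers values that
later z iterations still read; the y-outermost version gets away with a
single temporary row:

```python
import copy


def matmul_z_outer(dest, a, b):
    """z-outermost ordering; dest may alias a (hypothetical helper)."""
    n = len(a)  # assumes square n x n matrices
    for z in range(n):
        for y in range(n):
            for x in range(n):
                term = a[y][z] * b[z][x]
                if z == 0:
                    dest[y][x] = term   # first partial sum overwrites dest
                else:
                    dest[y][x] += term  # later z iterations accumulate
    return dest


def matmul_row_temp(a, b):
    """y-outermost ordering, overwriting a in place via one temporary row."""
    n = len(a)  # assumes square n x n matrices
    for y in range(n):
        row = [sum(a[y][z] * b[z][x] for z in range(n)) for x in range(n)]
        a[y][:] = row  # row y of a is not read again after this point
    return a


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]

reference = matmul_z_outer([[0, 0], [0, 0]], a, b)  # separate destination
aliased = copy.deepcopy(a)
matmul_z_outer(aliased, aliased, b)                 # destination aliases a
row_temp = matmul_row_temp(copy.deepcopy(a), b)     # one temporary row

print(row_temp == reference)  # True: a temporary row is enough
print(aliased == reference)   # False: inputs were clobbered at z == 0
```

making the z-outermost version safe would mean accumulating into a separate
full-size matrix and copying it over the input at the end.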

Jacob

