[Libre-soc-dev] [RFC] Matrix and DCT/FFT SVP64 REMAP
Jacob Lifshay
programmerjake at gmail.com
Mon Jul 5 02:24:57 BST 2021
On Sun, Jul 4, 2021, 18:10 Luke Kenneth Casson Leighton <lkcl at lkcl.net>
wrote:
> On Mon, Jul 5, 2021 at 12:59 AM Jacob Lifshay <programmerjake at gmail.com>
> wrote:
> > that only works if the destination isn't any of the source registers,
>
> yes. and that can be arranged by reordering the loops. instead of:
>     for y in y_r:
>         for x in x_r:
>             for z in z_r:
>                 result[y][x] += a[y][z] * b[z][x]
>
> it can be done:
>
>     for z in z_r:
>         for y in y_r:
>             for x in x_r:
>                 result[y][x] += a[y][z] * b[z][x]
>
that's even worse, since now you need a whole temporary matrix instead of
just a temporary row/column.
let's look at the first z iteration: the "for y, for x" part writes to the
whole destination matrix, yet both input matrices are still needed for the
next z iteration. that prevents either input matrix from using the same
storage as the output matrix.
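
a rough sketch of that argument in Python, for illustration only. it does no
arithmetic: it just walks the two loop orders quoted above and records which
result cells are written during the first outermost iteration and which input
cells are still read afterwards. the function name `trace` and the order
strings "zyx" / "yxz" are made up for this example, and square n x n matrices
are assumed.

```python
from itertools import product


def trace(order, n):
    """Walk result[y][x] += a[y][z] * b[z][x] over an n*n*n loop nest.

    `order` names the loops from outermost to innermost, e.g. "zyx" or "yxz"
    (hypothetical labels for this sketch). Returns three sets:
      written_first - result cells (y, x) written while the outermost
                      loop is on its first iteration
      a_later       - cells (y, z) of a read in later outer iterations
      b_later       - cells (z, x) of b read in later outer iterations
    """
    written_first, a_later, b_later = set(), set(), set()
    for outer, middle, inner in product(range(n), repeat=3):
        idx = dict(zip(order, (outer, middle, inner)))
        y, x, z = idx["y"], idx["x"], idx["z"]
        if outer == 0:
            written_first.add((y, x))
        else:
            a_later.add((y, z))
            b_later.add((z, x))
    return written_first, a_later, b_later


if __name__ == "__main__":
    n = 3
    for order in ("zyx", "yxz"):
        written, a_later, b_later = trace(order, n)
        print(order,
              "written in first outer iteration:", len(written),
              "| a cells read later:", len(a_later),
              "| b cells read later:", len(b_later))
```

with z outermost, the first outer iteration already writes all n*n result
cells while cells of both a and b are still read later, so result cannot
overlap either input. with y outermost, only row 0 of result is written in
the first outer iteration and row 0 of a is not read again afterwards, which
is consistent with needing only a temporary row when result shares storage
with a.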
Jacob