[Libre-soc-dev] [RFC] Matrix and DCT/FFT SVP64 REMAP

Luke Kenneth Casson Leighton lkcl at lkcl.net
Mon Jul 5 00:17:38 BST 2021


On 7/4/21, Cesar Strauss <cestrauss at gmail.com> wrote:

> More precisely:
>
> for y in y_r:
>   for x in x_r:
>     for z in z_r:
>       result[y][x] += a[y][z] * b[z][x]

ah thank you.

> I don't think there can be such a thing as an "in-place" algorithm for
> matrix multiplication.

indeed there can :)

by loading the entirety of A and B into registers, and assuming A, B
and result have been flattened to 1D, then using a "REMAP" schedule
which sequentially calculates the 3 offsets (one each into result, A
and B), FMAC can be issued in the required sequence...

... *without* having to copy the result data to and from other registers.
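
a rough sketch of the idea in plain python, not actual SVP64: the
names remap_schedule and matmul_flat are made up for illustration,
row-major flattening is an assumption, and python lists stand in for
the register file. the generator plays the role of the schedule,
producing the 3 offsets for each step, and the loop body is the one
FMAC.

```python
def remap_schedule(ydim, xdim, zdim):
    """Yield (result_idx, a_idx, b_idx) offsets into flattened 1D arrays.

    Hypothetical stand-in for a REMAP schedule: assumes row-major
    flattening of result (ydim x xdim), a (ydim x zdim), b (zdim x xdim).
    """
    for y in range(ydim):
        for x in range(xdim):
            for z in range(zdim):
                yield (y * xdim + x,   # offset of result[y][x]
                       y * zdim + z,   # offset of a[y][z]
                       z * xdim + x)   # offset of b[z][x]


def matmul_flat(a, b, ydim, xdim, zdim):
    """Multiply flattened matrices using one multiply-accumulate per step."""
    result = [0] * (ydim * xdim)
    for r, ia, ib in remap_schedule(ydim, xdim, zdim):
        # the "FMAC": the same operation every step, only the offsets change
        result[r] += a[ia] * b[ib]
    return result


a = [1, 2, 3, 4, 5, 6]        # 2x3 matrix, flattened
b = [7, 8, 9, 10, 11, 12]     # 3x2 matrix, flattened
print(matmul_flat(a, b, 2, 2, 3))   # [58, 64, 139, 154]
```

note that a and b are only ever read through the computed offsets:
nothing is transposed or copied elsewhere first.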

what then transpires is that it is the *hardware* which performs the
necessary lane-crossing, on-demand, rather than being explicitly
spelled out by SIMD instructions which cannot cope anyway.

it's extremely cool.

l.


