[Libre-soc-dev] [RFC] Matrix and DCT/FFT SVP64 REMAP
Jacob Lifshay
programmerjake at gmail.com
Mon Jul 5 00:58:54 BST 2021
On Sun, Jul 4, 2021, 16:18 Luke Kenneth Casson Leighton <lkcl at lkcl.net>
wrote:
> On 7/4/21, Cesar Strauss <cestrauss at gmail.com> wrote:
>
> > More precisely:
> >
> > for y in y_r:
> >     for x in x_r:
> >         for z in z_r:
> >             result[y][x] += a[y][z] * b[z][x]
>
> ah thank you.
>
> > I don't think there can be such a thing as an "in-place" algorithm for
> > matrix multiplication.
>
> indeed there can :)
> by loading the entirety of A and B into registers, and assuming A B
> and result have been flattened to 1D, then using a "REMAP" schedule,
> which calculates sequentially the 3 offsets, FMAC can be scheduled
> with the required sequence...
>
That only works if the destination isn't any of the source registers.
Otherwise it would need to store at least a row of temporary data
somewhere: without that, it would overwrite elements of each
input/output row while the original inputs in that row are still
needed. That's assuming SV follows the semantics where each vector
element op appears to completely finish before the next vector element
starts.
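
A rough Python sketch of the problem, not actual SVP64 semantics: the
register file is modelled as a flat list, and each FMAC completes fully
before the next one starts. The function name `remap_matmul` and the
offset parameters are made up for illustration only.

```python
def remap_matmul(regs, a_off, b_off, r_off, ydim, xdim, zdim):
    """Run the y/x/z loop nest as a sequence of element FMACs on a flat
    register file. Each FMAC finishes before the next one starts."""
    regs = list(regs)  # work on a copy so the caller's list is untouched
    for y in range(ydim):
        for x in range(xdim):
            for z in range(zdim):
                # result[y][x] += a[y][z] * b[z][x], flattened to 1D
                regs[r_off + y * xdim + x] += (
                    regs[a_off + y * zdim + z] * regs[b_off + z * xdim + x]
                )
    return regs


# A = [[1, 2], [3, 4]] at offset 0, B = [[5, 6], [7, 8]] at offset 4
a_and_b = [1, 2, 3, 4, 5, 6, 7, 8]

# Separate, zeroed destination at offset 8: gives the correct product.
separate = remap_matmul(a_and_b + [0, 0, 0, 0], 0, 4, 8, 2, 2, 2)[8:12]

# Destination aliased onto A (offset 0): elements of each row of A are
# overwritten while later FMACs still need the original values.
aliased = remap_matmul(a_and_b, 0, 4, 0, 2, 2, 2)[0:4]

print(separate)
print(aliased)
```

With a separate destination this gives A*B. With the destination aliased
onto A, the result is neither A*B nor A + A*B, because a[0][0] has
already been replaced by the time result[0][1] reads it.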
Jacob