[Libre-soc-dev] [RFC] Matrix and DCT/FFT SVP64 REMAP
Luke Kenneth Casson Leighton
lkcl at lkcl.net
Mon Jul 5 00:17:38 BST 2021
On 7/4/21, Cesar Strauss <cestrauss at gmail.com> wrote:
> More precisely:
> for y in y_r:
>     for x in x_r:
>         for z in z_r:
>             result[y][x] +=
>                 a[y][z] *
ah thank you.
> I don't think there can be such a thing as an "in-place" algorithm for
> matrix multiplication.
indeed there can :)
by loading the entirety of A and B into registers, with A, B and the
result all flattened to 1D, and then using a "REMAP" schedule, which
sequentially calculates the 3 offsets, FMAC can be scheduled
in the required sequence...
... *without* having to copy the result data to and from other registers.
what then transpires is that it is the *hardware* which performs the
necessary lane-crossing, on-demand, rather than the lane-crossing being
explicitly spelled out by SIMD instructions, which cannot cope anyway.
it's extremely cool.
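
to make the idea concrete, here is a minimal software sketch of such a
schedule. it is only a model in python, not the actual SVP64 REMAP
hardware or its register layout: the names matrix_remap_schedule and
fmac_matmul, the flat "regs" list and the base offsets are all
hypothetical, and the b[z][x] operand is an assumption, since the quoted
loop above is cut off after the first factor.

```python
def matrix_remap_schedule(ydim, xdim, zdim):
    """Yield the 3 flattened offsets (result, a, b) for each FMAC in turn.

    Assumed shapes (row-major, flattened to 1D):
      result is ydim x xdim, a is ydim x zdim, b is zdim x xdim.
    """
    for y in range(ydim):
        for x in range(xdim):
            for z in range(zdim):
                yield (y * xdim + x,   # offset into flattened result
                       y * zdim + z,   # offset into flattened a
                       z * xdim + x)   # offset into flattened b


def fmac_matmul(regs, a_base, b_base, r_base, ydim, xdim, zdim):
    """Run one multiply-accumulate per schedule entry, directly on regs.

    No operand or result is copied to a temporary register: each FMAC
    reads and writes the flat register file at the scheduled offsets.
    """
    for r, a, b in matrix_remap_schedule(ydim, xdim, zdim):
        regs[r_base + r] += regs[a_base + a] * regs[b_base + b]


# hypothetical register file: a (2x3) at 0, b (3x2) at 6, result (2x2) at 12
regs = [0.0] * 32
regs[0:6] = [1, 2, 3, 4, 5, 6]
regs[6:12] = [7, 8, 9, 10, 11, 12]
fmac_matmul(regs, a_base=0, b_base=6, r_base=12, ydim=2, xdim=2, zdim=3)
result = regs[12:16]
```

the point of the sketch is that the loop body is a single FMAC with no
data movement: all of the "lane-crossing" lives in the three offsets
that the schedule produces.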