[Libre-soc-dev] [RFC] Matrix and DCT/FFT SVP64 REMAP

Tue Jul 6 01:01:44 BST 2021

On Mon, Jul 5, 2021 at 4:39 PM Richard Wilbur <richard.wilbur at gmail.com> wrote:
> When the inner dimension (M in the formulation above) is greater than 1, we need multiply accumulate for that loop.  We can avoid initializing the target registers to zero by simply always starting with a simple multiply for the first iteration of the inner loop.  Then if the inner dimension (size) is greater than 1, for all subsequent iterations of the inner loop do a multiply accumulate.

Having the operation always be a mul-add lets us calculate Y = A * B + C
for matrices/vectors A, B, and C, with the target registers preloaded
with C instead of zero. That is occasionally handy when the result is a
matrix, and handy much more often when the result is a vector or a
scalar. I think we should definitely have a version that always does a
mul-add, rather than a plain mul on the first step, since it is more
consistent with SV semantics, and since adding to the result is
relatively common in GPU math for vector/scalar results.
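
To make the difference concrete, here is a rough Python sketch of the
two schemes. It is only a model of the loop structure, not of the
actual REMAP hardware or any SVP64 instruction; the function names and
the list-of-lists matrix layout are my own assumptions for
illustration.

```python
# Hypothetical model only: plain Python lists stand in for registers.

def matmul_first_mul(a, b):
    """Quoted scheme: plain mul on the first inner step, mul-add after."""
    rows, inner, cols = len(a), len(b), len(b[0])
    # Targets are deliberately left uninitialised (None), since the
    # first inner iteration overwrites them.
    y = [[None] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            for k in range(inner):
                if k == 0:
                    y[i][j] = a[i][k] * b[k][j]   # mul
                else:
                    y[i][j] += a[i][k] * b[k][j]  # mul-add
    return y


def matmul_always_madd(a, b, c):
    """Always mul-add: targets start out holding C, giving A * B + C."""
    rows, inner, cols = len(a), len(b), len(b[0])
    y = [row[:] for row in c]  # copy C so the caller's data is untouched
    for i in range(rows):
        for j in range(cols):
            for k in range(inner):
                y[i][j] += a[i][k] * b[k][j]      # mul-add every step
    return y
```

With C set to all zeros the second version reduces to the first, at the
cost of having to zero the targets beforehand; with a nonzero C the
addition is folded into the same loop.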

Jacob