[Libre-soc-dev] [RFC] Matrix and DCT/FFT SVP64 REMAP

Tue Jul 6 03:41:58 BST 2021

On Jul 5, 2021, at 18:02, Jacob Lifshay <programmerjake at gmail.com> wrote:
> Having the operation always be mul-add allows us to calculate Y = A *
> B + C for matrices/vectors A, B, C, which is occasionally handy when
> the result is a matrix, and handy much more often when the result is a
> vector or a scalar. I think we should definitely have a version where
> we always mul-add and don't just-mul-at-the-first-step, since it is
> more consistent with SV semantics, and since adding to the result is
> relatively common in GPU math for vector/scalar results.

I completely agree that supporting
> Y = A * B + C
sounds useful, especially if we can point to important algorithms that use that type of operation. (Have we started a list on the wiki?) On the other hand, earlier posts in this thread led me to believe we were discussing
Y = A * B
which, based on arity alone, is a different operation.

My proposal embodies an appeal to the “creation is initialization” philosophy.

It seems you are proposing to treat
Y = A * B
as a special case of
Y = A * B + C
where C = 0. Is that the case? If so, we should look at the cost of the additional instructions needed to initialize a register (or a vector or matrix of registers) to 0, and at whether important algorithms use this special case.

Y = A * B + C
should be pretty easy to implement: initialize the target register(s) with a copy of C, then use multiply-accumulate to evaluate A * B on top of it.
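
To make the copy-then-accumulate idea concrete, here is a rough Python sketch. It only models the arithmetic on nested lists, not SVP64 REMAP or any real instruction; the names matmul_acc and matmul_add are made up for illustration.

```python
def matmul_acc(a, b, y):
    """Accumulate the product a * b into y in place, one multiply-add per step."""
    rows, inner, cols = len(a), len(b), len(b[0])
    for i in range(rows):
        for j in range(cols):
            for k in range(inner):
                # Every step is a multiply-add; y already holds the running sum.
                y[i][j] += a[i][k] * b[k][j]
    return y


def matmul_add(a, b, c):
    """Y = A * B + C: seed the target with a copy of C, then accumulate A * B."""
    y = [row[:] for row in c]  # explicit initialization of the target from C
    return matmul_acc(a, b, y)


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
C = [[1, 1], [1, 1]]
Y = matmul_add(A, B, C)
print(Y)  # [[20, 23], [44, 51]]
```

In this model, plain Y = A * B is the same call with C filled with zeros, which is exactly the extra initialization whose cost is in question above.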

I suppose this is where the ability to encode a dedicated “0” source in the register specification could be useful: the dispatcher could tell the vector unit to supply 0s for a particular operand, avoiding the explicit initialization of C.
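
A toy Python model of what I mean by a dedicated “0” source, purely as a sketch. The ZERO sentinel, the dict-based register file, and the fmadd helper are all hypothetical and do not correspond to any existing SVP64 encoding.

```python
ZERO = object()  # hypothetical "always reads as 0" source specifier


def read_operand(regs, src):
    """Return 0 for the dedicated zero source, otherwise the register value."""
    return 0 if src is ZERO else regs[src]


def fmadd(regs, dest, ra, rb, rc):
    """dest = ra * rb + rc, where any source may be the ZERO specifier."""
    regs[dest] = (read_operand(regs, ra) * read_operand(regs, rb)
                  + read_operand(regs, rc))


# Two-element dot product; "y" starts with stale contents that are never read.
regs = {"a0": 1, "a1": 2, "b0": 3, "b1": 4, "y": 999}
fmadd(regs, "y", "a0", "b0", ZERO)  # first step: addend comes from ZERO
fmadd(regs, "y", "a1", "b1", "y")   # later steps: accumulate into y
print(regs["y"])  # 11
```

The first step still has the shape of a mul-add, so the operation stays uniform, but no separate instruction is needed to clear the target beforehand.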