[Libre-soc-bugs] [Bug 1155] O(n^2) multiplication REMAP mode(s)

bugzilla-daemon at libre-soc.org
Thu Dec 28 03:48:43 GMT 2023


https://bugs.libre-soc.org/show_bug.cgi?id=1155

--- Comment #35 from Jacob Lifshay <programmerjake at gmail.com> ---
some additional thoughts on the multiply algorithm from comment #10:

we'll want to have an additional instruction afterward that clears the
temporary values, since this allows the hardware to not have to compute them,
which may be much more complex than just computing the product directly,
especially if the hardware uses a different multiplication algorithm
internally.

(In reply to Jacob Lifshay from comment #10)
> loop:
>     sv.maddedu *y, *a, *b, *y  # RS is stored in scalar t
>     sv.adde *(y + 1), *(y + 1), t
>     svstep.
>     sv.bdnz ... loop

basically, we want to clear `t` and `CA32` afterward so the hardware can
pattern-match the whole thing and use whatever multiplication algorithm it
likes, without having to compute the correct `t`/`CA32` values, which would
basically require an extra multiplier just to get the right values. this gives
hardware designers the freedom to use whatever multiplication hardware is the
most efficient, fastest, etc. without being forced to produce values for
`t` and `CA32` that are just ignored anyway.

we don't need to clear CA since it is known to be zero (but the hardware needs
to be able to recognize that CA is zero on entry, or it could just speculate
that CA is zero and if it isn't the loop can be restarted and run in scalar
mode)
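
To make the roles of `t`, `CA` and `CA32` concrete, here is a rough Python model of the loop from comment #10. It is only a sketch under several assumptions: limbs are 64-bit and little-endian, maddedu puts the low half of a*b+y in RT and the high half in RS, and the REMAP schedule walks (i, j) in row-major order. `mul_loop_model` and `limbs_to_int` are made-up names, not anything from the simulator.

```python
MASK64 = (1 << 64) - 1
MASK32 = (1 << 32) - 1


def mul_loop_model(a, b):
    """Hypothetical model of the maddedu/adde loop on little-endian 64-bit limbs."""
    y = [0] * (len(a) + len(b))
    t = ca = ca32 = 0  # CA is assumed to be zero on entry
    for i, ai in enumerate(a):  # assumed REMAP order: row-major over (i, j)
        for j, bj in enumerate(b):
            # sv.maddedu: 128-bit a*b + y, low half -> y[i+j], high half -> t
            s = ai * bj + y[i + j]
            y[i + j] = s & MASK64
            t = s >> 64
            # sv.adde: y[i+j+1] += t + CA, updating CA and CA32
            hi = y[i + j + 1]
            ca32 = ((hi & MASK32) + (t & MASK32) + ca) >> 32
            s = hi + t + ca
            y[i + j + 1] = s & MASK64
            ca = s >> 64
    return y, t, ca, ca32


def limbs_to_int(limbs):
    """Combine little-endian 64-bit limbs into one Python integer."""
    return sum(limb << (64 * k) for k, limb in enumerate(limbs))


a = [MASK64, 0x0123456789ABCDEF]
b = [0xFEDCBA9876543210, MASK64]
y, t, ca, ca32 = mul_loop_model(a, b)
assert limbs_to_int(y) == limbs_to_int(a) * limbs_to_int(b)
assert ca == 0  # the full product always fits in len(a) + len(b) limbs
```

In this model the leftover `t` and `CA32` depend only on the last partial product and the last add, and nothing after the loop reads them, while `CA` ends up zero because the product cannot overflow `y`.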

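We can just use subfc t, t, t to put CA, CA32 and t into a known state in one
simple instruction: t becomes zero, though CA and CA32 come out as 1 rather
than 0, since subfc computes ~RA + RB + 1 and equal operands always carry out.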
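
A small Python model of subfc shows what state it leaves behind. This is a sketch that assumes the Power ISA definition RT = ~RA + RB + 1, with CA the carry out of the 64-bit sum and CA32 the carry out of the low 32 bits; `subfc_model` is a made-up helper.

```python
MASK64 = (1 << 64) - 1
MASK32 = (1 << 32) - 1


def subfc_model(ra, rb):
    """Hypothetical model of subfc RT, RA, RB: returns (RT, CA, CA32)."""
    not_ra = ~ra & MASK64
    total = not_ra + rb + 1
    # CA32 is modelled as the carry out of the low 32 bits of the same sum
    ca32 = ((not_ra & MASK32) + (rb & MASK32) + 1) >> 32
    return total & MASK64, total >> 64, ca32


# subfc t, t, t for a few values of t: the outcome does not depend on t
for t in (0, 1, 0x8000_0000_0000_0000, MASK64):
    rt, ca, ca32 = subfc_model(t, t)
    assert (rt, ca, ca32) == (0, 1, 1)
```

So t is zeroed and the flags end up in a fixed, data-independent state, but under this model that state is CA = CA32 = 1; anything afterward that relies on CA being 0 would need CA cleared some other way.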
