[Libre-soc-dev] svp64 review and "FlexiVec" alternative
Jacob Bachmeyer
jcb62281 at gmail.com
Wed Aug 3 22:31:31 BST 2022
Jacob Lifshay wrote:
> On Tue, Aug 2, 2022 at 9:53 PM Jacob Bachmeyer via Libre-soc-dev
> <libre-soc-dev at lists.libre-soc.org> wrote:
>
>> lkcl wrote:
>>
>>> i have a feeling that Mitch worked out how to do it. FMAC
>>> having in effect a Scalar accumulator (src==dest) whilst
>>> other operands get tagged as vectors, HW can detect that and
>>> go "ah HA! what you *actually* want here is a horizontal
>>> sum, let me just microcode that for you".
>>>
>>>
>> Well, now that I think about it, yes, FlexiVec *can* express a
>> horizontal sum by accumulating into a scalar register. Hardware
>> recognizes this very simply: an ADD targeting a scalar register RX,
>> using that same RX and a vector register RY. This will also work with
>> the null implementation.
>>
>
> Do note that this trick only works well for integer add, floating
> point add is not associative so must be run serially (assuming the
> semantics are equivalent to running the code serially from element 0
> to the end). SVP64 specifically has an O(log N) parallel tree
> reduction mode to work around that.

Why would that same parallel tree reduction mode, selected invisibly by
hardware, not be suitable for each VL-element group, with the group sums
then accumulated serially into a scalar register?
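
As a rough software model of that scheme (not FlexiVec or SVP64 code; the
function names and the choice of Python are assumptions made purely for
illustration), each group of up to VL elements is reduced pairwise and the
group sums are then added serially into a scalar accumulator:

```python
def tree_reduce(group):
    """Pairwise (tree) sum of one group: O(log N) levels of adds."""
    vals = list(group)
    if not vals:
        return 0.0
    while len(vals) > 1:
        # Add adjacent pairs; an odd trailing element passes through unchanged.
        nxt = [vals[i] + vals[i + 1] for i in range(0, len(vals) - 1, 2)]
        if len(vals) % 2:
            nxt.append(vals[-1])
        vals = nxt
    return vals[0]


def grouped_sum(values, vl, acc=0.0):
    """Tree-reduce each VL-element group, then accumulate group sums serially."""
    for start in range(0, len(values), vl):
        acc = acc + tree_reduce(values[start:start + vl])
    return acc


def serial_sum(values, acc=0.0):
    """Reference semantics: add elements one at a time from element 0."""
    for v in values:
        acc = acc + v
    return acc
```

For integers the two orders always agree. For floating point they need not:
in IEEE 754 double precision, `[1e16, 1.0, -1e16, 1.0]` gives 1.0 with
`serial_sum` but 0.0 with `grouped_sum(..., vl=4)`, because each 1.0 is
rounded away when it is paired with a value of magnitude 1e16. Whether that
difference is acceptable depends on whether the architecture promises
strictly serial semantics.
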

There are other possible hardware tricks. One is using wider-than-normal
floating point for the invisible intermediate sums to limit rounding
error. Another is simply running the FP accumulate serially: shift the
values across the vector lanes (access to the adjacent lane is feasible
in an SIMT vector unit) and accumulate them in the scalar unit.
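
To illustrate the wider-intermediate idea, here is a minimal sketch that
assumes single-precision elements and a double-precision hidden
accumulator. The widths, the helper names, and the use of Python's `struct`
module to emulate single-precision rounding are all assumptions for
illustration, not a description of any actual hardware:

```python
import struct


def to_f32(x):
    """Round a Python float (IEEE 754 double) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def sum_narrow(values):
    """Serial accumulate, rounding to single precision after every add."""
    acc = 0.0
    for v in values:
        acc = to_f32(acc + to_f32(v))
    return acc


def sum_wide(values):
    """Serial accumulate in double precision, rounding to single once at the end."""
    acc = 0.0
    for v in values:
        acc += to_f32(v)
    return to_f32(acc)
```

With the input `[16777216.0, 1.0, 1.0]` (that is, 2^24 followed by two
ones), `sum_narrow` stays at 16777216.0 because each added 1.0 is lost to
rounding, while `sum_wide` returns 16777218.0. A wider accumulator shrinks
the error but does not make FP addition associative, so results can still
depend on the order of the adds for less friendly inputs.
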
-- Jacob