[Libre-soc-dev] svp64 review and "FlexiVec" alternative
Jacob Bachmeyer
jcb62281 at gmail.com
Sun Aug 7 00:20:09 BST 2022
lkcl wrote:
> On Wed, Aug 3, 2022 at 10:31 PM Jacob Bachmeyer <jcb62281 at gmail.com> wrote:
>
>> Why would that same parallel tree reduction mode (invisibly selected by
>> hardware)
>>
>
> ... and made abundantly and absolutely clear in the spec that it is
> 100% without fail absolute guaranteed absolute without fail 100%
> deterministic under absolutely all and any circumstances as specifically
> laid out in this executable pseudocode:
> https://git.libre-soc.org/?p=libreriscv.git;a=blob;f=openpower/sv/preduce.py;hb=HEAD
>
The requirement for FlexiVec is that all parallel implementations must
produce the same results as the null implementation. There is always
the option of performing the reduction serially, in VL steps of N cycles
each, simply shifting the values lane-by-lane toward the scalar unit,
which does the actual calculation.
Certain unavoidable deviations would be classified in the spec as
programming errors.
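To illustrate why the null-implementation requirement matters, here is a
minimal Python sketch comparing a serial, lane-by-lane reduction with a
pairwise tree reduction. It assumes the null implementation sums in
element order and uses IEEE 754 doubles for the elements; the function
names are hypothetical. With ordinary rounding, the two summation orders
can give different answers for the same inputs.

```python
from typing import List


def serial_reduce(lanes: List[float]) -> float:
    """Model of a serial reduction: one element per step, left to right,
    as if each lane's value were shifted toward a single scalar adder."""
    acc = 0.0
    for x in lanes:  # VL steps; in hardware each step might take N cycles
        acc = acc + x
    return acc


def tree_reduce(lanes: List[float]) -> float:
    """Pairwise (tree) reduction, as a parallel implementation might do it."""
    level = list(lanes)
    if not level:
        return 0.0
    while len(level) > 1:
        nxt = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])  # odd element passes through unchanged
        level = nxt
    return level[0]


lanes = [1e100, 1.0, -1e100, 1.0]
print(serial_reduce(lanes))  # 1.0
print(tree_reduce(lanes))    # 0.0
```

Because the two results differ, a tree-reducing implementation would only
match the serial one if it could guarantee identical rounding at every step.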
>> There are other possible hardware tricks, such as using
>> wider-than-normal floating point for the invisible intermediate sums to
>> avoid rounding errors,
>>
>
> the hard and inviolate rule has been set that the sub-vector
> element enumeration shall without fail be 100% Precise-Interruptible
> at any point in time and saveable/restorable.
>
FlexiVec has always met this requirement; this is why it suggests using
scalar registers internally to track the progress of vector operations.
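Here is a minimal Python sketch of the idea. It assumes a hypothetical
element-wise add whose only progress state is an architecturally visible
index register. `ArchState` and `vec_add` are illustrative names, not
FlexiVec definitions. Each element commits before the index advances, so an
interrupt between any two elements can save and restore the state.
Resuming then produces the same result as an uninterrupted run.

```python
import copy
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ArchState:
    """Hypothetical architecturally visible state: everything here can be
    saved and restored by an interrupt handler."""
    idx: int = 0                                    # scalar register tracking progress
    dst: List[float] = field(default_factory=list)  # elements committed so far


def vec_add(a: List[float], b: List[float], st: ArchState,
            interrupt_at: Optional[int] = None) -> bool:
    """Element-wise add that may stop between any two elements.
    Returns True when complete, False if interrupted."""
    while st.idx < len(a):
        if interrupt_at is not None and st.idx == interrupt_at:
            return False  # trap taken: st.idx records where to resume
        st.dst.append(a[st.idx] + b[st.idx])  # commit this element...
        st.idx += 1                           # ...then advance progress
    return True


a = [1.0, 2.0, 3.0, 4.0]
b = [10.0, 20.0, 30.0, 40.0]
st = ArchState()
print(vec_add(a, b, st, interrupt_at=2))  # False
saved = copy.deepcopy(st)         # interrupt handler saves the state
restored = copy.deepcopy(saved)   # ... and later restores it
print(vec_add(a, b, restored))    # True
print(restored.dst)               # [11.0, 22.0, 33.0, 44.0]
```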
> an invisible wider-than-normal FP register has absolutely no
> possible place to be saved and therefore has no place in any
> ISA of this type.
>
Wider-than-normal FP values would only exist in the relevant pipeline
latches during a reduction.
> other Vector ISAs make the conscious decision to have such
> intermediary hardware and usually the penalties are that (a)
> the instructions are explicit vector-sum operations and (b)
> it is prohibited to interrupt the hardware in the middle of
> such summations OR it must be necessary to roll-back
> and re-begin the entire instruction.
>
The latter would be expected: since the reduction collects sums across
all vector lanes, holding a temporary until the instruction has actually
completed uninterrupted (and can then commit) would not be an issue.
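The roll-back alternative can be sketched the same way, again in Python
with a hypothetical function name. The running sum lives only in a hidden
temporary, which stands in for pipeline latches. The architectural
destination is written only when the whole reduction completes. An
interrupt simply discards the temporary, and the instruction restarts from
the beginning.

```python
from typing import List, Optional, Tuple


def reduce_with_rollback(lanes: List[float], acc_in: float,
                         interrupt_at: Optional[int] = None
                         ) -> Tuple[bool, float]:
    """Hypothetical reduction that commits to its destination only on
    uninterrupted completion. Returns (completed, destination_value)."""
    temp = acc_in  # not architecturally visible; lost on interrupt
    for i, x in enumerate(lanes):
        if interrupt_at is not None and i == interrupt_at:
            return False, acc_in  # roll back: destination unchanged
        temp += x
    return True, temp  # commit


lanes = [1.0, 2.0, 3.0]
done, dst = reduce_with_rollback(lanes, 0.0, interrupt_at=2)
print(done, dst)  # False 0.0  (partial sum discarded)
done, dst = reduce_with_rollback(lanes, dst)  # re-execute whole instruction
print(done, dst)  # True 6.0
```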
> none of these things i judged to be acceptable hence the
> hard rule of sticking to element-based operations. if you
> want wider intermediate results use wider scalar elements.
>
It turns out that using wider intermediates for parallel FP reduction
may not work anyway, since the wider intermediate results could also
avoid rounding that /would/ occur in a scalar calculation...
...the other possibility is simply to declare FP "fuzzy", as it typically
has been. The issue here for FlexiVec is how strictly its host
architecture specifies FP. (I suspect Power ISA is quite exact here but
have not checked.)
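The concern about wider intermediates can be shown with a small Python
sketch. It uses `fractions.Fraction` as an extreme stand-in for a
wider-than-normal accumulator; a real implementation would use a fixed,
finite extra width. Summing with exact intermediates and rounding once at
the end does not match a scalar sum that rounds after every addition. The
function names are hypothetical.

```python
from fractions import Fraction
from typing import List


def scalar_sum(xs: List[float]) -> float:
    """Sequential sum, rounding to double after every addition."""
    acc = 0.0
    for x in xs:
        acc += x
    return acc


def wide_sum(xs: List[float]) -> float:
    """Sum with an exact (unboundedly wide) intermediate, rounded to
    double only once at the end."""
    acc = Fraction(0)
    for x in xs:
        acc += Fraction(x)  # exact conversion of each double
    return float(acc)


xs = [1e100, 1.0, -1e100]
print(scalar_sum(xs))  # 0.0  (the 1.0 is absorbed by rounding)
print(wide_sum(xs))    # 1.0  (no intermediate rounding)
```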
-- Jacob