[Libre-soc-dev] svp64 review and "FlexiVec" alternative

lkcl luke.leighton at gmail.com
Sun Aug 7 10:48:03 BST 2022

On Sun, Aug 7, 2022 at 12:20 AM Jacob Bachmeyer <jcb62281 at gmail.com> wrote:

> The latter would be expected; the reduction collects sums across all
> vector lanes, holding a temporary until the instruction has actually
> completed uninterrupted (and can then commit) would not be an issue.

and that, unfortunately, is where the hardware micro-architectural
choice bleeds across into the ISA.

this is why we created the parallel-prefix algorithm/specification:
it *begins* in-lane and only towards the very end becomes
inter-lane... but is otherwise fully deterministic.
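
to illustrate why the schedule has to be pinned down, here is a minimal
sketch in Python. it is a generic pairwise tree reduction, NOT the actual
SVP64 schedule: the names tree_reduce and sequential_reduce and the sample
data are hypothetical, chosen only to show that FP addition gives different
answers under different orderings, and that a schedule depending only on
the vector length gives the same answer every time.

```python
from operator import add


def tree_reduce(values, op):
    """Reduce with a fixed pairwise schedule that depends only on len(values).

    Illustrative only: a generic tree reduction, not the SVP64 schedule.
    """
    vals = list(values)
    if not vals:
        raise ValueError("cannot reduce an empty vector")
    n = len(vals)
    step = 1
    while step < n:
        # Combine element i with its partner `step` positions away.
        for i in range(0, n - step, step * 2):
            vals[i] = op(vals[i], vals[i + step])
        step *= 2
    return vals[0]


def sequential_reduce(values, op):
    """Reduce strictly left to right, one element at a time."""
    it = iter(values)
    acc = next(it)
    for v in it:
        acc = op(acc, v)
    return acc


# Hypothetical data: the 1.0 terms are absorbed when added to +/-1e17.
data = [1e17, 1.0, -1e17, 1.0]

tree_result = tree_reduce(data, add)        # (1e17 + 1.0) + (-1e17 + 1.0)
seq_result = sequential_reduce(data, add)   # ((1e17 + 1.0) + -1e17) + 1.0
```

both results are "correct" IEEE754 additions, yet they differ, because FP
addition is not associative. unless the ISA fixes the order, the answer
depends on how the hardware happens to group the lanes.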

> ...the other possibility is to simply declare FP "fuzzy" as it typically
> has been.  The issue here for FlexiVec is how strictly its host
> architecture specifies FP.  (I suspect Power ISA is quite exact here but
> have not checked.)

IEEE754 exact, except for explicit instructions computing estimates.

we have to relax that for 3D to meet Vulkan Spec requirements,
which must take precedence (spending 4x the silicon to get exact
ULP=0 results spells commercial suicide for a 3D GPU).
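
for readers unfamiliar with the term, a rough sketch of what "error in ULPs"
means, in Python (3.9 or later, for math.ulp). the helper name ulp_error is
hypothetical, and this measures against double precision rather than the
single precision a GPU would typically use, so treat it as an illustration
of the unit only.

```python
import math


def ulp_error(approx, exact):
    """Distance between approx and exact, in units of the last place of exact."""
    return abs(approx - exact) / math.ulp(exact)


# An exactly-rounded result is 0 ULP away from the reference.
exact_case = ulp_error(1.0, 1.0)

# The next representable double above 1.0 is 1 ULP away.
one_off = ulp_error(1.0 + math.ulp(1.0), 1.0)
```

ULP=0 means the result is bit-for-bit the correctly rounded one; a relaxed
accuracy requirement permits a small nonzero value here.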

