[Libre-soc-dev] WASM flexible-vectors & SimpleV
programmerjake at gmail.com
Fri Apr 9 10:29:40 BST 2021
On Fri, Apr 9, 2021, 02:05 Luke Kenneth Casson Leighton <lkcl at lkcl.net> wrote:
> On Fri, Apr 9, 2021 at 7:13 AM Jacob Lifshay <programmerjake at gmail.com> wrote:
> > I mentioned SimpleV on the issue tracker for WASM flexible-vectors -- a
> > WASM extension for supporting wide (>128-bit SIMD) vectors.
> ah cool
> it may be worthwhile mentioning that for predicated gather-scatter,
> SV's explicit variable-vector-length may be used to guarantee that a
> scatter selects a fixed number of (predicated) items in a single
> instruction, rather than requiring a loop. illustration:
> * mask = 0b100001011 # 4 bits set (bits 0, 1, 3 and 8)
> * top_bit = 64-CNTLZ(mask) # one past the index of the highest set bit (9 here)
> * Vector_Length = top_bit # set VL to cover up to and including the highest set bit
> * gather(vector) # guaranteed to do all 4 gathers in one operation
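A minimal Python model of the illustration above, assuming a 64-bit mask register. The VL that just covers the highest set bit is 64 minus the count of *leading* zeros (equivalently Python's `int.bit_length()`). The helpers `clz64`, `vl_for_mask` and `predicated_gather` are hypothetical names for illustration, not a SimpleV API, and the memory contents and gather offsets are made up.

```python
# Hypothetical model of the mask -> VL illustration; not a real SimpleV API.

def clz64(x: int) -> int:
    """Count leading zeros of a 64-bit value (64 when x == 0)."""
    assert 0 <= x < (1 << 64)
    return 64 - x.bit_length()

def vl_for_mask(mask: int) -> int:
    """VL covering element 0 up to and including the highest set mask bit."""
    return 64 - clz64(mask)

def predicated_gather(memory, indices, mask, vl):
    """Model ONE predicated gather: load memory[indices[i]] where mask bit i is set."""
    result = [None] * vl  # None marks masked-out (untouched) elements
    for i in range(vl):
        if (mask >> i) & 1:
            result[i] = memory[indices[i]]
    return result

mask = 0b100001011                      # bits 0, 1, 3 and 8 set
vl = vl_for_mask(mask)                  # 9: elements 0..8
memory = list(range(100, 200))          # hypothetical memory contents
indices = [10 * i for i in range(vl)]   # hypothetical gather offsets
out = predicated_gather(memory, indices, mask, vl)
print(vl, out)  # prints: 9 [100, 110, None, 130, None, None, None, None, 180]
```

All four active elements are handled by the single call, because VL was chosen to reach the highest set bit rather than being left to the hardware.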
> this cannot be done with RISC-V because the RISC-V specification
> states that the hardware may set VL to an arbitrary value. in RISC-V
> RVV, the program *requests* a certain VL, but the hardware is
> permitted to allocate *less* than the maximum *available* VL
umm, that's actually not quite right... in RVV there is a runtime-constant
value MVL such that if you request a VL <= MVL you will *always* get exactly
what you requested (just like SimpleV's setvl, though on SimpleV the
programmer, rather than the CPU designer, gets to select MVL, since you can
easily set it at runtime by just putting a different value in the setvl
instruction).
On RVV you might get less than MVL *only* if your requested VL is more than
MVL (that's for balancing purposes, to ensure you don't end up with a final
loop iteration that has only a few elements, and/or for other
microarchitectural optimizations).
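A small sketch of those rules, assuming the `vl` constraints from the ratified RVV 1.0 spec (where the spec's VLMAX plays the role of MVL here): a request that fits is granted exactly; a request below 2*VLMAX may be split anywhere from ceil(AVL/2) up to VLMAX; anything larger gets VLMAX. The `pick` callback is a hypothetical stand-in for an implementation's choice, used to show why balancing avoids a short final strip.

```python
# Sketch of the RVV 1.0 vsetvl constraints; VLMAX here plays the role of "MVL".

def rvv_allowed_vl(avl: int, vlmax: int) -> tuple:
    """Inclusive (lo, hi) range of vl an RVV 1.0 implementation may return."""
    if avl <= vlmax:
        return (avl, avl)               # request fits: always granted exactly
    if avl < 2 * vlmax:
        return ((avl + 1) // 2, vlmax)  # may balance the last two strips
    return (vlmax, vlmax)               # large request: always VLMAX

def strips(n: int, vlmax: int, pick) -> list:
    """Strip sizes when pick(lo, hi) chooses vl from the allowed range each time."""
    sizes = []
    while n > 0:
        vl = pick(*rvv_allowed_vl(n, vlmax))
        sizes.append(vl)
        n -= vl
    return sizes

greedy = strips(10, 8, lambda lo, hi: hi)    # [8, 2]: short final strip
balanced = strips(10, 8, lambda lo, hi: lo)  # [5, 5]: work split evenly
```

Note that `rvv_allowed_vl(5, 8)` collapses to `(5, 5)`: once the request is known to fit, there is no freedom left for the hardware.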
> (consequently, to guarantee completion, all VL-based operations must be
> in a loop construct)
If you know your requested VL is <= MVL, no loop is necessary in either
SimpleV or RVV. A loop is only necessary if you don't know that for sure.
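A minimal strip-mining sketch of that point. It assumes a hypothetical `setvl` that grants `min(requested, mvl)`, which matches the behaviour described above for requests that fit; what real hardware returns beyond MVL is not modelled. Each `vadd` call stands in for one vector instruction.

```python
# Hypothetical model: setvl grants min(requested, mvl). Not a real ISA API.

def setvl(requested: int, mvl: int) -> int:
    return min(requested, mvl)

def vadd(dst, a, b, start, vl):
    """Stands in for ONE vector add over elements start .. start+vl-1."""
    for i in range(start, start + vl):
        dst[i] = a[i] + b[i]

def add_arrays(a, b, mvl):
    """Return (elementwise sum, number of vector instructions issued)."""
    n = len(a)
    dst = [0] * n
    if n <= mvl:                   # known to fit: a single instruction, no loop
        vadd(dst, a, b, 0, setvl(n, mvl))
        return dst, 1
    start, issued = 0, 0
    while start < n:               # otherwise strip-mine over the array
        vl = setvl(n - start, mvl)
        vadd(dst, a, b, start, vl)
        start += vl
        issued += 1
    return dst, issued

small = add_arrays([1, 2, 3], [10, 20, 30], mvl=4)
large = add_arrays(list(range(10)), list(range(10)), mvl=4)
```

Here `small` needs one instruction, while `large` needs three strips (4, 4, 2) because the element count is not known to fit within MVL.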
> SV *requires* that VL is set to the exact amount
> requested, and thus in some cases one inner loop may be removed.
> also that twin-predication (source mask separate from dest mask) is
> possible, which gives a form of back-to-back VREDUCE-VEXPAND.
> this pattern applies even to GATHER LD/STs
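A rough model of twin-predication, under the assumption (ours, not confirmed by this thread) that the k-th element selected by the source mask is moved into the k-th slot selected by the destination mask. With an all-ones destination mask this acts as a compress (the "VREDUCE" half); with an all-ones source mask it acts as an expand; with both masks sparse, the two happen in one operation. `twin_predicated_mv` is a hypothetical name.

```python
# Hypothetical model of a twin-predicated element move; not the SV spec text.

def twin_predicated_mv(src, src_mask, dst, dst_mask, vl):
    """Copy the k-th source element with src_mask set into the k-th dst slot with dst_mask set."""
    out = list(dst)
    s = d = 0
    while s < vl and d < vl:
        while s < vl and ((src_mask >> s) & 1) == 0:  # skip masked-out sources
            s += 1
        while d < vl and ((dst_mask >> d) & 1) == 0:  # skip masked-out destinations
            d += 1
        if s < vl and d < vl:
            out[d] = src[s]
            s += 1
            d += 1
    return out

src = [10, 11, 12, 13, 14, 15, 16, 17]
compressed = twin_predicated_mv(src, 0b10100101, [0] * 8, 0b11111111, 8)
expanded = twin_predicated_mv(src, 0b11111111, [0] * 8, 0b10100101, 8)
both = twin_predicated_mv(src, 0b10100101, [0] * 8, 0b01011010, 8)
```

`compressed` packs elements 0, 2, 5, 7 to the front, `expanded` spreads the first four elements out to positions 0, 2, 5, 7, and `both` does the compress and expand together in a single pass.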
I'll link to this message