[Libre-soc-dev] results of phone call about simple-v prefix
Luke Kenneth Casson Leighton
lkcl at lkcl.net
Sat Nov 28 08:18:52 GMT 2020
On 11/28/20, Lauri Kasanen <cand at gmx.com> wrote:
> Commonly the vector length won't change often. There are cases where it
> would change inside a loop, say gathering data about a block to a
> smaller block, then processing that smaller block. It's probably fine
> to have an instruction change it.
thank you lauri.
i will be sad, in a Top Gear Jeremy Clarkson way, to see SV-P64 lose
the ability to do single-instruction, self-contained, predicated, fully
arbitrary vectorisation of any and all scalar instructions.
my favourite is full twin-predicated vectorised LD and ST. SV-P64
LD/ST is the "dream" bucket-list instruction that every ISA wishes it
had, only for its designers to regret the effort of jamming it into a
scalar ISA. ARM, POWER, MIPS and x86 have all tried it and had to
backpedal.
SV-P64 LD/ST also, *accidentally and incidentally*, provides a
context-switch and function-call "dream instruction": one that can
place the context-sensitive parts of an entire register file onto and
off the stack in *one* instruction.
the predicate register acts as the mask that indicates which registers
are relevant to the function call and need pushing onto or popping off
the stack. the same applies to context-switching.
and this capability wasn't even deliberately planned; it just fell out
of the prefixing!
if SV-P64 cannot set VL and MAXVL (yes, both need to be set), then a
separate instruction has to, and that instruction requires 12 bits of
immediate data plus a src register number.
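how the 12 immediate bits divide between VL and MAXVL isn't given here. a minimal sketch, assuming (hypothetically) two 6-bit fields that each store the length minus one, so lengths 1 to 64 fit, and assuming VL may not exceed MAXVL. `pack_vl_mvl` and `unpack_vl_mvl` are made-up names, not a ratified encoding.

```python
FIELD_BITS = 6                 # assumed width of each field
MAX_LEN = 1 << FIELD_BITS      # 64, largest length representable

def pack_vl_mvl(vl: int, maxvl: int) -> int:
    """pack (VL-1) in the low 6 bits and (MAXVL-1) in the high 6 bits."""
    if not (1 <= vl <= maxvl <= MAX_LEN):
        raise ValueError("need 1 <= VL <= MAXVL <= 64")
    return ((maxvl - 1) << FIELD_BITS) | (vl - 1)

def unpack_vl_mvl(imm: int) -> tuple[int, int]:
    """inverse of pack_vl_mvl: returns (VL, MAXVL)."""
    vl = (imm & (MAX_LEN - 1)) + 1
    maxvl = ((imm >> FIELD_BITS) & (MAX_LEN - 1)) + 1
    return vl, maxvl

imm = pack_vl_mvl(5, 64)       # fits in 12 bits
print(unpack_vl_mvl(imm))      # (5, 64)
```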
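12 immediate bits plus a 5-bit register number, 17 bits in all, is
quite a significant number of bits. it doesn't fit X-Form or XFX-Form,
because these are 5-5-5-10 and 5-10-10 respectively (operand and XO
field widths), which leaves only 15 operand bits.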
hypothetically it would be possible to "invade" the 10-bit XO minor
opcode field of X/XFX, using 4x XO opcodes per SETVL/MAXVL
instruction, in order to get the extra 2 bits needed for setting VL
and MAXVL simultaneously.
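the bit budget behind that argument, as a quick check: a 32-bit word holds the 6-bit primary opcode, the operand fields, the 10-bit XO and a final Rc (or reserved) bit. the field widths come from the 5-5-5-10 and 5-10-10 layouts; `xo_budget` is a hypothetical helper.

```python
PRIMARY_OPCODE_BITS = 6
XO_BITS = 10
TAIL_BITS = 1  # Rc (or reserved) bit at the end of the word

# operand fields in each 32-bit form
FORMS = {"X-Form": [5, 5, 5], "XFX-Form": [5, 10]}
NEEDED = 12 + 5  # 12 immediate bits + one 5-bit source register

def xo_budget(fields: list[int], needed: int) -> tuple[int, int, int]:
    """return (operand bits available, bits short, XO values to borrow)."""
    word = PRIMARY_OPCODE_BITS + sum(fields) + XO_BITS + TAIL_BITS
    if word != 32:
        raise ValueError(f"form is {word} bits, not 32")
    available = sum(fields)
    short = max(0, needed - available)
    # borrowing `short` bits from XO means one instruction occupies
    # 2**short XO values
    return available, short, 2 ** short

for name, fields in FORMS.items():
    print(name, xo_budget(fields, NEEDED))
# X-Form (15, 2, 4)
# XFX-Form (15, 2, 4)
```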
the alternative to invading XO is to use something like EXT017, where
sc and scv live, and invent a new Form.
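to show that a fresh Form under EXT017 has room to spare, here is an entirely made-up layout (not from any Power ISA or SV spec) that packs the 5-bit source register and the 12-bit immediate into one 32-bit word. it uses Power ISA MSB0 numbering (bit 0 is the most significant) and leaves bits 30-31 zero, on the assumption that those are what would keep it distinct from sc and scv. `encode_hypothetical_setvl` is a hypothetical name.

```python
EXT017 = 17  # primary opcode shared with sc / scv

def encode_hypothetical_setvl(ra: int, imm12: int, xo7: int = 0) -> int:
    """pack a made-up EXT017 Form into a 32-bit word (MSB0 numbering).

    bits 0-5   primary opcode (17)
    bits 6-10  RA, the source register
    bits 11-22 12-bit immediate (VL and MAXVL)
    bits 23-29 7 spare bits, e.g. a minor opcode
    bits 30-31 left zero (assumed needed to stay clear of sc/scv)
    """
    if not (0 <= ra < 32 and 0 <= imm12 < 4096 and 0 <= xo7 < 128):
        raise ValueError("field out of range")
    return (EXT017 << 26) | (ra << 21) | (imm12 << 9) | (xo7 << 2)

print(hex(encode_hypothetical_setvl(3, 0xABC, 5)))
```

even with 7 bits set aside as a minor opcode, the 17 needed bits fit without touching anything outside EXT017.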
having 2 separate instructions, one for VL and one for MAXVL, or making
the 64-bit v3.1B prefix mandatory for this, begins to penalise the
whole of SV when small inner loops are coded up.
i spent *significant* amounts of time doing side-by-side comparisons
against common RVV assembly loops and patterns to get the instruction
count down.
having 2x 32-bit v3.0B instructions or 1x 64-bit v3.1B instruction
whose sole purpose is to set VL and MAXVL, in an inner loop that only
comprises 6 to 8 instructions in the first place, constitutes a
double-digit percentage downgrade in encoding efficiency compared to
SV-P64.
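and in the case of that accidental SV-P64 LD/ST it's a triple-digit
percentage downgrade, because a register save or restore that was one
instruction now needs the VL/MAXVL setup in front of it.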
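a rough check of the double- and triple-digit figures, assuming every SV-P64 instruction is 64 bits and that the separate VL/MAXVL setup costs 64 bits (2x 32-bit v3.0B or 1x 64-bit v3.1B). overhead is measured as added bits over the bits of the code it is added to; `overhead_pct` is a hypothetical helper.

```python
SETVL_BITS = 64   # 2x 32-bit v3.0B or 1x 64-bit v3.1B
SVP64_BITS = 64   # assumed size of each SV-P64 prefixed instruction

def overhead_pct(body_insns: int) -> float:
    """percent more code when a separate VL/MAXVL setup is added."""
    return 100.0 * SETVL_BITS / (body_insns * SVP64_BITS)

for n in (6, 7, 8):                  # inner loop sizes
    print(n, round(overhead_pct(n), 1))
# 6 16.7
# 7 14.3
# 8 12.5
print("LD/ST", overhead_pct(1))      # single-instruction save/restore
# LD/ST 100.0
```

under these assumptions the loop case lands between 12.5% and 16.7%, and the single-instruction LD/ST case doubles in size.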
l.