[Libre-soc-dev] dsld/dsrd

Tue Oct 25 10:40:42 BST 2022

On October 25, 2022 2:39:18 AM GMT+01:00, Jacob Lifshay
<programmerjake at gmail.com> wrote:

>I would regard elwidth overrides as much less important for dsld/dsrd,
>they are mostly only useful for full 64-bit, sld/srd can be easily used
>when smaller elwidths are needed.

initially i thought so too, but then realised that sld/srd are
only single-reg value shifts, whereas dsrd/ew=32 is still twin-reg
and can still be used to do bigint-shifting.
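
(a rough python sketch of the twin-reg point, not the dsrd spec: it only
models the generic "shift a hi:lo pair, keep the low word" idea at 32-bit
element width. the names shr_pair32, bigint_shr, to_limbs and from_limbs
are made up for illustration.)

```python
MASK32 = 0xFFFFFFFF

def shr_pair32(hi, lo, sh):
    """Shift the 64-bit pair hi:lo right by sh (0..31), keep the low 32 bits."""
    return (((hi << 32) | lo) >> sh) & MASK32

def bigint_shr(limbs, sh):
    """Right-shift a little-endian list of 32-bit limbs by sh bits (0..31)."""
    out = []
    for i, lo in enumerate(limbs):
        # the next limb up supplies the bits shifted in from above
        hi = limbs[i + 1] if i + 1 < len(limbs) else 0
        out.append(shr_pair32(hi, lo, sh))
    return out

def to_limbs(x, n):
    """Split a non-negative int into n little-endian 32-bit limbs."""
    return [(x >> (32 * i)) & MASK32 for i in range(n)]

def from_limbs(limbs):
    """Reassemble little-endian 32-bit limbs into an int."""
    return sum(limb << (32 * i) for i, limb in enumerate(limbs))
```

a 17-byte value fits in five such limbs; a single-reg shift would lose the
bits that cross each limb boundary, which is what the pair-shift avoids.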

which doesn't "seem" to matter until you look at chacha20's MAC
(poly1305), which uses 17-byte arithmetic.  everyone else rounds that
up to the nearest 32, 64 (or even 128-bit) integer arithmetic.

>> or svoffset) which begs the question: is it worthwhile to have
>> some form of special (non-orthogonal) behaviour involving
>> RC and the 9th bit of EXTRA which is free in EXTRA2 4-operand
>> form?
>
>how about just defining the 9th bit to instead make RB (or RA) be
>EXTRA3 form for all 4-operand instructions? that is also useful for
>maddedu where RB (or RA) is the scalar multiplier and you need to
>specify successive registers in the vector/bigint you're multiplying by:

liked the principle: the example shows it can be done automatically,
no need to add extra assembler flags.  the only problem being that
[for maddedu] it conflicts with the other use for the 9th bit:
selection of RS=RT+1 [RS=RT+MAXVL, scalar has MAXVL=1]

>> or, to attempt 3-operand EXTRA3 with 4 operands, treating the
>> shift source as mandatory scalar, for example?
>
>no, vector shift source is specifically needed for prefix-code encoding.

yeah i worked that out afterwards (doh), retrospectively.

my thoughts here for prefix-code are: just use the sm=2 variant and
go into the loop with a copy of the shift-amount, then follow up
with a separate OR-reduction instruction.
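
(a rough python sketch of the general shift-then-OR pattern for packing
prefix codes. it does not model the sm=2 variant or any actual
instruction: pack_prefix_codes and the LSB-first bit order are
assumptions made purely for illustration.)

```python
MASK64 = (1 << 64) - 1

def pack_prefix_codes(codes):
    """Pack (value, nbits) pairs LSB-first into 64-bit words.

    nbits is assumed to be in 1..64. Returns (words, total_bits).
    """
    total = sum(nbits for _, nbits in codes)
    # one spare word so the high half of a pair always has somewhere to go
    words = [0] * ((total + 63) // 64 + 1)
    pos = 0
    for value, nbits in codes:
        word, sh = divmod(pos, 64)
        # per-element shift amount: the result may straddle two words
        wide = (value & ((1 << nbits) - 1)) << sh
        # OR-reduction of the shifted pieces into the output
        words[word] |= wide & MASK64
        words[word + 1] |= wide >> 64
        pos += nbits
    return words, total
```

each element needs its own shift amount (the running bit position), which
is why the shift source has to be a vector, and the shifted result is a
double-width pair whenever a code crosses a 64-bit boundary.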

(we are 100% categorically not going to be adding 4-in 1-out 64-bit
reg instructions, we had that conversation already.  3-in 2-out is
pushing the limit as it is, and is only justifiable because of RTp,
RTa, the very existence of VSX, and the LD-ST-update instructions).

l.