[Libre-soc-dev] dsld/dsrd
lkcl
luke.leighton at gmail.com
Tue Oct 25 10:40:42 BST 2022
On October 25, 2022 2:39:18 AM GMT+01:00, Jacob Lifshay
<programmerjake at gmail.com> wrote:
>I would regard elwidth overrides as much less important for dsld/dsrd,
>they are mostly only useful for full 64-bit, sld/srd can be easily used
>when smaller elwidths are needed.
initially i thought so too, but then realised that sld/srd are
single-register shifts, whereas dsrd/ew=32 is still twin-register
and can still be used to do bigint-shifting.
which doesn't "seem" to matter until you look at the poly1305 MAC
used with chacha20, which uses 17-byte (130-bit) arithmetic. everyone
else rounds that up to a multiple of 32, 64 (or even 128) bits.
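
a rough sketch of why the twin-register form matters at ew=32: a bigint
right-shift over 32-bit limbs needs, for each output limb, bits from two
adjacent input limbs. this is plain python, not the dsrd definition; the
names double_shift_right, bigint_shr, to_limbs and from_limbs are made up
for illustration, and the limb order (least-significant first) is an
assumption.

```python
def double_shift_right(lo, hi, n, width=32):
    """Shift the 2*width-bit value hi:lo right by n, keep the low width bits.

    Hypothetical helper: models a two-source ("twin-reg") shift, not the
    actual dsrd instruction semantics.
    """
    mask = (1 << width) - 1
    return (((hi << width) | lo) >> n) & mask


def bigint_shr(limbs, n, width=32):
    """Right-shift a little-endian list of limbs by n bits (0 <= n < width)."""
    out = []
    for i, lo in enumerate(limbs):
        # the limb above supplies the bits shifted in; zero past the top
        hi = limbs[i + 1] if i + 1 < len(limbs) else 0
        out.append(double_shift_right(lo, hi, n, width))
    return out


def to_limbs(value, count, width=32):
    """Split a non-negative integer into count limbs, least-significant first."""
    mask = (1 << width) - 1
    return [(value >> (width * i)) & mask for i in range(count)]


def from_limbs(limbs, width=32):
    """Recombine little-endian limbs into a single integer."""
    return sum(limb << (width * i) for i, limb in enumerate(limbs))


# an arbitrary value that fits in 17 bytes, held in five 32-bit limbs
x = (1 << 130) - 5
shifted = from_limbs(bigint_shr(to_limbs(x, 5), 7))
```

with a single-register shift each limb would lose the bits that cross the
limb boundary, so the carry-in would need extra instructions per element.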
>> or svoffset) which begs the question: is it worthwhile to have
>> some form of special (non-orthogonal) behaviour involving
>> RC and the 9th bit of EXTRA which is free in EXTRA2 4-operand
>> form?
>
>how about just defining the 9th bit to instead make RB (or RA) be EXTRA3
>form for all 4-operand instructions? that is also useful for maddedu
>where RB (or RA) is the scalar multiplier and you need to specify
>successive registers in the vector/bigint you're multiplying by:
liked the principle: the example shows it can be done automatically,
no need to add extra assembler flags. the only problem being
[for maddedu] it conflicts with the other use of the 9th bit:
selection of RS=RT+1 [RS=RT+MAXVL, scalar has MAXVL=1]
>> or, to attempt 3-operand EXTRA3 with 4 operands, treating the
>> shift source as mandatory scalar, for example?
>
>no, vector shift source is specifically needed for prefix-code encoding.
yeah i worked that out afterwards (doh).
my thoughts for prefix-code: just use the sm=2 variant and
go into the loop with a copy of the shift-amount, then follow up
with a separate OR-reduction instruction.
(we are 100% categorically not going to be adding 4-in 1-out 64-bit
reg instructions, we had that conversation already. 3-in 2-out is
pushing the limit as it is, and is only justifiable because of RTp,
RTa, the very existence of VSX, and the LD-ST-update instructions).
l.