[Libre-soc-dev] dsld/dsrd

Tue Oct 25 02:39:18 BST 2022

manually composing my reply email since libre-soc never sent me the
original email.

> today i tried making dsld/dsrd 4-operand.  it went really well,
> right up to the point i ran the bigint vector tests at which point
> it all went to shit.
>
> 4-op is EXTRA2 which in this case prohibits specifying a pair
> of operands for consecutive HI-LO operand selection as to
> the 128-bit source to be shifted.
>
> the only workaround is to either use svoffset or take a copy
> of the entire source vector in order to shift the HI source up
> by one register.
>
> for elwidth overrides that actually will have to be done (copy

I would regard elwidth overrides as much less important for dsld/dsrd:
those instructions are mostly only useful at full 64-bit, and sld/srd can
easily be used when smaller elwidths are needed.
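For anyone following along, here is a rough Python model of why the shift wants a HI-LO pair of consecutive registers: each 64-bit output limb is cut out of a 128-bit window formed from a limb and its lower neighbour. This is only a sketch of the generic bigint left shift, not the actual dsld pseudocode; the names `shift_left_limbs`, `to_int` and `MASK64` are made up for illustration, and limbs are assumed to be stored least-significant first.

```python
MASK64 = (1 << 64) - 1


def to_int(limbs):
    """Combine little-endian 64-bit limbs into one Python integer."""
    value = 0
    for index, limb in enumerate(limbs):
        value |= limb << (64 * index)
    return value


def shift_left_limbs(limbs, n):
    """Shift a little-endian list of 64-bit limbs left by n bits, 0 <= n < 64.

    Returns len(limbs) + 1 limbs so that no bits are lost off the top.
    """
    if not 0 <= n < 64:
        raise ValueError("shift amount must be in range 0..63")
    out = []
    lo = 0  # limb below the current one; zero below the lowest limb
    for hi in limbs:
        pair = (hi << 64) | lo                   # 128-bit HI-LO source
        out.append((pair >> (64 - n)) & MASK64)  # 64-bit window of the pair
        lo = hi                                  # HI becomes next step's LO
    out.append(lo >> (64 - n))                   # bits shifted out of the top limb
    return out
```

The point of the model is the `lo = hi` line: every element needs the same source vector twice, once as HI and once offset by one register as LO, which is exactly what the copy or svoffset workaround supplies.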

> or svoffset) which begs the question: is it worthwhile to have
> some form of special (non-orthogonal) behaviour involving
> RC and the 9th bit of EXTRA which is free in EXTRA2 4-operand
> form?

how about just defining the 9th bit to instead make RB (or RA) be EXTRA3
form for all 4-operand instructions? that is also useful for maddedu where
RB (or RA) is the scalar multiplier and you need to specify successive
registers in the vector/bigint you're multiplying by:

# multiply r70..r77 by r80..r87, storing result in r100..r115
# a 512-bit simple O(n^2) multiply.
# can easily be forced to need high registers because lower registers
# are already used by other values in a larger algorithm containing a
# 512-bit mul.
setvl VL=8
sv.li *108, 0
sv.li 32, 0
sv.maddedu *100, *70, 80, 32 # extra2 can't address r108, so use r32
sv.mv 108, 32
.set i, 0
.rept 7  # using gnu assembler repeat loop
.set i, i + 1
li 32, 0
sv.maddedu *60, *70, 80+i, 32 # uses RB=81,82,83,84...87
addic 0, 0, 0 # clear CA
sv.adde *100+i, *100+i, *60
sv.adde 108+i, 108+i, 32
.endr
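As a cross-check on the loop above, here is a small Python reference model of the same O(n^2) multiply. It assumes maddedu computes RA * RB + RC, returning the low 64 bits as the result and the high 64 bits as the new RC (my reading of the draft, so treat it as an assumption). The function names `maddedu`, `mul_row` and `bigint_mul` are just for this sketch, and limbs are stored least-significant first.

```python
MASK64 = (1 << 64) - 1


def maddedu(ra, rb, rc):
    """Assumed semantics: 64x64 multiply plus 64-bit add, split into (low, high)."""
    total = ra * rb + rc
    return total & MASK64, total >> 64


def mul_row(a, b_limb):
    """Multiply vector a by one scalar limb, chaining the high half as carry.

    Models one sv.maddedu with vector RT/RA and scalar RB/RC.
    """
    carry = 0
    row = []
    for a_limb in a:
        lo, carry = maddedu(a_limb, b_limb, carry)
        row.append(lo)
    return row, carry


def bigint_mul(a, b):
    """Schoolbook multiply of two little-endian limb lists."""
    result = [0] * (len(a) + len(b))
    for i, b_limb in enumerate(b):
        row, top = mul_row(a, b_limb)
        row.append(top)  # the final carry is the row's top limb
        carry = 0        # plays the role of CA in the adde chain
        for j, limb in enumerate(row):
            total = result[i + j] + limb + carry
            result[i + j] = total & MASK64
            carry = total >> 64
        assert carry == 0  # each partial sum fits in len(a) + i + 1 limbs
    return result
```

Each outer iteration uses a different scalar limb of b, which corresponds to the RB=80+i operand in the assembly and is why being able to reach successive high registers for RB matters.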
>
> or, to attempt 3-operand EXTRA3 with 4 operands, treating the
> shift source as mandatory scalar, for example?

no, vector shift source is specifically needed for prefix-code encoding.

Jacob