[Libre-soc-dev] SVP64 Vectorised add-carry => big int add
lkcl
luke.leighton at gmail.com
Sun Apr 17 18:26:30 BST 2022
On Sun, Apr 17, 2022 at 2:08 PM lkcl <luke.leighton at gmail.com> wrote:
> see end of:
>
> https://libre-soc.org/openpower/sv/bitmanip/appendix/
>
> how about storing the 128-bit mul-add in a *pair* of vectors, 3-in 2-out just like DCT/FFT: RT and RT+VL
>
> a second followup instruction can perform the carry-adds with corrections.
could i ask people to check the math here? i wrote it out at:
https://libre-soc.org/openpower/sv/bitmanip/appendix/
i started from this:
# for big_c - big_a * word_b
result <- RC + ~(RA * RB) + CARRY
result_high <- HIGH_HALF(result)
if CARRY <= 1 then # unsigned comparison
    result_high <- result_high + 1
end
CARRY <- result_high
RT <- LOW_HALF(result)
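for anyone checking the math numerically, here is a minimal python sketch of the loop above. assumptions not in the original: words are 64-bit and stored least-significant first, `~` is taken over 128 bits, and the names `mulsub_word`, `to_words` and `from_words` are hypothetical helpers. in this sketch a starting carry of 1 behaves as "no borrow-in".

```python
MASK64 = (1 << 64) - 1
MASK128 = (1 << 128) - 1

def mulsub_word(big_c, big_a, word_b, carry=1):
    """big_c - big_a * word_b over lists of 64-bit words, LSB word first."""
    out = []
    for rc, ra in zip(big_c, big_a):
        # result <- RC + ~(RA * RB) + CARRY, all in 128 bits
        result = (rc + (~(ra * word_b) & MASK128) + carry) & MASK128
        result_high = result >> 64
        if carry <= 1:  # unsigned comparison
            result_high = (result_high + 1) & MASK64
        carry = result_high
        out.append(result & MASK64)  # RT <- LOW_HALF(result)
    return out, carry

def to_words(x, n):
    """Split a non-negative int into n 64-bit words, LSB word first."""
    return [(x >> (64 * i)) & MASK64 for i in range(n)]

def from_words(ws):
    """Inverse of to_words."""
    return sum(w << (64 * i) for i, w in enumerate(ws))

c, a, b = (1 << 64) + 5, 3, 4
words, carry = mulsub_word(to_words(c, 2), to_words(a, 2), b)
print(from_words(words) == c - a * b, carry)
```

when big_c is smaller than big_a * word_b the words come out modulo 2^(64*n) and the final carry is no longer 1.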
and, assuming the above is inserted into a SVP64 Vector for-loop,
performed a code-morph where {result} is separated out
into its own SVP64 Vector for-loop, storing a *pair* of 64-bit
result vectors into {RT} and {RS=RT+VL}
i then noted that
result <- RC + ~(RA * RB) + CARRY
=> result <- RC + ~(RA * RB) + 1 - 1 + CARRY
=> result <- RC - (RA * RB) + CARRY - 1
=> product <- RC - (RA * RB) and
result <- product + CARRY - 1
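the rewrite relies on ~x + 1 == -x in two's complement. a small python check of that step, assuming 128-bit wraparound arithmetic (the function names `original` and `morphed` are hypothetical):

```python
MASK128 = (1 << 128) - 1

def original(rc, ra, rb, carry):
    # result <- RC + ~(RA * RB) + CARRY
    return (rc + (~(ra * rb) & MASK128) + carry) & MASK128

def morphed(rc, ra, rb, carry):
    # product <- RC - (RA * RB), then the carry and the -1 are added
    product = (rc - ra * rb) & MASK128
    return (product + carry - 1) & MASK128

samples = [
    (0, 0, 0, 0),
    (5, 3, 4, 1),
    (0, 2**64 - 1, 2**64 - 1, 2),
    (2**64 - 1, 1, 1, 2**64 - 1),
]
print(all(original(*s) == morphed(*s) for s in samples))
```

this only spot-checks a handful of values; the identity itself holds for any inputs because both forms are reduced modulo 2^128.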
thus, all {products} can be separated into a standard mul-subtract
where top and bottom half of {products} are split into vectors
starting at {RT} and {RT+VL} - aka {RT} and {RS}
prod[0:127] = (RA) * (RB)
sub[0:127] = EXTZ(RC) - prod
RT <- sub[64:127]
RS <- sub[0:63]
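the bit ranges above use Power ISA MSB0 numbering, where bit 0 is the most significant, so sub[64:127] is the *low* half and sub[0:63] the high half. a short python sketch of that slicing (the helper name `msb0_slice` is hypothetical):

```python
def msb0_slice(value, start, end, width=128):
    """Bits start..end inclusive, MSB0 numbering (bit 0 is most significant)."""
    nbits = end - start + 1
    shift = width - 1 - end  # distance of bit 'end' from the LSB
    return (value >> shift) & ((1 << nbits) - 1)

sub = (0x1111_2222_3333_4444 << 64) | 0xAAAA_BBBB_CCCC_DDDD
rt = msb0_slice(sub, 64, 127)  # low half, goes to RT
rs = msb0_slice(sub, 0, 63)    # high half, goes to RS
print(hex(rt), hex(rs))
```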
a *second* instruction, slightly modified from jacob's original
to now include the compensating "-1" (added as [1]*128, i.e. all
ones), performs the very-weird adds
cat[0:127] = (RB) || (RS)
sum[0:127] = cat + EXTZ(RA) + [1]*128
rhi[0:63] = sum[0:63]
if (RA) <= 1 then rhi = rhi + ([0]*63 || 1)
RA = rhi
RT = sum[64:127]
this one uses (RA) as the CARRY from jacob's original: RA is both
an input *and* an implicit output (like LD-ST-with-update). some
minor weirdness has to be done on the register numbering to use
the intermediate results correctly:
# RS=RT+VL, assume VL=8, therefore RS starts at r8.v
# q : r16
# dividend: r24.v
# divisor : r32.v
# carry : r40
li r40, 0
sv.msubx r0.v, r16, r24.v, r32.v
sv.weirdaddx r0.v, r40, r8.v
yes, both q (r16) and carry (r40) are scalar.
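a python sketch of the two-pass scheme, to compare against plain big-integer arithmetic. assumptions not in the original: the python functions `msubx` and `weirdaddx` only model the pseudocode, the low/high halves are passed as explicit lists `lo` and `hi` rather than through any RT/RS/RB register numbering, and words are LSB-first. the sketch starts the carry at 1, which is its own choice: started at 0 it returns one less than the true difference.

```python
MASK64 = (1 << 64) - 1
MASK128 = (1 << 128) - 1

def msubx(minuend_vec, mul_vec, q):
    """Pass 1: per-element 128-bit (minuend - mul * q), split into halves."""
    lo, hi = [], []
    for rc, ra in zip(minuend_vec, mul_vec):
        sub = (rc - ra * q) & MASK128
        lo.append(sub & MASK64)  # sub[64:127]
        hi.append(sub >> 64)     # sub[0:63]
    return lo, hi

def weirdaddx(lo, hi, carry):
    """Pass 2: chain the carry through the stored 128-bit values."""
    out = []
    for l, h in zip(lo, hi):
        cat = (h << 64) | l
        total = (cat + carry + MASK128) & MASK128  # + [1]*128, i.e. -1
        rhi = total >> 64
        if carry <= 1:
            rhi = (rhi + 1) & MASK64
        carry = rhi          # carry is both input and output
        out.append(total & MASK64)
    return out, carry

def to_words(x, n):
    return [(x >> (64 * i)) & MASK64 for i in range(n)]

def from_words(ws):
    return sum(w << (64 * i) for i, w in enumerate(ws))

VL = 8
c = (1 << 500) + 12345
a = (1 << 400) + 678
q = 0xFFFF_FFFF_FFFF_FFFF
lo, hi = msubx(to_words(c, VL), to_words(a, VL), q)
out, carry = weirdaddx(lo, hi, 1)
print(from_words(out) == c - a * q, carry)
```

the first pass has no dependency between elements; only the second pass is serialised on the carry.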
the reason for adding the 1 into the mul-sub is to *make* msubx
a much more standard multiply-with-subtract. this stands a much
higher chance of acceptance into the spec than a weird "RC + ~(RA * RB)"
l.