[Libre-soc-dev] SVP64 Vectorised add-carry => big int add
programmerjake at gmail.com
Tue Apr 19 11:12:44 BST 2022
it occurred to me that, assuming we want fewer operands than 4-in 2-out, we
could have 3-in 2-out (RA, RB and CARRY in; RT and CARRY out), where the
instruction is just bigint times word instead of bigint times word plus bigint.
so, what we'd end up with:
bigint + bigint -> bigint:
    sv.adde
bigint - bigint -> bigint:
    sv.subfe
bigint * word -> bigint:
    mule RT, RA, RB:
        prod = RA * RB + CARRY # 64-bit * 64-bit + 64-bit -> 128-bit
        RT = LOW_HALF(prod)
        CARRY = HIGH_HALF(prod)
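
to make the semantics above concrete, here is a rough python model of mule and of its vectorised form. it is only a sketch: the names mule, bigint_mul_word and to_int are made up for illustration, the bigint is assumed to be a little-endian list of 64-bit words, and CARRY is modelled as a plain variable rather than an SPR.

```python
MASK64 = (1 << 64) - 1

def mule(ra, rb, carry):
    """Model of one scalar mule: returns (RT, new CARRY)."""
    # (2^64-1)^2 + (2^64-1) = (2^64-1) * 2^64 < 2^128, so prod fits in 128 bits
    prod = ra * rb + carry
    return prod & MASK64, prod >> 64  # LOW_HALF, HIGH_HALF

def bigint_mul_word(a_words, word):
    """Model of sv.mule: bigint * word, chaining CARRY between elements.

    a_words is assumed to be little-endian 64-bit words. Returns the
    result words plus the final CARRY (the extra top word).
    """
    carry = 0  # CARRY cleared before the vector op
    out = []
    for a in a_words:
        rt, carry = mule(a, word, carry)
        out.append(rt)
    return out, carry

def to_int(words):
    """Helper for checking: little-endian words -> python int."""
    return sum(w << (64 * i) for i, w in enumerate(words))
```

the final CARRY is the top word of the product, so a caller wanting the full result has to store it after the vector op.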
the div inner loop would end up as:
# vn is in r32, qhat is in r3, un is in r64
li r0, 0
mtspr CARRY, r0 # clear carry for multiplication
subfc r0, r0, r0 # set CY for subtraction
sv.mule r96.v, r32.v, r3.s # r96... = r32... * r3
sv.sube r64.v, r64.v, r96.v # r64... = r64... - r96...
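
as a sanity check of the sequence above, a python sketch of the multiply-and-subtract step (un -= vn * qhat). assumptions: vectors are little-endian lists of 64-bit words of equal length, subtraction is modelled the Power ISA way as a + ~b + CA with CA starting at 1, and the helper names (mule, sub_with_carry, mul_sub, to_int) are hypothetical. the leftover CARRY and CA are returned rather than applied to a further top word.

```python
MASK64 = (1 << 64) - 1

def mule(ra, rb, carry):
    """Model of scalar mule: (RT, new CARRY)."""
    prod = ra * rb + carry
    return prod & MASK64, prod >> 64

def sub_with_carry(a, b, ca):
    """a - b computed as a + ~b + CA; returns (result, carry out)."""
    total = a + (~b & MASK64) + ca
    return total & MASK64, total >> 64

def mul_sub(un, vn, qhat):
    """Model of: clear CARRY, set CA, sv.mule, then the vector subtract."""
    carry = 0  # li r0, 0 ; mtspr CARRY, r0
    ca = 1     # subfc r0, r0, r0 leaves CA set (no borrow)
    prod = []
    for v in vn:                      # sv.mule r96.v, r32.v, r3.s
        rt, carry = mule(v, qhat, carry)
        prod.append(rt)
    out = []
    for u, p in zip(un, prod):        # r64... = r64... - r96...
        r, ca = sub_with_carry(u, p, ca)
        out.append(r)
    return out, carry, ca

def to_int(words):
    """Helper for checking: little-endian words -> python int."""
    return sum(w << (64 * i) for i, w in enumerate(words))
```

CA == 0 on exit means the subtraction borrowed, which is the case where the usual long-division algorithm decrements qhat and adds vn back.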
the mul inner loop would be similar: a sv.mule followed by sv.adde.
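
for comparison, a python sketch of a schoolbook bigint * bigint built from that mul inner loop. assumptions: little-endian lists of 64-bit words, non-empty inputs, and hypothetical names (bigint_mul, to_int). the mule and adde steps are interleaved per element here for brevity; running them as two separate vector passes gives the same result.

```python
MASK64 = (1 << 64) - 1

def bigint_mul(a, b):
    """Schoolbook multiply: one mule+adde row per word of b."""
    n = len(a)
    result = [0] * (n + len(b))
    for j, bw in enumerate(b):
        carry = 0  # CARRY cleared for the multiply chain
        ca = 0     # CA cleared for the add chain
        for i, aw in enumerate(a):
            # mule: 64-bit * 64-bit + CARRY -> 128-bit
            prod = aw * bw + carry
            rt, carry = prod & MASK64, prod >> 64
            # adde: accumulate into the partial result with carry-in
            total = result[i + j] + rt + ca
            result[i + j], ca = total & MASK64, total >> 64
        # top word of this row: not yet written by earlier rows,
        # and carry + ca always fits in 64 bits here
        result[j + n] = carry + ca
    return result

def to_int(words):
    """Helper for checking: little-endian words -> python int."""
    return sum(w << (64 * i) for i, w in enumerate(words))
```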
because of how it's defined, sv.mule can benefit from the same 256-bit *
64-bit -> 320-bit multiplier optimization. also, because it has only one
output vector (unlike mulx), it can be fused much more easily with
sv.adde/sv.subfe if desired.