[Libre-soc-dev] SVP64 Vectorised add-carry => big int add

Sun Apr 17 18:26:30 BST 2022

On Sun, Apr 17, 2022 at 2:08 PM lkcl <luke.leighton at gmail.com> wrote:

> see end of:
>
> https://libre-soc.org/openpower/sv/bitmanip/appendix/
>
> how about storing the 128-bit mul-add in a *pair* of vectors, 3-in 2-out just like DCT/FFT: RT and RT+VL
>
> a second followup instruction can perform the carry-adds with corrections.

could i ask people to check the math here? i wrote it out at:
https://libre-soc.org/openpower/sv/bitmanip/appendix/

i started from this:

    # for big_c - big_a * word_b
    result <- RC + ~(RA * RB) + CARRY
    result_high <- HIGH_HALF(result)
    if CARRY <= 1 then # unsigned comparison
        result_high <- result_high + 1
    end
    CARRY <- result_high
    RT <- LOW_HALF(result)
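
as a rough cross-check, the loop above can be modelled in python and compared against python's own big integers. this is only a sketch: the names mulsub_words and to_int are made up for illustration, words are stored least-significant first, and seeding the carry with 1 is an assumption (the pseudocode above does not state the seed).

```python
MASK64 = (1 << 64) - 1
MASK128 = (1 << 128) - 1

def mulsub_words(big_c, big_a, word_b, carry=1):
    """big_c - big_a * word_b, one 64-bit word at a time (LSB word first).

    carry=1 as the seed is an assumption of this sketch.
    """
    out = []
    for rc, ra in zip(big_c, big_a):
        # result <- RC + ~(RA * RB) + CARRY, all 128 bits wide
        result = (rc + (~(ra * word_b) & MASK128) + carry) & MASK128
        result_high = result >> 64            # HIGH_HALF(result)
        if carry <= 1:                        # unsigned comparison
            result_high = (result_high + 1) & MASK64
        carry = result_high
        out.append(result & MASK64)           # RT <- LOW_HALF(result)
    return out, carry

def to_int(words):
    """Reassemble a list of 64-bit words (LSB first) into one integer."""
    return sum(w << (64 * i) for i, w in enumerate(words))

# compare against plain big-integer arithmetic, modulo the vector width
c = [0, 0x123456789ABCDEF0, 5]
a = [MASK64, MASK64, 1]
b = MASK64
words, _ = mulsub_words(c, a, b)
assert to_int(words) == (to_int(c) - to_int(a) * b) % (1 << (64 * len(c)))
```

under that seed the low words agree with c - a*b modulo 2^(64*n); a different seed shifts the answer by (seed - 1).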

and, assuming the above is inserted into an SVP64 Vector for-loop,
performed a code-morph where {result} is separated out
into its own SVP64 Vector for-loop, storing a *pair* of 64-bit
result vectors into {RT} and {RS=RT+VL}

i then noted that

       result  <- RC + ~(RA * RB) + CARRY
    => result  <- RC + ~(RA * RB) + 1 - 1 + CARRY
    => result  <- RC - (RA * RB) + CARRY - 1
    => product <- RC - (RA * RB) and
       result  <- product + CARRY - 1
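
the rearrangement relies on ~x being equal to -x - 1 in wrap-around (here 128-bit) arithmetic. a small python check over corner-case operands, with helper names (fused, split) invented purely for this sketch:

```python
MASK128 = (1 << 128) - 1

def fused(rc, ra, rb, carry):
    # result <- RC + ~(RA * RB) + CARRY
    return (rc + (~(ra * rb) & MASK128) + carry) & MASK128

def split(rc, ra, rb, carry):
    # product <- RC - (RA * RB), then add CARRY - 1 on top of the product
    product = (rc - ra * rb) & MASK128
    return (product + carry - 1) & MASK128

# 64-bit corner cases for every operand, including the carry
corner = [0, 1, 2, 1 << 63, (1 << 64) - 2, (1 << 64) - 1]
for rc in corner:
    for ra in corner:
        for rb in corner:
            for carry in corner:
                assert fused(rc, ra, rb, carry) == split(rc, ra, rb, carry)
```

this only checks the 128-bit intermediate value; the carry fix-up on the high half is unchanged by the split.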

thus, all {products} can be separated into a standard mul-subtract
where the bottom and top halves of each {product} are split into vectors
starting at {RT} and {RT+VL} respectively - aka {RT} and {RS}

    prod[0:127] = (RA) * (RB)
    sub[0:127] = EXTZ(RC) - prod
    RT <- sub[64:127]
    RS <- sub[0:63]

a *second* instruction, slightly modified from jacob's original
to now include the compensating "-1" (the [1]*128 term), performs the very-weird adds

    cat[0:127] = (RB) || (RS)
    sum[0:127] = cat + EXTZ(RA) + [1]*128
    rhi[0:63] = sum[0:63]
    if (RA) <= 1 then rhi = rhi + ([0]*63 || 1)
    RA = rhi
    RT = sum[64:127]

where this one uses (RA) as the CARRY from jacob's original, where
RA is an input *and* implicit output (like LD-ST-with-update), and
some minor weirdness has to be done on the register numbering
to use the intermediate results correctly

    # RS=RT+VL, assume VL=8, therefore RS starts at r8.v
    # q       : r16
    # dividend: r24.v
    # divisor : r32.v
    # carry   : r40
    li r40, 0
    sv.msubx r0.v, r16, r24.v, r32.v
    sv.weirdaddx r0.v, r40, r8.v

yes, both q (r16) and carry (r40) are scalar.
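
a rough python model of the two-pass listing above, compared against the original single-step form. the function names (sv_msubx, sv_weirdaddx, fused_loop) are invented for this sketch, and the register-numbering weirdness is not modelled: the low and high halves are simply passed around as two python lists.

```python
MASK64 = (1 << 64) - 1
MASK128 = (1 << 128) - 1

def sv_msubx(ra, rb_vec, rc_vec):
    """Pass 1: sub = EXTZ(RC) - RA * RB per element, RA scalar.

    Returns (RT, RS): the low-half vector and the high-half vector.
    """
    rt, rs = [], []
    for rb, rc in zip(rb_vec, rc_vec):
        sub = (rc - ra * rb) & MASK128
        rt.append(sub & MASK64)    # sub[64:127], low half
        rs.append(sub >> 64)       # sub[0:63], high half
    return rt, rs

def sv_weirdaddx(lo_vec, hi_vec, carry):
    """Pass 2: chain the scalar carry (RA) through every element."""
    out = []
    for lo, hi in zip(lo_vec, hi_vec):
        cat = (hi << 64) | lo
        total = (cat + carry + MASK128) & MASK128   # + [1]*128 acts as -1
        rhi = total >> 64
        if carry <= 1:
            rhi = (rhi + 1) & MASK64
        carry = rhi                # RA is input *and* implicit output
        out.append(total & MASK64)
    return out, carry

def fused_loop(ra, rb_vec, rc_vec, carry):
    """Reference: the original single-step form, for comparison only."""
    out = []
    for rb, rc in zip(rb_vec, rc_vec):
        result = (rc + (~(ra * rb) & MASK128) + carry) & MASK128
        high = result >> 64
        if carry <= 1:
            high = (high + 1) & MASK64
        carry = high
        out.append(result & MASK64)
    return out, carry

VL = 8
q = MASK64 - 2                                   # scalar, like r16
rb_vec = [(i * 0x0123456789ABCDEF) & MASK64 for i in range(1, VL + 1)]
rc_vec = [(i * 0xFEDCBA9876543211) & MASK64 for i in range(1, VL + 1)]
lo, hi = sv_msubx(q, rb_vec, rc_vec)
for seed in (0, 1, 2, MASK64):                   # scalar carry, like r40
    assert sv_weirdaddx(lo, hi, seed) == fused_loop(q, rb_vec, rc_vec, seed)
```

the msubx pass has no dependency on the carry, which is why it can run over the whole vector before the carry chain starts.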

the reason for adding the 1 into the mul-sub is to *make* msubx
a much more standard multiply-with-subtract.  this stands a much
higher chance of acceptance into the spec than a weird "RC + ~(RA * RB)"

l.