[Libre-soc-dev] SVP64 Vectorised add-carry => big int add
programmerjake at gmail.com
Mon Apr 18 19:27:22 BST 2022
On Mon, Apr 18, 2022, 09:53 lkcl <luke.leighton at gmail.com> wrote:
> but then how to calculate t? i have no idea, i need help here.
I changed the code so it now has three variants: the original loop, a
sub-mul-borrow loop (basically what the original loop does, but with
different variable names and the vn[j + n] stuff converted to a final
iteration), and the mrsubcarry algorithm I gave.
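For readers without the code to hand, here is a rough Python sketch of what a sub-mul-borrow loop of this kind can look like. It is not the code from the thread: the function and helper names (sub_mul_borrow, to_words, from_words), the 64-bit word size, and the little-endian word order are all assumptions made for illustration. The top word is handled by running one extra iteration with the multiplicand word taken as zero, instead of special-casing it after the loop.

```python
MASK = (1 << 64) - 1  # assumed word size: 64 bits


def to_words(x, n):
    """Split a non-negative int into n little-endian 64-bit words."""
    return [(x >> (64 * i)) & MASK for i in range(n)]


def from_words(words):
    """Recombine little-endian 64-bit words into an int."""
    return sum(w << (64 * i) for i, w in enumerate(words))


def sub_mul_borrow(un, vn, qhat, j):
    """Compute un[j : j+n+1] -= qhat * vn in place, one word at a time.

    Returns the borrow out of the top word (nonzero means the true
    result was negative and qhat was too large).
    """
    n = len(vn)
    borrow = 0
    for i in range(n + 1):
        # Final iteration (i == n): vn has no such word, so use zero.
        v = vn[i] if i < n else 0
        t = un[i + j] - borrow - qhat * v
        un[i + j] = t & MASK
        # Python's >> floors, so -(t >> 64) is the non-negative borrow
        # owed to the next word; it always fits in 64 bits.
        borrow = -(t >> 64)
    return borrow
```

The single running `borrow` plays the role of the 64-bit carry register: each iteration consumes the previous borrow and produces the next one, which is the same dependency chain a Vectorised instruction would have to carry between elements.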
afaict mrsubcarry is likely the most efficient in hardware, as explained
below.

microarchitecturally, the maddcarry/mrsubcarry instructions can be run as
follows: assuming a 4x64-bit simd unit with the merged 64x256->320-bit
multiplier, have the VL loop run in chunks of 4 iterations. each chunk is
dispatched to the whole simd unit as a single 256-bit-wide instruction
(256 bits of data, not insn encoding), and the carry dependency is only
tracked between those chunk-instructions. the carry out is the top 64 bits
from the multiplier (adjusted as needed); the carry in from the previous
chunk is just fed into the multiplier's carry-save-addition tree as
another 64-bit term. assuming the multiplier has a 2-cycle latency and
each chunk waits on the previous chunk's carry, you get 256 bits / 2
cycles = 128 bits of result per cycle of throughput.
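To make the chunking concrete, here is a small behavioural model in Python. It is only a sketch under assumptions: it takes maddcarry to mean "multiply each 64-bit element by a 64-bit scalar, add the incoming carry, keep the low bits as the result and pass the high 64 bits on as the carry", which may not match the final instruction definition, and the names (maddcarry_chunked, bits_per_cycle) are made up for the example. It does not model mrsubcarry or any "adjusted as needed" fix-up of the carry.

```python
MASK64 = (1 << 64) - 1
CHUNK_WORDS = 4                 # 4x64-bit simd unit
CHUNK_BITS = 64 * CHUNK_WORDS   # 256 bits of data per chunk-instruction


def maddcarry_chunked(a_words, b, carry_in=0):
    """Model a VL loop of multiply-add-carry run 4 elements at a time.

    a_words: little-endian list of 64-bit words (the vector operand)
    b:       64-bit scalar multiplier
    Returns (result_words, carry_out).
    """
    out = []
    carry = carry_in
    for start in range(0, len(a_words), CHUNK_WORDS):
        chunk = a_words[start:start + CHUNK_WORDS]
        a = sum(w << (64 * i) for i, w in enumerate(chunk))
        # One 64x256 multiply with the carry as an extra 64-bit term.
        # (2^256-1)*(2^64-1) + (2^64-1) < 2^320, so 320 bits suffice.
        wide = a * b + carry
        for i in range(len(chunk)):
            out.append((wide >> (64 * i)) & MASK64)
        # The top 64 bits are the only value the next chunk depends on.
        carry = wide >> (64 * len(chunk))
    return out, carry


def bits_per_cycle(latency_cycles=2):
    """Throughput if each chunk must wait for the previous chunk's carry."""
    return CHUNK_BITS / latency_cycles
```

Inside a chunk there is no element-to-element carry to track at all, since the merged multiplier produces the whole 320-bit product in one go; the serial dependency only exists once per 256 bits of data.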