[Libre-soc-dev] SVP64 Vectorised add-carry => big int add

Mon Apr 18 19:27:22 BST 2022

On Mon, Apr 18, 2022, 09:53 lkcl <luke.leighton at gmail.com> wrote:

> but then how to calculate t?  i have no idea, i need help here.

I changed the code to have three versions: the original loop, a
sub-mul-borrow loop (basically what the original loop does, but with
different variable names and the vn[j + n] stuff converted into a final
iteration), and the mrsubcarry algorithm I gave.
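For readers without the commit handy, here is a rough sketch of what a sub-mul-borrow loop with the top-limb step folded into a final iteration can look like. It assumes that "the original loop" is the multiply-and-subtract step of a Knuth Algorithm D style long division, which the un/vn/j/n naming suggests. The function name `sub_mul_borrow`, the little-endian 64-bit limb layout and the return convention are all hypothetical and are not taken from the commit. The sketch also does not model mrsubcarry.

```python
MASK64 = (1 << 64) - 1


def sub_mul_borrow(un, vn, qhat, j):
    """Hypothetical sketch: un[j:j+n+1] -= qhat * vn over 64-bit limbs.

    Limbs are little-endian Python ints in [0, 2**64) (an assumption).
    Returns the borrow out of the top limb: 1 if the window went negative,
    meaning qhat was too large and an add-back step is needed.
    """
    n = len(vn)
    borrow = 0
    for i in range(n + 1):
        # vn has no limb n; treating it as 0 turns the separate
        # top-limb step into one more ordinary iteration
        v = vn[i] if i < n else 0
        p = qhat * v + borrow          # at most 128 bits
        lo, hi = p & MASK64, p >> 64
        t = un[i + j] - lo
        if t < 0:                      # borrow 1 from the next limb
            t += 1 << 64
            hi += 1
        un[i + j] = t
        borrow = hi
    return borrow
```

Folding the top limb into the loop means every iteration does the same multiply-subtract-borrow step. That uniformity is what makes the loop a candidate for a single vectorised instruction.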

afaict mrsubcarry is likely the most efficient in hardware, as explained
earlier.

https://git.libre-soc.org/?p=libreriscv.git;a=commitdiff;h=37b0381ed51ceaeff119910ecba382884c443740

microarchitecturally, the maddcarry/mrsubcarry instructions can be run
efficiently as follows:

assuming a 4x64-bit simd unit with the merged 64x256->320-bit multiplier,
have the VL loop run in chunks of 4 iterations. each chunk is dispatched
to the whole simd unit as a single 256-bit-wide instruction (wide in data,
not in insn encoding), and the carry dependency is only tracked between
those chunk-instructions. the carry out is the top 64 bits from the
multiplier (adjusted as needed), and the carry in from the previous chunk
is just fed into the multiplier's carry-save-addition tree as another
64-bit term. assuming the multiplier has 2 cycles of latency, each chunk
has to wait for the previous chunk's carry, so you get 256 bits every 2
cycles, i.e. 128 bits of result per cycle of throughput.
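Here is a small Python model of the chunked scheme. It assumes maddcarry computes (lo, hi) = a[i] * b + carry per element, with hi becoming the next carry. That semantics, the function names and `LANES` are assumptions for illustration. The "adjusted as needed" carry handling for the subtract form is not modelled. The model checks that grouping 4 limbs into one 64x256->320-bit operation gives the same result as the per-element loop, with only the 64-bit carry crossing chunk boundaries.

```python
MASK64 = (1 << 64) - 1
LANES = 4  # 4x64-bit simd unit, as assumed in the text


def madd_carry_serial(a, b, carry=0):
    """Hypothetical per-element reference: (lo, hi) = a[i] * b + carry."""
    out = []
    for x in a:
        p = x * b + carry
        out.append(p & MASK64)
        carry = p >> 64                # hi half is the next element's carry
    return out, carry


def madd_carry_chunked(a, b, carry=0):
    """Hypothetical model of dispatching LANES limbs as one wide operation.

    Each chunk is packed into one 256-bit operand, multiplied by the 64-bit
    scalar b, and the 64-bit carry-in is added as one more term
    (64x256 -> 320 bits). The low bits are the results; the top 64 bits
    are the carry-out, the only dependency between chunks.
    """
    out = []
    for base in range(0, len(a), LANES):
        chunk = a[base:base + LANES]
        wide = sum(x << (64 * k) for k, x in enumerate(chunk))
        p = wide * b + carry           # always < 2**(64 * (len(chunk) + 1))
        out.extend((p >> (64 * k)) & MASK64 for k in range(len(chunk)))
        carry = p >> (64 * len(chunk))
    return out, carry


def throughput_bits_per_cycle(lanes=LANES, limb_bits=64, latency=2):
    # each chunk waits on the previous chunk's carry, so one chunk of
    # lanes * limb_bits result bits completes every `latency` cycles
    return lanes * limb_bits / latency
```

The 320-bit width is enough. A 256-bit chunk times a 64-bit scalar plus a 64-bit carry is at most 2^320 - 2^256, so the carry-out always fits back into 64 bits for the next chunk. With the defaults, `throughput_bits_per_cycle()` gives the 128 bits per cycle stated above.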

Jacob