[Libre-soc-dev] SVP64 Vectorised add-carry => big int add

Wed Apr 13 01:20:08 BST 2022

On April 12, 2022 4:00:25 AM UTC, Jacob Lifshay <programmerjake at gmail.com> wrote:

>See table 2 in:
>http://www.intel.com/content/dam/www/public/us/en/documents/white-papers/ia-large-integer-arithmetic-paper.pdf
>for an example of why two separate carry chains are needed...the example
>algorithm they use is 512-bit multiplication.

making sure there are some notes from the meeting tonight: it occurred to me that for these bigint computations, div and mul, it is a false assumption that it is absolutely necessary to have and to use 64 bit to create or synthesise 128 bit from src or dest by messing about with multiple-instruction sequences.

due to the vectorisation, we can in fact use *32 bit* operations to produce *64 bit* results where the adds, by virtue of being vectorised, make absolutely no odds because they are carry-chained into a longer (bigint) result anyway.  the only difference is, VL has to be doubled.
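a minimal sketch of the point above, in plain python rather than SVP64 assembler. it is a schoolbook bigint multiply where every step is a limb x limb multiply giving a double-width product, with the adds carry-chained along the vector. the helper names (to_limbs, from_limbs, bigint_mul) are hypothetical, made up for illustration, and the list length stands in for VL. it only shows the arithmetic, not how the hardware would schedule it.

```python
def to_limbs(value, limb_bits, count):
    """Split a non-negative integer into `count` little-endian limbs."""
    mask = (1 << limb_bits) - 1
    return [(value >> (i * limb_bits)) & mask for i in range(count)]


def from_limbs(limbs, limb_bits):
    """Reassemble little-endian limbs into one Python integer."""
    return sum(limb << (i * limb_bits) for i, limb in enumerate(limbs))


def bigint_mul(a, b, limb_bits):
    """Schoolbook multiply of two little-endian limb vectors.

    Each inner step is one limb x limb -> double-width multiply plus
    a carry-chained add, so `wide` always fits in 2 * limb_bits.
    """
    mask = (1 << limb_bits) - 1
    out = [0] * (len(a) + len(b))
    for j, bj in enumerate(b):
        carry = 0
        for i, ai in enumerate(a):
            wide = ai * bj + out[i + j] + carry
            out[i + j] = wide & mask       # low half stays in place
            carry = wide >> limb_bits      # high half chains onwards
        out[j + len(a)] = carry
    return out


x = (1 << 256) - 12345
y = (1 << 255) + 987654321

# 64 bit limbs (128 bit intermediate products), vector length 4
p64 = bigint_mul(to_limbs(x, 64, 4), to_limbs(y, 64, 4), 64)
# 32 bit limbs (64 bit intermediate products), vector length doubled to 8
p32 = bigint_mul(to_limbs(x, 32, 8), to_limbs(y, 32, 8), 32)

assert from_limbs(p64, 64) == from_limbs(p32, 32) == x * y
```

both limb widths give the same bigint product: the 32 bit version never needs anything wider than a 64 bit intermediate, at the cost of twice as many elements.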

likewise for long-div, 64 bit operands can be used, producing 32 bit results, and those can be subtracted. who cares if VL is twice as long: the backend hardware can SIMDify everything anyway.
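a rough sketch of the division side, again in plain python and under a simplifying assumption: the divisor is a single 32 bit limb, so this is the short-division case only, not full multi-limb long division with its multiply-and-subtract step. each step takes a 64 bit dividend (previous remainder in the top half, next limb in the bottom half) and produces a 32 bit quotient limb. the names to_limbs, from_limbs and bigint_divmod_small are hypothetical.

```python
def to_limbs(value, limb_bits, count):
    """Split a non-negative integer into `count` little-endian limbs."""
    mask = (1 << limb_bits) - 1
    return [(value >> (i * limb_bits)) & mask for i in range(count)]


def from_limbs(limbs, limb_bits):
    """Reassemble little-endian limbs into one Python integer."""
    return sum(limb << (i * limb_bits) for i, limb in enumerate(limbs))


def bigint_divmod_small(limbs, divisor, limb_bits=32):
    """Divide a little-endian limb vector by a single-limb divisor.

    Assumption for this sketch: 0 < divisor < 2**limb_bits.
    Because the running remainder is always < divisor, each
    double-width dividend yields a quotient that fits in one limb.
    """
    assert 0 < divisor < (1 << limb_bits)
    rem = 0
    quot = [0] * len(limbs)
    for i in reversed(range(len(limbs))):      # most significant limb first
        wide = (rem << limb_bits) | limbs[i]   # 2 * limb_bits wide dividend
        quot[i] = wide // divisor              # limb_bits wide result
        rem = wide % divisor
    return quot, rem


x = (1 << 256) - 12345
d = 0xFFFFFFFB

q, r = bigint_divmod_small(to_limbs(x, 32, 8), d)
assert from_limbs(q, 32) == x // d
assert r == x % d
```

the remainder is chained from element to element in much the same way the carry is in the add case, so a longer VL just means more steps of the same narrow operation.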

it would even be possible for really high performance systems to macro-op fuse all of these operations together into backend SIMD engines that do 128 bit, although honestly i suspect it's actually more efficient to do them at 32 bit.

all quite fascinating.

l.