# [Libre-soc-bugs] [Bug 1044] SVP64 implementation of pow(x,y,z)

bugzilla-daemon at libre-soc.org bugzilla-daemon at libre-soc.org
Wed Mar 29 00:48:54 BST 2023

```https://bugs.libre-soc.org/show_bug.cgi?id=1044

--- Comment #4 from Jacob Lifshay <programmerjake at gmail.com> ---
(In reply to Luke Kenneth Casson Leighton from comment #3)
> (In reply to Jacob Lifshay from comment #2)
> > rsa uses a non-prime modulus, changing title
>
> ah good catch i totally missed that.
>
> need to find algorithms now, shorter more compact
> abd fitting into as few instructions as possible
> without being overly long completion time is fine.

how long is overly long? the reason i'm using fast mul algorithms is because
they are closer to O(n^(1+ε)) rather than O(n^2), e.g. even toom-2 (karatsuba)
multiplication is O(n^1.5...) so on 4096-bit mul is very approximately 512
64-bit mul ops, whereas naive O(n^2) is about 4096 64-bit mul ops. toom-3 and
higher are even less than that.

if we use an algorithm as our demo that is known to be >4x slower, even if our
cpu has a 256-bit simd mul unit, it will take approximately 4x as much power
for the same speed or 4x as long for the same power without the simd unit, so
everyone will see our proposal as some kind of bad joke, or that we're just
noobs who don't know what their doing.

the approximately 4000 instruction version of fast mul is what you get when you
recursively inline the functions, if we instead leave them as uninlined
recursive functions the amount of code used is substantially smaller.

in any case we'll still want montgomery multiplication (which is a relatively
simple algorithm that we could just code in c to call our bigint mul
algorithm), which transforms modular exponentiation from needing a 8192-bit
bigint divmod at each step to only needing it once at the end, saving a
substantial amount of time.

--
You are receiving this mail because:
You are on the CC list for the bug.
```