[Libre-soc-bugs] [Bug 782] add galois field bitmanip instructions

Sat Mar 5 01:11:23 GMT 2022

https://bugs.libre-soc.org/show_bug.cgi?id=782

--- Comment #22 from Jacob Lifshay <programmerjake at gmail.com> ---
(In reply to Luke Kenneth Casson Leighton from comment #19)
> (In reply to Jacob Lifshay from comment #17)
> > (In reply to Jacob Lifshay from comment #16)
> > > NTT inner loop:
> > > https://github.com/sympy/sympy/blob/68022011872a20c27abd047125e5feb8ce3289cc/sympy/discrete/transforms.py#L173
> > > 
> > >     h = 2
> > >     while h <= n:
> > >         hf, ut = h // 2, n // h
> > >         for i in range(0, n, h):
> > >             for j in range(hf):
> > >                 u, v = a[i + j], a[i + j + hf]*w[ut * j]
> > >                 a[i + j], a[i + j + hf] = (u + v) % p, (u - v) % p
> > >         h *= 2
> 
> nice find.  i wish i had been able to track that down as quickly,
> it took me 2-3 weeks to derive that same algorithm from project
> nayuki recursive version, sigh

it was one of the first few things i found on google...

> https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/isa/remap_fft_yield.py;hb=HEAD
> 
> so, the twin thingy means no need to have a temp var,
> because it is a butterfly swap on a[i+j] and a[i+j+hf]
> 
> what i did in SV.REMAP, was, create 3 separate "schedules":
> 
> * one for i+j
> * one for i+j+hf
> * one for... where is it... ah: ut+j
> 
> > looks like gfpmuladdsub
> 
> yes, basically you set up gfpmuladdsub(RT, RA, RB)
> and apply *two* SV RM destinations to RT (one for the add, one
> for the sub), then hit:

hmm, I think we should instead defined the scalar gfpmuladdsub to write to two
separate registers, like how ldu works, rather than having to use svp64 to
disambiguate the two destinations (which requires using svp64 for the
instruction to be useful since the scalar instruction writes twice to the same
register).

Also, naming wise, that subtraction is term - product, which is the reverse of
normal muladd, so i think we should call it gfpmuladdsubr rather than
gfpmuladdsub.

I propose:
gfpmuladdsubr RT, RA, RB, RC

Pseudocode:
factor1 = (RA)
factor2 = (RB)
term = (RC)
prime = GFPRIME # the gfp* prime modulus SPR
(RA) = (term - factor1 * factor2) % prime
(RT) = (term + factor1 * factor2) % prime

This allows SVP64 REMAP to have:
* `a[i + j]` be RT and RC
* `a[i + j + hf]` be RA
* `w[ut * j]` be RB

Inner loop body for reference:
> u, v = a[i + j], a[i + j + hf]*w[ut * j]
> a[i + j], a[i + j + hf] = (u + v) % p, (u - v) % p

I picked gfpmuladdsubr overwriting one of the factors (RA) because that makes
it more flexible since there's another identical input (RB) if you need your
factor to not be overwritten, whereas you can't do that with the term input
(RC) cuz there's only one of them, so I had that result go to RT instead of
overwriting the term input.

-- 
You are receiving this mail because:
You are on the CC list for the bug.