[Libre-soc-bugs] [Bug 782] add galois field bitmanip instructions

bugzilla-daemon at libre-soc.org bugzilla-daemon at libre-soc.org
Sat Mar 5 00:29:38 GMT 2022


https://bugs.libre-soc.org/show_bug.cgi?id=782

--- Comment #19 from Luke Kenneth Casson Leighton <lkcl at lkcl.net> ---
(In reply to Jacob Lifshay from comment #17)
> (In reply to Jacob Lifshay from comment #16)
> > NTT inner loop:
> > https://github.com/sympy/sympy/blob/68022011872a20c27abd047125e5feb8ce3289cc/
> > sympy/discrete/transforms.py#L173
> > 
> >     h = 2
> >     while h <= n:
> >         hf, ut = h // 2, n // h
> >         for i in range(0, n, h):
> >             for j in range(hf):
> >                 u, v = a[i + j], a[i + j + hf]*w[ut * j]
> >                 a[i + j], a[i + j + hf] = (u + v) % p, (u - v) % p
> >         h *= 2

nice find.  i wish i had been able to track that down as quickly,
it took me 2-3 weeks to derive that same algorithm from project
nayuki recursive version, sigh

https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/isa/remap_fft_yield.py;hb=HEAD

so, the twin thingy means no need to have a temp var,
because it is a butterfly swap on a[i+j] and a[i+j+hf]

what i did in SV.REMAP, was, create 3 separate "schedules":

* one for i+j
* one for i+j+hf
* one for... where is it... ah: ut+j

> looks like gfpmuladdsub

yes, basically you set up gfpmuladdsub(RT, RA, RB)
and apply *two* SV RM destinations to RT (one for the add, one
for the sub), then hit:

* RTv1 with an i+j schedule
* RTv2 with an i+j+hf schedule
* RB which is the multiply coefficient array w[] with ut*j

have to look again at the fft svp64 example because i am missing
something there but basically that entire triple for-loop,
the while plus i plus j, all "disappear" into one single
application of *THREE* SV.REMAPs on the one single
sv.gfpmuladdsub instruction.

kinda stunningly ridiculously elegant and overkill at the
same time :)

but, joking aside, it means you can do the entire in-place
NTT (or DFT) triple loop in something mad like the equivalent
space of 5 32 bit instructions.

remap_fft_yield.py shows the principle.

-- 
You are receiving this mail because:
You are on the CC list for the bug.


More information about the libre-soc-bugs mailing list