[Libre-soc-dev] bigmul REMAP and pow(x,y,mod)

Luke Kenneth Casson Leighton lkcl at lkcl.net
Sat Sep 30 17:17:22 BST 2023


>  (sorry, send hit by accident)

---------- Forwarded message ----------

jacob managed to get a simple bigint modular power, pow(x,y,mod),
working the other day, which is a great milestone. finding a bug in
mcrxrx was also important: it raised the issue that CR ops still need
to be evaluated and implemented as SV Prefixed.
https://bugs.libre-soc.org/show_bug.cgi?id=1044
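for reference, this is the textbook square-and-multiply form of pow(x,y,mod), using Python's built-in big ints. it is a generic sketch and not necessarily how jacob's code is structured; `powmod` is a hypothetical name. every step is a bigint multiply followed by a reduction, so the bigint multiply is the inner kernel.

```python
def powmod(x: int, y: int, mod: int) -> int:
    """right-to-left binary (square-and-multiply) modular exponentiation."""
    result = 1 % mod              # gives 0 when mod == 1
    base = x % mod
    while y:
        if y & 1:                 # this exponent bit is set: multiply it in
            result = (result * base) % mod
        base = (base * base) % mod  # square for the next exponent bit
        y >>= 1
    return result


print(powmod(2, 10, 1000))  # 24
```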

the next step is to look at the first revision of pow(x,y,mod), and
also at the bigint multiply which uses sv.maddedu and sv.addex, to
see how to create a bigmul REMAP.
https://bugs.libre-soc.org/show_bug.cgi?id=1155

now, please bear with me: i have been going through some options
for a couple of weeks now. the patterns needed are *not just*
bigmul but triangular as well, most notably because these patterns
occur in e.g. MPEG motion estimation (i saw this in the work that
Konstantinos did):

xxxx  xxxx    1234
xxx   xxxx   4123
 xx    xxxx  3412
  x     xxxx 2341
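the three shapes in the diagram can be written as element index sequences. this is a minimal sketch, reading the whitespace-mangled diagram as a shrinking triangle (4, 3, 2, 1 elements per row, assumed left-aligned), a parallelogram with each row shifted one further right, and a cyclic rotation. all function names are hypothetical, and mirror and reverse are shown as generic transforms on any pattern.

```python
def triangle(n):
    """row r visits n - r elements (4, 3, 2, 1 for n = 4), assumed left-aligned."""
    return [(r, c) for r in range(n) for c in range(n - r)]


def skew(n):
    """row r visits n elements starting at column r (each row shifted right by one)."""
    return [(r, c) for r in range(n) for c in range(r, r + n)]


def rotate(n):
    """row r holds 1..n cyclically rotated right by r (the 1234/4123/3412/2341 table)."""
    return [[(c - r) % n + 1 for c in range(n)] for r in range(n)]


def mirror(pattern, width):
    """flip columns left-to-right within rows of the given width."""
    return [(r, width - 1 - c) for r, c in pattern]


def reverse(pattern):
    """walk the same elements in the opposite order."""
    return list(reversed(pattern))


for row in rotate(4):
    print("".join(str(v) for v in row))  # 1234, 4123, 3412, 2341
```

mirror(triangle(4), 4) gives the right-aligned triangle, so one generator plus the transforms covers the variants.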

these need to be rotatable, mirrorable and reversible (!) as they are
general-purpose. in powmod.py i have been slowly morphing
python_mul_algorithm2 so that it makes best use of REMAP. it's not
having any of it.
https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/test/bigint/powmod.py;hb=HEAD

you can see the problem:

116     for iy in range(4):
117         for i in range(4): # use t[iy+4] as a 64-bit carry
118             t[iy+i], t[iy+4] = maddedu(a[iy], b[i], t[iy+4])
119         ca = 0
120         for i in range(5): # add vec t to y with 1-bit carry
121             idx = iy + i
122             y[idx], ca = adde(y[idx], t[idx], ca)

if it were instead:

116     for iy in range(4):
117         for i in range(4): # use t[iy+4] as a 64-bit carry
118             t[iy+i], t[iy+4] = maddedu(a[iy], b[i], t[iy+4])
        for iy in range(4):
119         ca = 0
120         for i in range(5): # add vec t to y with 1-bit carry
121             idx = iy + i
122             y[idx], ca = adde(y[idx], t[idx], ca)


then there would be no problem at all: just have one REMAP for lines
116-118, and a separate REMAP mode for the rest.

but it is the very fact that there are *two* separate inner
loops, nested within an outer one, that is the core of the
problem.
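a quick check of why the loops cannot simply be split as written. maddedu and adde below are Python models of the instructions (the returned pairs match how powmod.py unpacks them), with 64-bit limbs stored least-significant first; mul_nested and mul_split are hypothetical names. with the single shared t vector, iteration iy+1 overwrites t[iy+1..iy+4] before the add loop for iy has read them, and the add loops then re-add overlapping t entries, so the split form gives a wrong result.

```python
MASK = (1 << 64) - 1


def maddedu(ra, rb, rc):
    """python model: (ra * rb + rc) split into (low 64 bits, high 64 bits)."""
    p = ra * rb + rc
    return p & MASK, p >> 64


def adde(ra, rb, ca):
    """python model: 64-bit add with carry-in; returns (sum, carry-out)."""
    s = ra + rb + ca
    return s & MASK, s >> 64


def mul_nested(a, b):
    """the loop from powmod.py: both inner loops inside one outer loop."""
    y, t = [0] * 8, [0] * 8
    for iy in range(4):
        for i in range(4):  # use t[iy+4] as a 64-bit carry
            t[iy + i], t[iy + 4] = maddedu(a[iy], b[i], t[iy + 4])
        ca = 0
        for i in range(5):  # add vec t to y with 1-bit carry
            idx = iy + i
            y[idx], ca = adde(y[idx], t[idx], ca)
    return y


def mul_split(a, b):
    """the hypothetical split form: all multiplies first, then all adds."""
    y, t = [0] * 8, [0] * 8
    for iy in range(4):
        for i in range(4):
            t[iy + i], t[iy + 4] = maddedu(a[iy], b[i], t[iy + 4])
    for iy in range(4):
        ca = 0
        for i in range(5):
            idx = iy + i
            y[idx], ca = adde(y[idx], t[idx], ca)
    return y


a = [1, 1, 0, 0]  # 1 + 2**64
b = [1, 0, 0, 0]  # 1
print(mul_nested(a, b))  # [1, 1, 0, 0, 0, 0, 0, 0]
print(mul_split(a, b))   # [1, 2, 0, 0, 0, 0, 0, 0]
```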

i thought about a special REMAP whose inner loop does 4 elements
then 5, but this is awful.

then i realised: go back to using addc()

    for i in range(4):
        t[i], t[4] = maddedu(a[3], b[i], t[4])
    y[3], ca = addc(y[3], t[0])
    for i in range(4):
        y[4 + i], ca = adde(y[4 + i], t[1 + i], ca)

first thing is, the two inner loops on i are now the same size (4),
which means you can do this:

    for m_or_a in range(2):
        if m_or_a == 0:
            for i in range(4):
                t[i], t[4] = maddedu(a[3], b[i], t[4])
        else:
            y[3], ca = addc(y[3], t[0])
            for i in range(4):
                y[4 + i], ca = adde(y[4 + i], t[1 + i], ca)

and then the 3 loops become:

    for j in range(4):
        for m_or_a in range(2):
            for i in range(4):
                ...  # maddedu when m_or_a == 0, adde when m_or_a == 1

and *that* is purely a 3D REMAP, and we have plenty of those
already.
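to make the "purely 3D" point concrete, here is a minimal Python sketch (not the actual svshape encoding): one flat counter of 4 * 2 * 4 = 32 element steps is decomposed into (j, m_or_a, i) with i varying fastest, and each step dispatches to a maddedu or adde model, generalising the a[3]/y[3] example above to every row j. remap_3d and mul_3d are hypothetical names; placing the scalar addc at i == 0 and clearing t[4] at the start of each row are assumptions of this sketch, since the snippet above shows only one row.

```python
MASK = (1 << 64) - 1


def maddedu(ra, rb, rc):
    p = ra * rb + rc              # model: returns (low 64 bits, high 64 bits)
    return p & MASK, p >> 64


def addc(ra, rb):
    s = ra + rb                   # model: add, returns (sum, carry-out)
    return s & MASK, s >> 64


def adde(ra, rb, ca):
    s = ra + rb + ca              # model: add with carry-in
    return s & MASK, s >> 64


def remap_3d(step, ni=4, nm=2, nj=4):
    """hypothetical: decompose one flat element index into (j, m_or_a, i), i fastest."""
    i = step % ni
    m_or_a = (step // ni) % nm
    j = (step // (ni * nm)) % nj
    return j, m_or_a, i


def mul_3d(a, b):
    y, t, ca = [0] * 8, [0] * 5, 0
    for step in range(4 * 2 * 4):
        j, m_or_a, i = remap_3d(step)
        if m_or_a == 0:
            if i == 0:
                t[4] = 0                     # assumption: clear the carry limb per row
            t[i], t[4] = maddedu(a[j], b[i], t[4])
        else:
            if i == 0:
                y[j], ca = addc(y[j], t[0])  # assumption: the one scalar addc per row goes here
            y[j + 1 + i], ca = adde(y[j + 1 + i], t[1 + i], ca)
    return y


x = 2**256 - 1
limbs = [(x >> (64 * k)) & MASK for k in range(4)]
result = sum(v << (64 * k) for k, v in enumerate(mul_3d(limbs, limbs)))
print(result == x * x)  # True
```

the point of the exercise: nothing in the loop body depends on anything but the three decomposed indices, which is exactly what a 3D REMAP supplies.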

now, the next problem to solve is fitting this into instructions,
and for that we have run out of space in svshape: basically, an
svshape3 is needed.

l.



