[Libre-soc-dev] bigmul REMAP and pow(x,y,mod)
Luke Kenneth Casson Leighton
lkcl at lkcl.net
Sat Sep 30 17:17:22 BST 2023
> (sorry, send hit by accident)
---------- Forwarded message ----------
jacob managed to get a simple bigint modulo power(x,y) working
the other day, which is a great milestone. also finding a bug in mcrxrx
was important, and raised the issue that CR ops still need to be
evaluated and implemented as SV Prefixed.
https://bugs.libre-soc.org/show_bug.cgi?id=1044
the next step is to look at the first revision of pow(x,y,mod) and
also the bigint multiply which uses sv.maddedu and sv.addex and
see how to create a bigmul REMAP.
https://bugs.libre-soc.org/show_bug.cgi?id=1155
now, please bear with me, i have been going through some options
for a couple of weeks now. the patterns needed are *not just*
bigmul, but triangular as well, most notably because these patterns
occur in e.g. MPEG motion estimation (i saw this in the work that
Konstantinos did):
xxxx  xxxx  1234
xxx   xxxx  4123
xx    xxxx  3412
x     xxxx  2341
these need to be rotatable, mirrorable and reversible (!) as they are
general-purpose. in powmod.py i have been slowly morphing
python_mul_algorithm2 to make the best use of REMAP, but it is not
having any of it.
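to make the first and third patterns above concrete, here is a small sketch of the index schedules they correspond to. the function names triangular and rotated are hypothetical (they are not from powmod.py or the REMAP spec), and this only models the order of indices, not any actual REMAP hardware behaviour.

```python
def triangular(n):
    # row r visits columns 0 .. n-r-1, giving rows of length n, n-1, ..., 1
    return [[c for c in range(n - r)] for r in range(n)]


def rotated(n):
    # row r is the sequence 0 .. n-1 rotated right by r places
    return [[(c - r) % n for c in range(n)] for r in range(n)]


if __name__ == "__main__":
    for row in triangular(4):
        print("x" * len(row))
    for row in rotated(4):
        # print 1-based, to match the 1234 / 4123 / 3412 / 2341 diagram
        print("".join(str(c + 1) for c in row))
```

mirroring or reversing would then be a matter of flipping the row or column order of these lists.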
https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/test/bigint/powmod.py;hb=HEAD
you can see the problem:
116 for iy in range(4):
117     for i in range(4):  # use t[iy+4] as a 64-bit carry
118         t[iy+i], t[iy+4] = maddedu(a[iy], b[i], t[iy+4])
119     ca = 0
120     for i in range(5):  # add vec t to y with 1-bit carry
121         idx = iy + i
122         y[idx], ca = adde(y[idx], t[idx], ca)
if that was
116 for iy in range(4):
117     for i in range(4):  # use t[iy+4] as a 64-bit carry
118         t[iy+i], t[iy+4] = maddedu(a[iy], b[i], t[iy+4])
    for iy in range(4):
119     ca = 0
120     for i in range(5):  # add vec t to y with 1-bit carry
121         idx = iy + i
122         y[idx], ca = adde(y[idx], t[idx], ca)
then no problem at all, just have one REMAP for lines 116-118,
and a separate REMAP mode for the rest.
but it is the very fact that there are *two* separate inner
loops, nested within an outer one, that is the core of the
problem.
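for anyone wanting to run the loop nest from lines 116-122 stand-alone, a minimal sketch follows. assumptions: maddedu and adde are modelled here as plain-python functions on 64-bit words (maddedu returning the low and high words of a*b+c, adde returning the sum and the carry-out), t and y are both 8 words and start zeroed, and the names to_words, from_words and mul_256x256 are hypothetical helpers, not taken from powmod.py.

```python
MASK = (1 << 64) - 1


def maddedu(a, b, c):
    # assumed model: 64x64-bit multiply plus 64-bit addend -> (low, high) words
    y = a * b + c
    return y & MASK, y >> 64


def adde(a, b, ca):
    # assumed model: 64-bit add with carry-in -> (sum, carry-out)
    y = a + b + ca
    return y & MASK, y >> 64


def to_words(x, n):
    # split an integer into n little-endian 64-bit words
    return [(x >> (64 * i)) & MASK for i in range(n)]


def from_words(ws):
    # inverse of to_words
    return sum(w << (64 * i) for i, w in enumerate(ws))


def mul_256x256(a, b):
    y = [0] * 8
    t = [0] * 8  # t[iy+4] is still zero when row iy starts
    for iy in range(4):
        for i in range(4):  # use t[iy+4] as a 64-bit carry
            t[iy + i], t[iy + 4] = maddedu(a[iy], b[i], t[iy + 4])
        ca = 0
        for i in range(5):  # add vec t to y with 1-bit carry
            idx = iy + i
            y[idx], ca = adde(y[idx], t[idx], ca)
    return y
```

the two inner loops have different trip counts (4 and 5) and sit inside the same outer loop, which is exactly the shape that does not map onto a single REMAP schedule.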
i thought about a special REMAP that does the 4 then the 5 for
an inner loop, but this is awful.
then i realised: go back to using addc()
for i in range(4):
    t[i], t[4] = maddedu(a[3], b[i], t[4])
y[3], ca = addc(y[3], t[0])
for i in range(4):
    y[4 + i], ca = adde(y[4 + i], t[1 + i], ca)
first thing is, the inner loop on i is the same size, which
means you can do this:
for m_or_a in range(2):
    if m_or_a == 0:
        for i in range(4):
            t[i], t[4] = maddedu(a[3], b[i], t[4])
    else:
        y[3], ca = addc(y[3], t[0])
        for i in range(4):
            y[4 + i], ca = adde(y[4 + i], t[1 + i], ca)
and then the 3 loops become:
for j in range(4):
    for m_or_a in range(2):
        for i in range(4):
and *that* is purely a 3D REMAP, and we have plenty of those
already.
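a rough sketch of what the full j / m_or_a / i nest might look like when run as plain python is below. this is an illustration only: generalising the a[3] row to a[j] (with y[j] and y[j + 1 + i]), clearing the carry word t[4] at i == 0, and doing the addc at i == 0 of the add phase are all my assumptions about how the per-row setup folds into the flat nest, and say nothing about how REMAP itself would schedule those steps. maddedu, addc and adde are modelled as simple 64-bit helpers, and mul_256x256_3d, to_words and from_words are hypothetical names.

```python
MASK = (1 << 64) - 1


def maddedu(a, b, c):
    # assumed model: 64x64-bit multiply plus 64-bit addend -> (low, high) words
    y = a * b + c
    return y & MASK, y >> 64


def addc(a, b):
    # assumed model: 64-bit add, no carry-in -> (sum, carry-out)
    y = a + b
    return y & MASK, y >> 64


def adde(a, b, ca):
    # assumed model: 64-bit add with carry-in -> (sum, carry-out)
    y = a + b + ca
    return y & MASK, y >> 64


def to_words(x, n):
    # split an integer into n little-endian 64-bit words
    return [(x >> (64 * i)) & MASK for i in range(n)]


def from_words(ws):
    # inverse of to_words
    return sum(w << (64 * i) for i, w in enumerate(ws))


def mul_256x256_3d(a, b):
    y = [0] * 8
    t = [0] * 5
    ca = 0
    for j in range(4):  # one row of the multiply per a[j]
        for m_or_a in range(2):  # 0: multiply phase, 1: add phase
            for i in range(4):  # same trip count in both phases
                if m_or_a == 0:
                    if i == 0:
                        t[4] = 0  # assumption: carry word cleared per row
                    t[i], t[4] = maddedu(a[j], b[i], t[4])
                else:
                    if i == 0:
                        # assumption: the lone addc rides along with i == 0
                        y[j], ca = addc(y[j], t[0])
                    y[j + 1 + i], ca = adde(y[j + 1 + i], t[1 + i], ca)
    return y
```

every level of the nest now has a fixed trip count (4, 2, 4), which is what makes it look like an ordinary 3D schedule.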
now, the next problem to solve is fitting it into instructions,
and for that we have run out of space in svshape. a svshape3
is needed basically.
l.
--
---
crowd-funded eco-conscious hardware: https://www.crowdsupply.com/eoma68