[Libre-soc-bugs] [Bug 1155] O(n^2) multiplication REMAP mode(s)

bugzilla-daemon at libre-soc.org bugzilla-daemon at libre-soc.org
Fri Dec 22 09:50:15 GMT 2023


https://bugs.libre-soc.org/show_bug.cgi?id=1155

--- Comment #12 from Luke Kenneth Casson Leighton <lkcl at lkcl.net> ---
(In reply to Jacob Lifshay from comment #10)
> (In reply to Luke Kenneth Casson Leighton from comment #9)
> > (In reply to Jacob Lifshay from comment #7)
> > > I came up with a somewhat different bigint mul remap algorithm, it has a few
> > > issues though...I also rebased the branch on latest master.
> > > 
> > > commit a3c1930de7990c8babcb3908ed7650e1d08eafb6
> > > Author: Jacob Lifshay <programmerjake at gmail.com>
> > > Date:   Thu Dec 21 21:10:11 2023 -0800
> > > 
> > >     tests/bigint/powmod: initial version of bigint multiply remap
> > >     
> > >     it has some issues around being able to encode scalar RS but
> > >     vector RT, and doesn't match the scalar * vector multiplication
> > >     pattern, but is quite compact.
> > 
> > i was expecting / anticipating use of vertical-first, this is
> > different / cool, how does it work?
> 
> it is vertical first.

ahhh

> basically, I realized that, if you do it in the right sequence, you can
> compute a[:] * b[:] by just computing each partial product a[ai] * b[bi] and
> adding each one in the right place in the final result:
> 
> y = [0] * ...
> for ai, bi in right_sequence():
>     partial_product = a[ai] * b[bi]
>     y += partial_product << (word_size * (ai + bi))

... assuming y is a massive long "thing" of bitsize wordsize*(ai+bi)
but the subdivisions are over wordsize boundaries. i start to get it. neat.

> this becomes a vertical first loop:
> loop:
>     sv.maddedu *y, *a, *b, *y  # RS is stored in scalar t
>     sv.adde *(y + 1), *(y + 1), t
>     svstep.
>     sv.bdnz ... loop

briiilliant. i love it! that's the kind of laughably-short assembler
we need :)

> well, notice it builds the list of remap indexes, and then the loop that
> actually operates on the data just goes through those indexes in sequence,

see diff below, i split out those two roles explicitly separately,
as that's the next step in the code-morph sequence.

> doing (combination python and sv asm syntax):
> *y, t = maddedu(*a, *b, *y)
> *(y + 1), ca = adde(*(y + 1), t, ca)

love it.

> so the "yielder" (if you mean the code that produces remap indexes) is just
> the loops building the indexes, with:
> some_list.append(the_index)
> 
> replaced with:
> yield the_index.

yes. done.

   https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=aa6f10d1f2

that's how all the ISACaller REMAP yields work, but if there's
no yielder module (e.g. remapyield.py) then ISACaller can't *have* that
functionality and we can't do the tests :)

okaay next task now we have the actual algorithm python_mul_remap_yielder(a_sz,
b_sz)
is to work out how it's going to fit into SVSHAPE0-3.
that process is started at comment #2

-- 
You are receiving this mail because:
You are on the CC list for the bug.


More information about the libre-soc-bugs mailing list