[Libre-soc-bugs] [Bug 1155] O(n^2) multiplication REMAP mode(s)

bugzilla-daemon at libre-soc.org bugzilla-daemon at libre-soc.org
Tue Feb 20 05:30:20 GMT 2024


https://bugs.libre-soc.org/show_bug.cgi?id=1155

--- Comment #72 from Jacob Lifshay <programmerjake at gmail.com> ---
(In reply to Luke Kenneth Casson Leighton from comment #66)
> (In reply to Jacob Lifshay from comment #65)
> 
> > just set SVSHAPE3's offset to -1, which puts RS in RT - 1, 
> 
> i love obscure tricks like this, esp. when i don't understand thm :)

well, perfect: I said we don't need signed offset, but turns out we still do.

Me saying we don't need signed offsets:
https://bugs.libre-soc.org/show_bug.cgi?id=911#c5

switching to 4-bit signed offsets (instead of unsigned) should be doable, since
most applications of offsets can just reverse the trick mentioned in bug #911
comment #5.

here, we *do* need signed offsets since the base register is RC and we're
offsetting RS, so we can't change the base register. if we don't have
signed-offsets that limits the maximum product size to 14 registers, since we
have to put the RS result outside of the product and the max offset we can set
is 15.

to be honest, I'm thinking it would be better to just use the obvious algorithm
from comment #0 (reproduced below in case comment #0 is edited) instead of
trying to cram it all in one vertical first loop, since that VF-loop has very
annoying dependency chains through CA, and because the algorithm from comment
#0 also lets us have a bigint mul-add (just don't clear product, voila, an
additional add input!), saving even more time. Also, the algorithm from comment
#0 lets us use the exact same pattern (the bi loop) as a scalar-vector mul
using horizontal first sv.maddedu, likely saving CPU complexity too.

Algorithm from comment #0:
void mul(
    uint64_t *product,
    const uint64_t *a,
    const uint64_t *b,
    size_t a_sz,
    size_t b_sz
) {
    for(size_t bi = 0; bi < a_sz; bi++) {
        product[bi] = 0;
    }
    for(size_t ai = 0; ai < a_sz; ai++) {
        uint64_t carry = 0;
        for(size_t bi = 0; bi < a_sz; bi++) {
            uint128_t v = (uint128_t)a[ai] * (uint128_t)b[bi];
            v += (uint128_t)carry;
            v += (uint128_t)product[ai + bi];
            carry = v >> 64;
            product[ai + bi] = (uint64_t)v;
        }
        product[ai + b_sz] = carry;
    }
}

-- 
You are receiving this mail because:
You are on the CC list for the bug.


More information about the libre-soc-bugs mailing list