[Libre-soc-isa] [Bug 937] instructions for bigint shift and prefix-code encode

Mon Sep 26 00:53:06 BST 2022

https://bugs.libre-soc.org/show_bug.cgi?id=937

--- Comment #2 from Jacob Lifshay <programmerjake at gmail.com> ---
(In reply to Luke Kenneth Casson Leighton from comment #1)
> with the shift-up placing zeros into the LSBs if RC>0
> can you explain clearly why the following alteration
> is not correct for pcdec?

assuming you mean pcenc

> 
>     elif RC < 0:
>         v <<= -RC
>         return u64(v>>64)  # return hi 64-bits
>     else:
>         v >>= RC
>         return u64(v)  # return low 64-bits

it turns out that signed shift amounts aren't actually necessary for the pcenc
algorithm as written above, i thought they were necessary because I was
thinking of an alternative algorithm that i never got fully working.

> it will need to be an overwrite (RT-as-source) in order to get it into
> RM-1P-2S1D as dshrshd RT,RA,RB so that EXTRA3 may be used.  as 3-in 1-out
> RT,RA,RB,RC there is no room for EXTRA3 it would be EXTRA2 which cannot
> handle odd vector regnums except through svoffset

as mentioned on irc:
https://libre-soc.org/irclog/%23libre-soc.2022-09-25.log.html#t2022-09-25T23:24:46
imho we should have a few variants if we need overwrite, since that makes it
more flexible:
> yes, it can be an overwrite variant, imho if we do that we should provide
> several variants for each input we overwrite: e.g. RT = op(RT, RA, RB),
> RT= op(RA, RT, RB), RT = op(RA, RB, RT), RT=op(0, RA, RB), RT=op(RA, 0, RB)

also on irc:
> for pcenc it has to reduce into several dynamically-determined outputs, so just a traditional mapreduce won't work

After writing that, I thought of a algorithm where we could use 3 prefix-sums
for the whole function, rather than a separate reduction for every output
dword:
(again, code not tested)

def pcenc(symbols, symbol_table):
    """ prefix-codes encode
    symbols: list[int]
        symbol indexes
    symbol_table: list[tuple[int, int]]
        pairs of bits and bit lengths for symbols
    Returns: tuple[list[int], int]
        the output 64-bit words, and the total number of output bits.
    """
    # gather-load and shifts and ands to extract fields:
    sym_bits = [0] * len(symbols)
    sym_len = [0] * len(symbols)
    for i in range(len(symbols)):
        sym_bits[i], sym_len[i] = symbol_table[symbols[i]]

    # prefix-sum:
    start = [0] * (len(symbols) + 1)
    for i in range(len(symbols)):
        start[i + 1] = start[i] + sym_len[i]

    # shift:
    shifted0 = [0] * (len(symbols) + 1)
    for i in range(len(symbols)):
        shifted0[i + 1] = u64(sym_bits[i] << (start[i] % 64))
    shifted1 = [0] * (len(symbols) + 1)
    for i in range(len(symbols)):
        shifted1[i + 1] = (sym_bits[i] << (start[i] % 64)) >> 64

    # for pedagogical purposes, not needed in final algorithm:
    # orig_shifted0 = shifted0.copy()
    # orig_shifted1 = shifted1.copy()

    # xor prefix-sum (can't use bitwise-or because it's not invertible):
    for i in range(len(symbols)):
        shifted0[i + 1] ^= shifted0[i]
    for i in range(len(symbols)):
        shifted1[i + 1] ^= shifted1[i]

    # scatter or twin-pred:
    out_to_in_map = [0] * (start[-1] // 64 + 1)
    for i in range(len(start)):
        out_to_in_map[start[i] // 64] = i

    # gather and xor:
    out = [0] * len(out_to_in_map)
    for i in range(len(out_to_in_map) - 1):
        # extract a subsequence of orig_shifted0 by xor-ing the
        # start/end of the subsequence from the xor prefix-summed sequence
        out[i] = shifted0[out_to_in_map[i]]
        out[i] ^= shifted0[out_to_in_map[i + 1]]
    for i in range(1, len(out_to_in_map)):
        # extract a subsequence of orig_shifted0 by xor-ing the
        # start/end of the subsequence from the xor prefix-summed sequence
        out[i] ^= shifted1[out_to_in_map[i - 1]]
        out[i] ^= shifted1[out_to_in_map[i]]
    return out, start[-1]

-- 
You are receiving this mail because:
You are on the CC list for the bug.