# [Libre-soc-isa] [Bug 937] instructions for bigint shift and prefix-code encode

bugzilla-daemon at libre-soc.org bugzilla-daemon at libre-soc.org
Mon Sep 26 00:53:06 BST 2022

```https://bugs.libre-soc.org/show_bug.cgi?id=937

--- Comment #2 from Jacob Lifshay <programmerjake at gmail.com> ---
(In reply to Luke Kenneth Casson Leighton from comment #1)
> with the shift-up placing zeros into the LSBs if RC>0
> can you explain clearly why the following alteration
> is not correct for pcdec?

assuming you mean pcenc

>
>     elif RC < 0:
>         v <<= -RC
>         return u64(v>>64)  # return hi 64-bits
>     else:
>         v >>= RC
>         return u64(v)  # return low 64-bits

it turns out that signed shift amounts aren't actually necessary for the pcenc
algorithm as written above, i thought they were necessary because I was
thinking of an alternative algorithm that i never got fully working.

> it will need to be an overwrite (RT-as-source) in order to get it into
> RM-1P-2S1D as dshrshd RT,RA,RB so that EXTRA3 may be used.  as 3-in 1-out
> RT,RA,RB,RC there is no room for EXTRA3 it would be EXTRA2 which cannot
> handle odd vector regnums except through svoffset

as mentioned on irc:
https://libre-soc.org/irclog/%23libre-soc.2022-09-25.log.html#t2022-09-25T23:24:46
imho we should have a few variants if we need overwrite, since that makes it
more flexible:
> yes, it can be an overwrite variant, imho if we do that we should provide
> several variants for each input we overwrite: e.g. RT = op(RT, RA, RB),
> RT= op(RA, RT, RB), RT = op(RA, RB, RT), RT=op(0, RA, RB), RT=op(RA, 0, RB)

also on irc:
> for pcenc it has to reduce into several dynamically-determined outputs, so just a traditional mapreduce won't work

After writing that, I thought of a algorithm where we could use 3 prefix-sums
for the whole function, rather than a separate reduction for every output
dword:
(again, code not tested)

def pcenc(symbols, symbol_table):
""" prefix-codes encode
symbols: list[int]
symbol indexes
symbol_table: list[tuple[int, int]]
pairs of bits and bit lengths for symbols
Returns: tuple[list[int], int]
the output 64-bit words, and the total number of output bits.
"""
# gather-load and shifts and ands to extract fields:
sym_bits = [0] * len(symbols)
sym_len = [0] * len(symbols)
for i in range(len(symbols)):
sym_bits[i], sym_len[i] = symbol_table[symbols[i]]

# prefix-sum:
start = [0] * (len(symbols) + 1)
for i in range(len(symbols)):
start[i + 1] = start[i] + sym_len[i]

# shift:
shifted0 = [0] * (len(symbols) + 1)
for i in range(len(symbols)):
shifted0[i + 1] = u64(sym_bits[i] << (start[i] % 64))
shifted1 = [0] * (len(symbols) + 1)
for i in range(len(symbols)):
shifted1[i + 1] = (sym_bits[i] << (start[i] % 64)) >> 64

# for pedagogical purposes, not needed in final algorithm:
# orig_shifted0 = shifted0.copy()
# orig_shifted1 = shifted1.copy()

# xor prefix-sum (can't use bitwise-or because it's not invertible):
for i in range(len(symbols)):
shifted0[i + 1] ^= shifted0[i]
for i in range(len(symbols)):
shifted1[i + 1] ^= shifted1[i]

# scatter or twin-pred:
out_to_in_map = [0] * (start[-1] // 64 + 1)
for i in range(len(start)):
out_to_in_map[start[i] // 64] = i

# gather and xor:
out = [0] * len(out_to_in_map)
for i in range(len(out_to_in_map) - 1):
# extract a subsequence of orig_shifted0 by xor-ing the
# start/end of the subsequence from the xor prefix-summed sequence
out[i] = shifted0[out_to_in_map[i]]
out[i] ^= shifted0[out_to_in_map[i + 1]]
for i in range(1, len(out_to_in_map)):
# extract a subsequence of orig_shifted0 by xor-ing the
# start/end of the subsequence from the xor prefix-summed sequence
out[i] ^= shifted1[out_to_in_map[i - 1]]
out[i] ^= shifted1[out_to_in_map[i]]
return out, start[-1]

--
You are receiving this mail because:
You are on the CC list for the bug.
```