[Libre-soc-bugs] [Bug 865] implement vector bitmanip opcodes

Sat Jun 25 09:37:33 BST 2022

https://bugs.libre-soc.org/show_bug.cgi?id=865

--- Comment #23 from Luke Kenneth Casson Leighton <lkcl at lkcl.net> ---
(In reply to Jacob Lifshay from comment #22)

> > +    a1 = RA if mode&1 else ~RA
> 
> that's bitwise-not, not neg -- 

yes.  that's directly from the pseudocode expressions, which took me
a while to spot, it's so similar in small fonts.

https://en.m.wikipedia.org/wiki/X86_Bit_manipulation_instruction_set#TBM_(Trailing_Bit_Manipulation)

  XOP.LZ.09 01 /1     BLCFILL  Fill from lowest clear bit               x & (x + 1)
  XOP.LZ.09 02 /6     BLCI     Isolate lowest clear bit                 x | ~(x + 1)
  XOP.LZ.09 01 /5     BLCIC    Isolate lowest clear bit and complement  ~x & (x + 1)
  XOP.LZ.09 02 /1     BLCMSK   Mask from lowest clear bit               x ^ (x + 1)
  XOP.LZ.09 01 /3     BLCS     Set lowest clear bit                     x | (x + 1)
  XOP.LZ.09 01 /2     BLSFILL  Fill from lowest set bit                 x | (x - 1)
  XOP.LZ.09 01 /6     BLSIC    Isolate lowest set bit and complement    ~x | (x - 1)
  XOP.LZ.09 01 /7     T1MSKC   Inverse mask from trailing ones          ~x | (x + 1)
  XOP.LZ.09 01 /4     TZMSK    Mask from trailing zeros                 ~x & (x - 1)

and, further up, BMI1

  VEX.LZ.0F38 F3 /3     BLSI    Extract lowest set isolated bit x & -x
  VEX.LZ.0F38 F3 /2     BLSMSK  Get mask up to lowest set bit   x ^ (x - 1)
  VEX.LZ.0F38 F3 /1     BLSR    Reset lowest set bit    x & (x - 1)
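
to make the two tables concrete, here is a minimal sketch that evaluates
each expression on a sample value. the 64-bit width (MASK64) and the
names EXPRS and evaluate are assumptions for illustration only, they are
not from the x86 documentation or from the bug.

```python
MASK64 = (1 << 64) - 1  # assumption: 64-bit registers

# mnemonic -> expression, copied from the TBM and BMI1 tables above
EXPRS = {
    "BLCFILL": lambda x: x & (x + 1),
    "BLCI":    lambda x: x | ~(x + 1),
    "BLCIC":   lambda x: ~x & (x + 1),
    "BLCMSK":  lambda x: x ^ (x + 1),
    "BLCS":    lambda x: x | (x + 1),
    "BLSFILL": lambda x: x | (x - 1),
    "BLSIC":   lambda x: ~x | (x - 1),
    "T1MSKC":  lambda x: ~x | (x + 1),
    "TZMSK":   lambda x: ~x & (x - 1),
    "BLSI":    lambda x: x & -x,
    "BLSMSK":  lambda x: x ^ (x - 1),
    "BLSR":    lambda x: x & (x - 1),
}


def evaluate(name, x):
    """Evaluate one expression and truncate the result to 64 bits."""
    return EXPRS[name](x) & MASK64


if __name__ == "__main__":
    x = 0b0101_1000
    for name in EXPRS:
        # show only the low 8 bits to keep the output readable
        print(f"{name:8s} {evaluate(name, x) & 0xFF:08b}")
```

python integers are unbounded, so ~x and -x come out negative until the
final AND with MASK64.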

so this separates out 3 expression groups:

    1. x / ~x                    - this is a1
    2. & / ^ / |                 - this is mode3
    3. -x / x-1 / x+1 / ~(x+1)   - this is a2

however, on top of that, to get the same set-before-first, set-only-first
and set-including-first effect, an *additional* mask is added.
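
ignoring the additional mask, the three-group split can be sketched as a
lookup of (group 1, group 2, group 3) per mnemonic. the names A1, OP, A2,
DECODE and bmask_expr are hypothetical, chosen for this sketch, and are
not the actual bmask encoding.

```python
import operator

MASK64 = (1 << 64) - 1  # assumption: 64-bit registers

A1 = {"x": lambda x: x, "~x": lambda x: ~x}                   # group 1
OP = {"&": operator.and_, "^": operator.xor, "|": operator.or_}  # group 2
A2 = {                                                        # group 3
    "-x": lambda x: -x,
    "x-1": lambda x: x - 1,
    "x+1": lambda x: x + 1,
    "~(x+1)": lambda x: ~(x + 1),
}

# every TBM/BMI1 expression listed above as one choice from each group
DECODE = {
    "BLCFILL": ("x", "&", "x+1"),
    "BLCI":    ("x", "|", "~(x+1)"),
    "BLCIC":   ("~x", "&", "x+1"),
    "BLCMSK":  ("x", "^", "x+1"),
    "BLCS":    ("x", "|", "x+1"),
    "BLSFILL": ("x", "|", "x-1"),
    "BLSIC":   ("~x", "|", "x-1"),
    "T1MSKC":  ("~x", "|", "x+1"),
    "TZMSK":   ("~x", "&", "x-1"),
    "BLSI":    ("x", "&", "-x"),
    "BLSMSK":  ("x", "^", "x-1"),
    "BLSR":    ("x", "&", "x-1"),
}


def bmask_expr(name, x):
    """Compute (group 1) <group 2> (group 3) for one mnemonic, 64-bit."""
    a1, op, a2 = DECODE[name]
    return OP[op](A1[a1](x), A2[a2](x)) & MASK64
```

for example bmask_expr("BLSI", 0b01011000) picks x, & and -x, which
isolates the lowest set bit.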

> I get your point anyway...

so relieved you can interpret fuzzy-logic :)

> The idea is that, currently add/subf/etc. are basically:
> 
> a = ~RA if subtracting else RA
> carry_in = 0
> if subtracting:
>     carry_in = 1
> RT = a + RB + carry_in

(and an output-invert)

if inverted_out:
    RT = ~RT
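
putting the quoted add/subf outline and the output-invert together, as a
runnable sketch. the function name addsub, its keyword arguments and the
64-bit truncation are assumptions for illustration, not the actual ALU
code.

```python
MASK64 = (1 << 64) - 1  # assumption: 64-bit registers


def addsub(RA, RB, subtracting=False, inverted_out=False):
    """add/subf as invert-and-carry, with an optional output invert."""
    a = ~RA if subtracting else RA
    carry_in = 1 if subtracting else 0
    RT = a + RB + carry_in      # subtracting: ~RA + RB + 1 == RB - RA
    if inverted_out:
        RT = ~RT
    return RT & MASK64


if __name__ == "__main__":
    print(addsub(3, 10))                    # add
    print(addsub(3, 10, subtracting=True))  # subf: RB - RA
```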

> bmask would (ignoring mask and bit-reverse and shifting) do:

mask is quite important (critical to include), and also i found
it... difficult to work out (sotto voce, i had to guess, and
eventually found it)

> a = ~RA if imm & 0b1 else RA
> b = 1 if imm & 0b10 else -1 # mode2
> carry_in = 0
> y = a + b + carry_in

ok so this calculates expression (3), is that correct? (with some
of the equivalence-conversions, e.g. (~RA)+1 == -RA, i believe it is)
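
a quick check of that belief: the four combinations of the two low imm
bits do give the four forms of expression (3), because ~x + 1 == -x and
~x - 1 == ~(x + 1) in two's complement. the names a2_value and EXPECTED
and the 64-bit width are assumptions for this sketch.

```python
MASK64 = (1 << 64) - 1  # assumption: 64-bit registers


def a2_value(RA, imm):
    """The quoted 'y = a + b + carry_in' computation, truncated to 64 bits."""
    a = ~RA if imm & 0b1 else RA
    b = 1 if imm & 0b10 else -1
    carry_in = 0
    return (a + b + carry_in) & MASK64


# low two bits of imm -> equivalent form from expression group 3
EXPECTED = {
    0b00: lambda x: x - 1,
    0b01: lambda x: ~(x + 1),  # computed as ~x - 1
    0b10: lambda x: x + 1,
    0b11: lambda x: -x,        # computed as ~x + 1
}

if __name__ == "__main__":
    for x in (0, 1, 0b01011000, MASK64):
        for imm, expr in EXPECTED.items():
            assert a2_value(x, imm) == expr(x) & MASK64
    print("all four forms of expression (3) match")
```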

> v00 = 0
> v01 = v10 = bool(imm & 0b100)
> v11 = bool(imm & 0b1000)

ahh, a LUT2... it looks like... it's doing and/or/xor. so that's expression (2)
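
the LUT2 reading can be checked bit by bit. in this sketch imm bits 2 and
3 select between zero, xor, and, or. the helper names lut2 and apply_lut2
and the width parameter are assumptions for illustration.

```python
def lut2(imm):
    """Build the 4-entry table [v00, v01, v10, v11] from imm bits 2 and 3."""
    v00 = False
    v01 = v10 = bool(imm & 0b100)
    v11 = bool(imm & 0b1000)
    return [v00, v01, v10, v11]


def apply_lut2(table, a, b, width=64):
    """Apply the same 2-input lookup table to every bit position."""
    out = 0
    for i in range(width):
        a_bit = (a >> i) & 1
        b_bit = (b >> i) & 1
        out |= int(table[(a_bit << 1) | b_bit]) << i
    return out


if __name__ == "__main__":
    a, b = 0b1100, 0b1010
    for imm, name in ((0b0000, "zero"), (0b0100, "xor"),
                      (0b1000, "and"), (0b1100, "or")):
        print(f"{name:4s} {apply_lut2(lut2(imm), a, b, width=4):04b}")
```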

> # 64x 4-in muxes -- basically a binlog operation:
> # probably saves gates over muxing over and, or, and xor
> table = [v00, v01, v10, v11]
> RT = 0
> for i in range(64):
>     ra_bit = bool(RA & (1 << i))
>     y_bit = bool(y & (1 << i))
>     RT |= table[(ra_bit << 1) | y_bit] << i

and the ra input here is not expression (1) which is where the equivalence
chain falls over for me.

i *suspect* that if an extra bit for output-inversion is included then
that might work

as above:

    v00 = 0
    v01 = v10 = bool(imm & 0b100)
    v11 = bool(imm & 0b1000)

(out-inversion built-in to LUT2?)

    v00 ^= bool(imm & 0b10000)
    v01 ^= bool(imm & 0b10000)
    v10 ^= bool(imm & 0b10000)
    v11 ^= bool(imm & 0b10000)
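
a sketch of that suspicion, assuming the intent is to test bit 4 of imm
(i.e. imm & 0b10000): flipping all four table entries is the same as
inverting the output, which gives nand/nor/xnor for free. the names
lut2_inv and apply_lut2 are hypothetical, and this is not a confirmed
encoding.

```python
def lut2_inv(imm):
    """LUT2 from imm bits 2-3, with every entry flipped when bit 4 is set."""
    inv = bool(imm & 0b10000)  # assumption: bit 4 is the output-invert bit
    v00 = False
    v01 = v10 = bool(imm & 0b100)
    v11 = bool(imm & 0b1000)
    return [v ^ inv for v in (v00, v01, v10, v11)]


def apply_lut2(table, a, b, width=64):
    """Apply the same 2-input lookup table to every bit position."""
    out = 0
    for i in range(width):
        idx = (((a >> i) & 1) << 1) | ((b >> i) & 1)
        out |= int(table[idx]) << i
    return out


if __name__ == "__main__":
    a, b, w = 0b1100, 0b1010, 4
    full = (1 << w) - 1
    # inverted-table result equals the bitwise NOT of the plain operation
    assert apply_lut2(lut2_inv(0b11000), a, b, w) == ~(a & b) & full  # nand
    assert apply_lut2(lut2_inv(0b11100), a, b, w) == ~(a | b) & full  # nor
    assert apply_lut2(lut2_inv(0b10100), a, b, w) == ~(a ^ b) & full  # xnor
    print("output inversion folds into the LUT2 entries")
```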
