[Libre-soc-dev] ternlog & grevlut again -- was: next ISA WG RFC

Mon Mar 6 19:18:57 GMT 2023

On Mon, Mar 6, 2023, 10:45 Luke Kenneth Casson Leighton <lkcl at lkcl.net>
wrote:

>
>
> On Monday, March 6, 2023, Jacob Lifshay <programmerjake at gmail.com> wrote:
> > On Mon, Mar 6, 2023, 04:10 Luke Kenneth Casson Leighton <lkcl at lkcl.net>
> > wrote:
> >>
> >> folks we need to discuss what RFCs should go in next, and plan
> >> groupings
> >> https://libre-soc.org/openpower/sv/bitmanip/
> >>
> >> my recommendation is to not go above about 5-7 instructions
> >> per RFC, and to group them.  candidates:
> >>
> >> * ternlogi, crternlogi, binlut, crbinlut
> >
> > these are a good choice to submit next with mostly obvious benefit,
> > though we might be able to squeeze an extra bit out of ternlogi's immediate
> > by deleting the redundant encodings already covered by li, and, or, xor,
> > mv, etc. it seems worth trying and seeing how complex that would be. we can
> > also just decide redundancy is ok and simplicity is worth the extra
> > encoding bit. maybe that should be an unresolved question that the ISA WG
> > can answer.
>
> no.  the Power ISA decoder is ridiculously complex as it is.
> POWER9 has a 2 stage decoder which is ridiculous
>

the extra logic would go in the ternlog ALU, not the decoder.

>
> >>
> >> * average-sum-diff and abs-accumulate, useful for AV
> >
> > pretty good, but imho ternlog is more compelling since av insns already
> > exist in vsx
>
> except we're not doing VSX. the case for adding them as scalar
> is based on SVP64, these being a stepping stone.
>
> > and, without vectorization of some sort, are not very beneficial
>
> hence why SVP64 was put in as the very first RFC.
>

yeah, ok, i forgot that that RFC described SVP64 and wasn't only reserving
opcode space.

>
> >>
> >> * grevlut, xperm, bitmatrix
> >
> > imho grevlut still has the major problem of using a huge amount of
> > encoding space for not much benefit, i think it can be greatly simplified
> > while retaining nearly all the practical benefit,
>
> you tried once already and dramatically reduced the capability
> (to a fraction of grevlut) which tells me that rather than the
> instruction being "not much benefit" you don't quite understand
> how powerful it is.
>

well actually the only justification i remember you giving for it not being
based on and-or gates is that muxes give more possible immediates of the
style you checked for, but i can't remember whether the and-or one uses
fewer immediate bits...

>
> that said because it is so innovative and new there simply
> hasn't been any analysis done, no use-cases except that gorc
> grev etc can be covered by it (like ternlog covers crand etc),
> this will itself make it difficult to justify inclusion,
> until that research is done.  good thing there's an NLnet Grant
> milestone for exactly that, eh? :)
>

well, imho that's just as good a reason to do the same research on the
and-or variant. my intuition tells me grev, gor, and generating immediates
are just about the only practical uses we'll find and that it's overly
complex for no good reason.

for generating immediates imho it'd be better to have a bitwise-repeat
instruction (like nmigen's Rep), since that'd be useful for things other
than immediates too, is much easier to understand, and can generate more
nice patterns than grevlut happens to.
e.g.
brep[i] rt, bits-to-repeat, repetition-length

brepi rt, 0xAC, 8 # repeats 0b10101100
rt = 0xACACACACACACACAC

ra = 0x2
brep rt, ra, 2 # repeats 0b10 from lower 2 bits of ra
rt = 0xAAAAAAAAAAAAAAAA

ra = 0x1864
brep rt, ra, 4 # repeats lower 4 bits of ra
rt = 0x4444444444444444

ra = 0x12345678
brep rt, ra, 16 # repeats lower 16 bits of ra
rt = 0x5678567856785678
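
a rough Python model of what i mean, as a sketch only. the function name,
the argument order, the 64-bit default width and the behaviour when the
repetition length does not divide the width (the topmost copy is simply
truncated) are all assumptions for illustration, not a proposed spec:

```python
def brep(value: int, length: int, width: int = 64) -> int:
    """Repeat the low `length` bits of `value` across a `width`-bit result.

    Assumption: if `length` does not divide `width`, the topmost copy
    is truncated to fit.
    """
    if not 0 < length <= width:
        raise ValueError("length must be in the range 1..width")
    chunk = value & ((1 << length) - 1)   # keep only the low `length` bits
    result = 0
    for shift in range(0, width, length): # place one copy per `length` bits
        result |= chunk << shift
    return result & ((1 << width) - 1)    # drop anything above `width` bits


if __name__ == "__main__":
    # the four examples from above
    print(hex(brep(0xAC, 8)))
    print(hex(brep(0x2, 2)))
    print(hex(brep(0x1864, 4)))
    print(hex(brep(0x12345678, 16)))
```

the immediate form (brepi) would be the same function with the
bits-to-repeat taken from the instruction rather than from ra.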

I'll respond to the rest of the email separately, later.

Jacob