<div dir="auto"><div><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Mon, Mar 6, 2023, 10:45 Luke Kenneth Casson Leighton <<a href="mailto:lkcl@lkcl.net">lkcl@lkcl.net</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><br><br>On Monday, March 6, 2023, Jacob Lifshay <<a href="mailto:programmerjake@gmail.com" target="_blank" rel="noreferrer">programmerjake@gmail.com</a>> wrote:<br>> On Mon, Mar 6, 2023, 04:10 Luke Kenneth Casson Leighton <<a href="mailto:lkcl@lkcl.net" target="_blank" rel="noreferrer">lkcl@lkcl.net</a>> wrote:<br>>><br>>> folks we need to discuss what RFCs should go in next, and plan<br>>> groupings<br>>> <a href="https://libre-soc.org/openpower/sv/bitmanip/" target="_blank" rel="noreferrer">https://libre-soc.org/openpower/sv/bitmanip/</a><br>>><br>>> my recommendation is to not go above about 5-7 instructions<br>>> per RFC, and to group them.  candidates:<br>>><br>>> * ternlogi, crternlogi, binlut, crbinlut<br>><br>> these are a good choice to submit next with mostly obvious benefit, though we might be able to squeeze an extra bit out of ternlogi's immediate by deleting the redundant encodings already covered by li, and, or, xor, mv, etc. it seems worth trying and seeing how complex that would be. we can also just decide redundancy is ok and simplicity is worth the extra encoding bit. maybe that should be an unresolved question that the ISA WG can answer.<br><br>no.  
the Power ISA decoder is ridiculously complex as it is.<br>POWER9 has a 2 stage decoder which is ridiculous<br></blockquote></div></div><div dir="auto"><br></div><div dir="auto">the extra logic would go in the ternlog ALU, not the decoder.</div><div dir="auto"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><br>>><br>>> * average-sum-diff and abs-accumulate, useful for AV<br>><br>> pretty good, but imho ternlog is more compelling since av insns already exist in vsx <br><br>except we're not doing VSX. the case for adding them as scalar<br>is based on SVP64, these being a stepping stone.<br><br>> and, without vectorization of some sort, are not very beneficial<br><br>hence why SVP64 was put in as the very first RFC.<br></blockquote></div></div><div dir="auto"><br></div><div dir="auto">yeah, ok, i forgot that described SVP64 and wasn't only reserving opcode space.</div><div dir="auto"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><br>>><br>>> * grevlut, xperm, bitmatrix<br>><br>> imho grevlut still has the major problem of using a huge amount of encoding space for not much benefit, i think it can be greatly simplified while retaining nearly all the practical benefit, <br><br>you tried once already and dramatically reduced the capability<br>(to a fraction of grevlut) which tells me that rather than the<br>instruction being "not much benefit" you don't quite understand<br>how powerful it is.<br></blockquote></div></div><div dir="auto"><br></div><div dir="auto">well actually the only justification i remember you giving for it not being based on and-or gates is that muxes give more possible immediates of the style you checked for, but icr if the and-or one uses fewer immediate bits...</div><div dir="auto"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc 
solid;padding-left:1ex"><br>that said because it is so innovative and new there simply<br>hasn't been any analysis done, no use-cases except that gorc<br>grev etc can be covered by it (like ternlog covers crand etc),<br>this will itself make it difficult to justify inclusion,<br>until that research is done.  good thing there's an NLnet Grant<br>milestone for exactly that, eh? :)<br></blockquote></div></div><div dir="auto"><br></div><div dir="auto">well, imho that's just as good a reason to do the same research on the and-or variant. my intuition tells me grev, gor, and generating immediates are just about the only practical uses we'll find and that it's overly complex for no good reason.</div><div dir="auto"><br></div><div dir="auto">for generating immediates imho it'd be better to have a bitwise-repeat instruction (like nmigen's Rep), since that'd be useful for things other than immediates too and much easier to understand and can generate more nice patterns than grevlut happens to.</div><div dir="auto">e.g.</div><div dir="auto">brep[i] rt, bits-to-repeat, repetition-length</div><div dir="auto">brepi rt, 0xAC, 8 # repeats 0b10101100</div><div dir="auto">rt = 0xACACACACACACACAC</div><div dir="auto">ra = 0x2</div><div dir="auto"><div dir="auto">brep rt, ra, 2 # repeats 0b10 from lower 2 bits of ra</div><div dir="auto">rt = 0xAAAAAAAAAAAAAAAA</div><div dir="auto">ra = 0x1864</div><div dir="auto"><div style="min-width:150px" class="elided-text">brep rt, ra, 4 # repeats lower 4 bits of ra</div></div></div><div dir="auto">rt = 0x4444444444444444</div><div dir="auto">ra = 0x12345678</div><div dir="auto">brep rt, ra, 16</div><div dir="auto">rt = 0x5678567856785678</div><div dir="auto"><br></div><div dir="auto">I'll respond to the rest of the email separately, later.</div><div dir="auto"><br></div><div dir="auto">Jacob</div></div>
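<div dir="auto"><br></div><div dir="auto">p.s. a rough python sketch of the brep semantics in the examples above, just to pin down what i mean. the function name, the 64-bit default width, and truncating the top repetition when the length doesn't divide the width are all assumptions for illustration, not a proposed spec:</div>

```python
def brep(ra: int, length: int, width: int = 64) -> int:
    """Repeat the low `length` bits of `ra` until `width` bits are filled.

    Hypothetical model only: `width` defaulting to 64 and the truncation
    of a partial top repetition are assumptions, not part of any spec.
    """
    if not 1 <= length <= width:
        raise ValueError("length must be between 1 and width")
    chunk = ra & ((1 << length) - 1)      # keep only the low `length` bits
    result = 0
    for shift in range(0, width, length):  # place one copy per position
        result |= chunk << shift
    return result & ((1 << width) - 1)     # drop bits past the register width


# the four examples from the email
print(hex(brep(0xAC, 8)))
print(hex(brep(0x2, 2)))
print(hex(brep(0x1864, 4)))
print(hex(brep(0x12345678, 16)))
```

<div dir="auto">brepi would be the same function with the bits-to-repeat taken from an immediate rather than from ra.</div>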