<br><br>On Monday, March 6, 2023, Jacob Lifshay <<a href="mailto:programmerjake@gmail.com">programmerjake@gmail.com</a>> wrote:<br>> On Mon, Mar 6, 2023, 04:10 Luke Kenneth Casson Leighton <<a href="mailto:lkcl@lkcl.net">lkcl@lkcl.net</a>> wrote:<br>>><br>>> folks we need to discuss what RFCs should go in next, and plan<br>>> groupings<br>>> <a href="https://libre-soc.org/openpower/sv/bitmanip/">https://libre-soc.org/openpower/sv/bitmanip/</a><br>>><br>>> my recommendation is to not go above about 5-7 instructions<br>>> per RFC, and to group them.  candidates:<br>>><br>>> * ternlogi, crternlogi, binlut, crbinlut<br>><br>> these are a good choice to submit next with mostly obvious benefit, though we might be able to squeeze an extra bit out of ternlogi's immediate by deleting the redundant encodings already covered by li, and, or, xor, mv, etc. it seems worth trying and seeing how complex that would be. we can also just decide redundancy is ok and simplicity is worth the extra encoding bit. maybe that should be an unresolved question that the ISA WG can answer.<br><br>no.  the Power ISA decoder is ridiculously complex as it is.<br>POWER9 has a 2 stage decoder which is ridiculous<br><br>>><br>>> * average-sum-diff and abs-accumulate, useful for AV<br>><br>> pretty good, but imho ternlog is more compelling since av insns already exist in vsx <br><br>except we're not doing VSX. 
the case for adding them as scalar<br>is based on SVP64, these being a stepping stone.<br><br>> and, without vectorization of some sort, are not very beneficial<br><br>hence why SVP64 was put in as the very first RFC.<br><br>>><br>>> * grevlut, xperm, bitmatrix<br>><br>> imho grevlut still has the major problem of using a huge amount of encoding space for not much benefit, i think it can be greatly simplified while retaining nearly all the practical benefit, <br><br>you tried once already and dramatically reduced the capability<br>(to a fraction of grevlut) which tells me that rather than the<br>instruction being "not much benefit" you don't quite understand<br>how powerful it is.<br><br>that said because it is so innovative and new there simply<br>hasn't been any analysis done, no use-cases except that gorc<br>grev etc can be covered by it (like ternlog covers crand etc),<br>this will itself make it difficult to justify inclusion,<br>until that research is done.  good thing there's an NLnet Grant<br>milestone for exactly that, eh? :)<br><br>> therefore imho it's not ready for submission. also, it needs grev and bitrev and similar aliases<br><br>yep. as assembler-aliases.  they're all there. gorc, gxorc, grev.<br>just have to find them.  i think this is the one that generates<br>over a thousand regular-patterned constants. can't remember.<br>been too long since i wrote it.<br><br>>><br>>> * bmask (x86 BMI on steroids) and cprop (carry-propagation)<br>>> * bitmask ops (or/and/xor/get) actually shift operations<br>><br>> aren't those just `crand` or `and` and similar? i'm guessing that's not what you meant, so links please.<br><br>bmset, etc in the bitmanip page.<br>bmask is on the vector_ops page. 
all "vector" ops got mashed<br>out, leaving "support" routines like cprop and generalisation<br>of set-before-first etc.<br><a href="https://libre-soc.org/openpower/sv/vector_ops/">https://libre-soc.org/openpower/sv/vector_ops/</a><br><br>>><br>>> * crweird operations (powerful interchange between GPR and CR)<br>>> * carryless mul/div/mod<br>><br>> these are basically good to go, though imho are less critical so can be left for later when we need a break from more complex stuff.<br><br>ack, good assessment.<br><br>>><br>>> * int/fp mv and mv.swizzle/fmv.swizzle<br>><br>> imho int/fp mv/convert should be its own separate rfc without swizzle.<br><br>ack. yes they are different.<br><br>>  imho int/fp is basically ready, <br><br>ah i forgot, it's not ready.<br><br>> trying to smash it into fewer opcodes is imho a fool's errand <br><br>please don't denigrate rational arguments in this way.<br><br>> because it just doesn't fit, uses the same amount of encoding space, and makes it harder to understand.<br><br>irrelevant. i have gone over this already and am not repeating<br>it. reducing the number of "actual" instructions is critical<br>and the absolute top priority.  ternlogi is not submitted<br>as 256 instructions. please review tom forsyth's video on<br>larrabee.<br><br>please can you adjust the spec page to account for that<br>otherwise i am forced to do it <br><br>> also imho we might want to do swizzles after submitting at least basic svp64 subvl support. also imho swizzle might not be ready for submission, icr reviewing it in detail, so it may need some design tweaks.<br>>><br>>> transcendentals and the GF groups are a bit big to tackle at<br>>> the moment.<br>>><br>>> the most obvious priority ones (easiest to justify) would<br>>> be the AV ones. 
there exist already VSX variants.<br>>><br>>> thoughts?<br>><br>> in summary imho we should next submit int/fp mv/convert (no swizzle for now) or ternlog & friends.<br>> Jacob<br><br>-- <br>---<br>crowd-funded eco-conscious hardware: <a href="https://www.crowdsupply.com/eoma68" target="_blank">https://www.crowdsupply.com/eoma68</a><br><br>