[Libre-soc-isa] [Bug 1017] ISA WG RFC for binary and ternary bitops

bugzilla-daemon at libre-soc.org bugzilla-daemon at libre-soc.org
Wed Mar 15 12:28:37 GMT 2023


https://bugs.libre-soc.org/show_bug.cgi?id=1017

--- Comment #18 from Luke Kenneth Casson Leighton <lkcl at lkcl.net> ---
(In reply to Jacob Lifshay from comment #13)

> > then use binlut followed by a single crweird transfer.
> 
> that doesn't work well, i'm assuming all of the non-lut sources/dests are
> CRs, hence why we're trying to use crbinlog. you'd need 4 additional
> instructions.

have a look at Tom Forsyth's video on Larrabee for why those extra
instructions may be desirable.

[crweirds are almost certainly going to need to be micro-coded btw]

Tom explains that the team found a pre-existing suite of instructions
in the original Pentium III core that was perfect for use as predicate
masks, but the distance from the gates inside that core to the AVX512
units was so great that the entire ASIC would have had to be slowed
down by an order of magnitude to give signals, limited by the speed
of light, time to cross the chip.

they were therefore forced to add an entire new suite of instructions,
duplicating a perfectly good existing set, plus literally an entire
new regfile that could be placed closer to the AVX512 pipelines, just
to get themselves some Predicate Mask operations.

now, with crbinlut being near-identical to the existing CR ops suite
(except that it operates 4 bits at a time), and crternlogi likewise,
proposing *CR*-based advanced variants of that suite is likely to be
well received by the IBM Hardware Architect team.
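
to make "4 bits at a time" concrete, here is a minimal Python model of
a bitwise lookup-table op applied across one 4-bit CR field. it is a
sketch under stated assumptions, not the specified semantics: the
names binlut4/ternlog4, the truth-table index order (first operand as
the most-significant index bit) and the LSB0 bit numbering are all
hypothetical choices for illustration (Power ISA itself numbers bits
MSB0).

```python
# Hypothetical model of bitwise LUT ops over one 4-bit CR field.
# Bit numbering is LSB0 here for simplicity (Power ISA uses MSB0).
CR_FIELD_BITS = 4


def binlut4(lut: int, a: int, b: int) -> int:
    """2-input LUT applied independently to each of the 4 bit positions.

    Assumed convention: for bit i, index = (a_i << 1) | b_i selects bit
    `index` of the 4-bit truth table `lut`.
    """
    result = 0
    for i in range(CR_FIELD_BITS):
        index = (((a >> i) & 1) << 1) | ((b >> i) & 1)
        result |= ((lut >> index) & 1) << i
    return result


def ternlog4(imm8: int, a: int, b: int, c: int) -> int:
    """3-input LUT (8-bit truth table) applied per bit, ternlogi-style.

    Assumed convention: index = (a_i << 2) | (b_i << 1) | c_i.
    """
    result = 0
    for i in range(CR_FIELD_BITS):
        index = (((a >> i) & 1) << 2) | (((b >> i) & 1) << 1) | ((c >> i) & 1)
        result |= ((imm8 >> index) & 1) << i
    return result


print(bin(binlut4(0b1000, 0b1100, 0b1010)))  # AND -> 0b1000
print(bin(binlut4(0b0110, 0b1100, 0b1010)))  # XOR -> 0b110
print(bin(ternlog4(0xE8, 0b1100, 0b1010, 0b0110)))  # majority -> 0b1110
```

with lut=0b1000 the model computes a bitwise AND, 0b0110 gives XOR,
and ternlog4 with imm8=0xE8 gives a bitwise majority. the whole result
fits in one 4-bit CR field; the same result written to a 64-bit GPR
would leave 60 bits unused.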

a GPR-based version is likely to go down badly, especially when it
wastes 60 bits out of 64 (a CR field is only 4 bits wide), and
especially as it would require new datapaths between the CR pipelines
and the GPR regfile, which may be extremely far apart in IBM's layout.

in other words, we are taking a huge risk by loading CR Fields as
Predicate Masks, but at the same time they are perfect for the job.
it is a balance that requires some care, some modelling, and some
guesswork about how IBM's extremely large IC might be designed, taking
that *existing* design into consideration.

those "extra" instructions provide a clean RISC-paradigm firebreak
between pipeline units that may simply be too far apart. forcing the
CR and GPR regfiles and pipelines to sit close to each other may not
go down well.

[i mentioned that crweirds may need to be micro-coded because they
are an exception to the SVP64 rules: 64 results from 64 CRweird
element operations can go into *one single* Scalar GPR.]
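
the crweird exception can likewise be modelled. in a normal SVP64
vector op each element result goes to its own destination element,
whereas here one selected bit from each of up to 64 CR fields is
packed into a single 64-bit scalar. pack_cr_bits, the `sel` argument
and the element-i-to-bit-i (LSB0) ordering are hypothetical, chosen
only to show the data movement, not the specified crweird encoding.

```python
# Hypothetical model of the crweird data movement: one selected bit from
# each of up to 64 CR fields is packed into a single 64-bit scalar GPR.
MAX_ELEMENTS = 64


def pack_cr_bits(cr_fields: list[int], sel: int) -> int:
    """Return a 64-bit value whose bit i is bit `sel` of cr_fields[i].

    `sel` and the element-i-to-bit-i (LSB0) mapping are illustrative
    assumptions, not the specified crweird encoding.
    """
    if len(cr_fields) > MAX_ELEMENTS:
        raise ValueError("at most 64 element results fit in one 64-bit GPR")
    gpr = 0
    for i, field in enumerate(cr_fields):
        gpr |= ((field >> sel) & 1) << i  # every element writes the SAME register
    return gpr


# Contrast: under the normal one-element-per-register model, each element's
# result would be a separate destination instead of one bit of a shared one.
fields = [0b0010, 0b0000, 0b0010, 0b0010]
print(bin(pack_cr_bits(fields, 1)))  # -> 0b1101
```

because all 64 element operations merge into one destination register
rather than writing 64 separate ones, they do not map onto the usual
one-element-per-register pattern.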
