[Libre-soc-isa] [Bug 1056] questions and feedback (v2) on OPF RFC ls010
bugzilla-daemon at libre-soc.org
Tue Jun 6 15:15:48 BST 2023
https://bugs.libre-soc.org/show_bug.cgi?id=1056
--- Comment #64 from Luke Kenneth Casson Leighton <lkcl at lkcl.net> ---
(In reply to Paul Mackerras from comment #63)
> (In reply to Luke Kenneth Casson Leighton from comment #54)
> >
> > jacob and i went to a LOT of trouble to ensure that SV is an
> > orthogonal consistent RISC paradigm.
>
> Just as a side note, orthogonality does have an engineering cost,
> particularly in terms of verification. Sometimes it is pragmatically
> necessary to limit orthogonality in order to keep the verification state
> space manageable. In this case, that might mean having a defined set of
> scalar instructions which can be vectorized, rather than saying that almost
> any scalar instruction can be vectorized. I know that seems sub-optimal
> conceptually, but it may be necessary for practical reasons, particularly
> for an initial implementation. The set of vectorizable instructions can
> always be expanded later.
responding reverse-order, got to this point, needs a re-read a couple
of times more.
summary is: i agree with you but it cannot be a free-for-all
(hence the Compliancy Levels, which need review)
some design context first:
the bare minimum implementation is fetch-decode-{LOOP}-issue-execute.
the LOOP on register numbers goes directly into the exact same
Register-Hazard Management as if the looping did not exist.
in these naive implementations even elwidth overrides would be
single-issue.
thus a simple naive first-implementation may extend by just one
pipeline stage, use byte-writeable regfiles, and call it a day.
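a minimal sketch of that {LOOP} stage, for illustration only. the names (ElementOp, expand_vector_loop) are hypothetical, and it assumes all three registers are marked as Vectors so each register number just increments per element. each yielded element is an ordinary scalar operation, which is why it can go into the same Hazard Management as if the loop did not exist.

```python
from typing import Iterator, NamedTuple


class ElementOp(NamedTuple):
    """One scalar operation produced by the LOOP stage (hypothetical type)."""
    opcode: str
    rt: int
    ra: int
    rb: int


def expand_vector_loop(opcode: str, rt: int, ra: int, rb: int,
                       vl: int) -> Iterator[ElementOp]:
    """Turn one Vectorized instruction into vl scalar element operations.

    Assumption: RT, RA and RB are all Vectors, so every register number
    advances by one per element. Scalar operands, predication, elwidth
    overrides and REMAP are deliberately left out of this sketch.
    """
    for i in range(vl):
        yield ElementOp(opcode, rt + i, ra + i, rb + i)


# each element is issued one at a time, exactly like a scalar instruction
ops = list(expand_vector_loop("add", rt=3, ra=8, rb=16, vl=3))
for op in ops:
    print(op)
```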
the next advancement (ignoring REMAP entirely) is to do (sequential)
batching just like Multi-Issue. in fact exactly like Multi-Issue.
the sequential nature of the looping allows for extremely easy Hazard
Management as long as you convert binary reg#s into unary-encoding:
rt=3, ra=8, VL=3
=> rt=0b000111000, ra=0b00011100000000
then detecting Hazard overlaps involves simple AND gates not
massive multi-ported CAMs.
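the unary-encoding trick can be sketched in a few lines. this is a software model only (the function names are made up for illustration); in hardware the AND is one gate per register bit.

```python
def unary_mask(reg: int, vl: int) -> int:
    """One bit per register touched: reg, reg+1, ..., reg+vl-1."""
    return ((1 << vl) - 1) << reg


def overlaps(mask_a: int, mask_b: int) -> bool:
    """Hazard check: a plain bitwise AND, no CAM lookup needed."""
    return (mask_a & mask_b) != 0


rt_mask = unary_mask(3, 3)   # rt=3, VL=3 covers r3, r4, r5
ra_mask = unary_mask(8, 3)   # ra=8, VL=3 covers r8, r9, r10

print(bin(rt_mask), bin(ra_mask), overlaps(rt_mask, ra_mask))
```

with the numbers from the example above the two masks share no bits, so there is no overlap between the rt and ra ranges.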
elwidth overrides also end up with Hazard Management down at byte-level
but even here unary-encoding comes to the rescue.
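a sketch of the same idea at byte granularity. assumptions, not spec text: 64-bit registers (REG_BYTES = 8), elements packed contiguously from the start of the first register, and a hypothetical helper name byte_mask.

```python
REG_BYTES = 8  # assumption: 64-bit GPRs, one mask bit per regfile byte


def byte_mask(reg: int, vl: int, elwidth_bits: int) -> int:
    """One bit per regfile byte touched by vl elements of elwidth_bits each.

    Assumption: elements are packed back-to-back starting at byte 0 of
    register `reg`, spilling into the following registers as needed.
    """
    nbytes = vl * elwidth_bits // 8
    start = reg * REG_BYTES
    return ((1 << nbytes) - 1) << start


write_mask = byte_mask(3, 3, 16)     # 3 x 16-bit elements from r3: bytes 24..29
narrow_read = byte_mask(3, 2, 8)     # 2 x 8-bit elements from r3: bytes 24..25
next_reg_read = byte_mask(4, 1, 64)  # all of r4: bytes 32..39

print((write_mask & narrow_read) != 0)    # overlapping bytes: hazard
print((write_mask & next_reg_read) != 0)  # disjoint bytes: no hazard
```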
REMAP, the next complication, simply sits between decode and Hazard
Management, shuffling the offsets *before* dropping them into the Hazard
Read/Write tables. [this helps explain why i say that it has to
be Deterministic as this is a critical gate-latency juncture, right
smack in Decode/Issue: if you look up Indexed REMAP you will see that
modifying the GPRs after the svindex instruction is UNDEFINED]
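to illustrate where REMAP sits, here is a sketch in which a list of offsets is applied *before* the unary mask is built. the offsets list is a hypothetical stand-in for whatever REMAP schedule is active, and remapped_mask is an invented name; the real REMAP schedules are defined in the spec, not here.

```python
from typing import Sequence


def remapped_mask(reg: int, offsets: Sequence[int]) -> int:
    """Build a unary hazard mask from already-remapped element offsets.

    The offsets must be fully known at this point (Decode/Issue), which
    is why the schedule producing them has to be Deterministic.
    """
    mask = 0
    for off in offsets:
        mask |= 1 << (reg + off)
    return mask


identity = remapped_mask(3, [0, 1, 2])  # no REMAP: r3, r4, r5
shuffled = remapped_mask(3, [2, 0, 5])  # remapped: r5, r3, r8

print(bin(identity), bin(shuffled))
```

the Hazard tables only ever see the resulting mask, so nothing downstream of this step needs to know that REMAP was involved.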
before Multi-Issue Hazard Management tables get so insanely large
(several million gates) that clock speeds above 500 MHz are unattainable
no matter the geometry, there are two things to the rescue:
Write-after-Write (aka "Register Renaming") combined with
SVSTATE.hphint.
hphint allows *intra-* batch Hazards to be utterly disregarded
*within the batch only* not *inter-* batch, and the renamed
batch gets thrown in a nice sequential order at the available
Function Units.
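a rough sketch of the batching that hphint permits, under the reading given above: hazards inside a batch are ignored, hazards between batches are still checked. the helper names and the choice of a chained example (element i+1 reads what element i wrote) are assumptions made for illustration.

```python
from typing import List


def batches(vl: int, hphint: int) -> List[range]:
    """Split element indices 0..vl-1 into sequential batches of hphint."""
    if hphint <= 0:
        raise ValueError("hphint must be positive")
    return [range(s, min(s + hphint, vl)) for s in range(0, vl, hphint)]


def batch_mask(reg: int, batch: range) -> int:
    """Unary mask of the registers one batch touches, starting at reg."""
    return ((1 << len(batch)) - 1) << (reg + batch.start)


# hypothetical chained operation: writes start at r4, reads start at r3
VL, HPHINT, RT, RA = 8, 4, 4, 3
b0, b1 = batches(VL, HPHINT)

# intra-batch overlap (r4..r6 both read and written in batch 0) is
# deliberately NOT checked: that is what the hint permits.
# inter-batch overlap IS checked: batch 1 reads r7, which batch 0 wrote.
inter_batch_hazard = (batch_mask(RT, b0) & batch_mask(RA, b1)) != 0
print(inter_batch_hazard)
```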
so that is the gamut / gauntlet of all possible (sane) implementations
based on industry-standard pre-existing Micro-Architectures.
now with that context in mind we may evaluate the proposal.
* the first insight that occurred to me is that the proposal *might* be from
the perspective of a standard SIMD or standard Cray-Vector ISA.
can i check whether or not you are thinking in terms of passing
the entire Vector operation *including VL* down into the pipelines?
this is a perfectly legitimate implementation, to use e.g. a FSM
(like Microwatt's FP unit) with an additional for-loop *actually in*
the FSM itself, and to set up a communications protocol with the
regfile that not only contains the Reg# RT RA BA BB FRS etc but
*also the offset index*. thus when reading/writing to the
regfile the Function Unit *itself* sends multiple (sequential)
read/write requests in succession. even potentially implements its
own miniature Vector Chaining.
https://en.m.wikipedia.org/wiki/Chaining_(vector_processing)
but the key is that Hazard Management *still had to be done* even
before issuing {Instruction}+{0..VL} down into the Function Unit
(or {Instruction}+{0..3} {Instruction}+{4..7} {Instruction}+{8..VL}
to multiple Function Units)
* thus logically the most complex part (not in naive implementations)
is the Hazard Management and that has to be done anyway
* therefore in order to comply with the spec you *had* to do the hard
bit (Dependency Matrices) and once done *every* Function Unit
can use that.
* if a given instruction for any reason is too complex to parallelise
with the combined context of Multi-Issue *and* Looping then there is
no problem at all, just fall back to "naive" (single-issue) looping.
if *really* a problem then the absolute bare-minimum fallback
is that of single-step (like in debug mode): only allow one live
instruction at a time.
* good examples where single-issue fallback would be strongly
advised would be the trap instructions td and tw (yes they get
Vectorized! they have RA and RB as sources!) and likewise tdi and
twi (RA as the only register source)
* in this light to *stop* specific instructions from being Vectorized
it actually requires more complex Decoding! ok, some
implementations may fire an Illegal Instruction Trap.
* and this brings us neatly onto the SV Compliancy Levels, in effect,
because there will be certain minimum levels of implementation and
expected performance within the anticipated categories
(A/V DSP, GPU/HPC). given that trap-and-emulate will suck pretty
badly on SVP64, end-users are highly likely to complain.
* bottom line, even if it is logical and sane from a hardware
implementation perspective to not Vectorize some instructions
it cannot become a free-for-all just as SFS and SFFS and all
non-Vectorized Compliancy Levels cannot be a free-for-all,
they exist for a reason and the exact same logic applies to
Vectorized space.
* and bear in mind, just like in the Vulkan Spec managed by the Khronos
Group, speed is *not* made mandatory: that is for implementors to decide,
and compete on. what the spec makes mandatory is *what* is implemented,
so that software developers do not go insane.
* thus the discussion becomes about the SV Compliancy Levels so that
software (HWCAPS_SVP64_xxxxx) does not end up in total meltdown.
compliancy levels: happy to have constructive input on them
https://libre-soc.org/openpower/sv/compliancy_levels/
regarding Verification: we (RED Semiconductor Ltd) HAVE to have
Compliancy Suites, and they will be FOSS-Licensed (Libre-SOC).
the Test API allows plugging in alternative implementations,
including autogenerating standalone Makefiles for static build and
test.