[Libre-soc-dev] scalar instructions and SVP64
Luke Kenneth Casson Leighton
lkcl at lkcl.net
Wed Mar 10 13:37:58 GMT 2021
the simple perspective that VL is strictly a Sub-Program-Counter (actually
SVSTATE.srcstep and dststep) is one that allows multi-issue and speculative
execution for high performance designs.
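as a rough illustration of that perspective, here is a minimal python sketch (not the actual simulator or HDL) of a vector add being issued as a run of plain scalar adds, with srcstep and dststep acting as the sub-PC. the names SVState and run_vector_add are made up for this example, and it assumes all three operands are vectors with no predication and no scalar marking.

```python
from dataclasses import dataclass


@dataclass
class SVState:
    """hypothetical stand-in for the SVSTATE fields discussed above."""
    vl: int = 1        # vector length
    srcstep: int = 0   # element index for source operands
    dststep: int = 0   # element index for the destination operand


def run_vector_add(regs, state, rt, ra, rb):
    """issue one scalar add per element; srcstep/dststep act as a sub-PC."""
    issued = []
    while state.srcstep < state.vl and state.dststep < state.vl:
        d = rt + state.dststep
        a = ra + state.srcstep
        b = rb + state.srcstep
        # each iteration is an ordinary 64-bit scalar add
        regs[d] = (regs[a] + regs[b]) & 0xFFFFFFFFFFFFFFFF
        issued.append((d, a, b))
        state.srcstep += 1
        state.dststep += 1
    # loop finished: reset the sub-PC so that the real PC can advance
    state.srcstep = 0
    state.dststep = 0
    return issued
```

with vl=1 the loop body runs exactly once, which is the scalar-like case. nothing in the loop needs to know anything about the opcode beyond which registers it uses.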
right now the identification of length and of SVP64 is extremely simple, which
makes a high performance multi-issue fetch and issue dead easy to do.
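to show how little logic that identification needs, here is a hedged python sketch. it assumes the v3.1 rule that major opcode 1 marks a 64-bit prefixed instruction, and additionally assumes (from the draft SVP64 encoding, which may change) that bits 7 and 9 of the prefix, in MSB0 numbering, are both set to 1 for SVP64. the function names are made up for this example.

```python
def ppc_bit(word, n):
    """return bit n of a 32-bit word, using Power ISA MSB0 numbering."""
    return (word >> (31 - n)) & 1


def major_opcode(word):
    """bits 0:5 (MSB0) of a 32-bit word, i.e. the top six bits."""
    return (word >> 26) & 0x3F


def classify(word):
    """return (kind, length_in_bytes) by looking only at the first word."""
    if major_opcode(word) != 1:
        return ("scalar", 4)          # ordinary 32-bit instruction
    # ASSUMPTION: bits 7 and 9 both set identifies an SVP64 prefix
    if ppc_bit(word, 7) and ppc_bit(word, 9):
        return ("svp64", 8)
    return ("v3.1-prefixed", 8)
```

note that classify never looks at the suffix: length and SVP64-ness come from a handful of bits in the first word, so fetch can work out instruction boundaries without running either full decoder.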
SV lives between issue and execute and as such is going to meet with some
resistance due to it extending the number of pipeline stages by one.
given that in POWER9 they have a *two stage* instruction decode phase it
will meet SERIOUS resistance but they will grumble and get over it, because
they will be able to implement high performance designs.
if however we introduce a hard dependency between prefix and suffix of the
types that you keep suggesting, Jacob, because of the perspective that you
hold, then it will utterly kill SV stone dead.
because the suggestions you keep making are by definition and through that
perspective 100% creating an interlink between prefix and suffix, where the
number of decode stages *has* to be only two (or do something stupid like
Intel did with their speculative decoders), and those two stages are so
complex that they can in no way be high performance.
with the decoders in OpenPOWER being FIVE THOUSAND OR GREATER gates each, if
we tried Intel's stupid trick we would be laughed at.
remember that to get the EXTRA2/3 info you literally have to do a full
even the "simple obvious sounding" idea to do different scalar opcodes from
v3.0B "because it's SVP64 prefixed" automatically and inherently creates
problematic performance-killing interdependencies between these two decode
levels (prefix, suffix).
in v3.1 prefixes they have "extra bits" which identify sub-types. these
bits can be decoded in a 1st phase decoder that feeds into the 2nd phase of
POWER10 without compromising performance.
*we do not have space to do that* because we need the bits for modes.
therefore the decoding *has* to be done fully by a 2nd stage and by that
time it is too late to go back in time and tell issue "hey you shouldn't
have issued me with this because of conditions X Y and Z".
i trust that this explains clearly why all and any suggestions to create
interdependency between prefix and suffix have to be rejected, no matter if
we otherwise think they might be a good idea.
in practical terms it means that setvl vl=1 has to be called in order to
establish scalar-like context (or compilers ensure that regs intended as
scalar are allocated to r0-31) and this is just a price that has to be paid.
it does however occur to me that setvl could be "context propagated".
this in theory *might* save some proliferation of setvli vl=1 instructions
all over the place.
crowd-funded eco-conscious hardware: https://www.crowdsupply.com/eoma68