[libre-riscv-dev] system call (sc) LEV "reserved field"

Thu Jul 23 05:16:38 BST 2020

hi benjamin, welcome, and thank you for your insights.

On Thursday, July 23, 2020, Benjamin Herrenschmidt <benh at kernel.crashing.org>
wrote:

> On Thu, 2020-07-23 at 00:41 +0100, Luke Kenneth Casson Leighton wrote:
>
> > so, when some bit is added in the future, an older processor (and the
> > device it is in) basically has to be thrown into landfill.
>
> Depends, it goes both ways. For example, this is what allowed the
> addition of lwsync without throwing older processors into the landfill
> because it would automatically "escalate" to a full sync on processors
> that didn't implement it

ah. at first glance this seems ok.

> (or should have... FSL did screw that up on
> some cores).

oh whoops :)

what i have observed (again this is from RISCV) is that instructions that
can be "safely ignored" are of a "hint" type.

the RISCV Fence instructions also fell into this category.  being
optimisations, less complex implementations could completely ignore them.

however in the example that you give - lwsync - i take it that this is a
"less costly" cache/sync flush type instruction than the one that it was
added to?

i.e. old sync is a "superset" of lwsync?
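
to make the "escalation" concrete, here is a minimal sketch (not taken from any real decoder). the two encodings are the usual ones for sync and lwsync, which differ only in the L field; the 2-bit mask and both decoder functions are assumptions made purely for illustration.

```python
# encodings of the sync family (primary opcode 31, extended opcode 598)
SYNC = 0x7C0004AC    # L = 0: full sync
LWSYNC = 0x7C2004AC  # L = 1: lightweight sync

L_SHIFT = 21              # position of the L field, counted from the LSB
L_MASK = 0x3 << L_SHIFT   # assumption: treat L as a 2-bit field


def decode_ignoring_l(insn: int) -> str:
    """hypothetical older decoder: L is reserved and simply ignored."""
    if (insn & ~L_MASK & 0xFFFFFFFF) == SYNC:
        return "sync"
    return "unknown"


def decode_with_l(insn: int) -> str:
    """hypothetical newer decoder: L = 1 selects lwsync."""
    if (insn & ~L_MASK & 0xFFFFFFFF) != SYNC:
        return "unknown"
    l_field = (insn & L_MASK) >> L_SHIFT
    return {0: "sync", 1: "lwsync"}.get(l_field, "unknown")


print(decode_ignoring_l(LWSYNC))  # older core performs a full sync
print(decode_with_l(LWSYNC))      # newer core performs an lwsync
```

the older decoder never sees the difference, which is exactly why the fallback is silent: it is safe only because a full sync does at least as much as lwsync.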

this is very interesting to me, because i have made a longterm study of
"how to develop stable specifications", and it is the first time in 25
years that i have encountered a counterexample to the practice of
disallowing "reserved behaviour".

the "normal" way, in "reserved is illegal" specifications, would be that
lwsync raises an illegal instruction trap on a processor that does not
support it. the author of the trap handler would know that the extra bit
did not matter, so the handler would execute the alternative (full) sync
instruction and return from the trap.
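
as a rough sketch of that trap-and-emulate flow (a toy model, not real kernel code): the "cpu" below is a hypothetical strict older core that only knows the full sync, and the handler substitutes it for lwsync. all names are invented for illustration.

```python
SYNC = 0x7C0004AC    # full sync
LWSYNC = 0x7C2004AC  # lwsync: same encoding with L = 1


class IllegalInstruction(Exception):
    """raised by the toy core for any encoding it does not implement."""


def strict_old_core_execute(insn: int, log: list) -> None:
    """hypothetical older core that rejects set reserved bits."""
    if insn == SYNC:
        log.append("hw:sync")
    else:
        raise IllegalInstruction(hex(insn))


def run_with_trap_handler(program: list, log: list) -> None:
    """run a program, emulating lwsync in the illegal-instruction trap."""
    for insn in program:
        try:
            strict_old_core_execute(insn, log)
        except IllegalInstruction:
            if insn != LWSYNC:
                raise  # genuinely unknown: nothing to emulate
            log.append("trap:emulate-lwsync")
            # the stronger instruction is a safe substitute
            strict_old_core_execute(SYNC, log)


log = []
run_with_trap_handler([LWSYNC], log)
print(log)
```

the point of the sketch is that the decision is made in software, so it can be corrected or extended later, at the cost of a trap on every occurrence.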

i do not know the full story about what FSL is: without knowing the details
i suspect that a trap *could* have sorted things out, there, *if* the older
processor was required to raise "illegal instruction".

but because the bit is ignored, there is no way that can be done.

so whilst on the face of it, lwsync *sounds* like a counterexample, instead
(caveat, i do not know what FSL is), it seems more to support the case
*for* use of reserved bits raising illegal instructions [on UNIX platforms]

where of course this would become problematic is if lwsync (or other future
added instruction) was part of a tight, performance-critical loop.  a trap
and emulate in the middle of that would severely degrade older processor
performance.  there is nothing that can be done about that.

> > if however reserved bits being set cause an exception, the "old"
> > processor stands a chance of emulating the new behaviour (in
> > software, even if that's slow), giving it a chance of keeping out of
> > landfill for slightly longer.
>
> Which is why powerpc tends not to "add bits" to instructions unless
> ignoring them is a safe fallback.

as we can see above, the example that was given shows that this is a
problematic approach.

it *sounds* safe to "downgrade" an instruction that does more work (more
syncing) to one that does less, however this is a pretty unique case.

i would say that PowerISA "should have had" space for such downgrading
already planned in advance, except, ruefully and regretfully, such is not
possible or practical (changing the past) and pragmatically we have to work
with how things are, now.

use of "hints" would not help here either unless they were a macro-op fused
type of hint.

wait... that would work.

a sequence as follows:

* hint instruction "next instruction is to be lwsync"
* old sync instruction

older processors would:

* ignore the hint
* execute the old sync instruction as-is, safely doing more work than
necessary.

newer processors would macro-op fuse the two into an lwsync.
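
a toy model of that sequence, purely as a sketch: instructions are plain strings, and "hint.lwsync" is a made-up name for the hypothetical hint. neither function reflects a real pipeline.

```python
HINT = "hint.lwsync"  # hypothetical hint: "next instruction is to be lwsync"


def run_old_core(program: list) -> list:
    """older core: hints are ignored, everything else runs as-is."""
    return [insn for insn in program if insn != HINT]


def run_new_core(program: list) -> list:
    """newer core: fuse a hint followed by sync into a single lwsync."""
    executed = []
    i = 0
    while i < len(program):
        followed_by_sync = i + 1 < len(program) and program[i + 1] == "sync"
        if program[i] == HINT and followed_by_sync:
            executed.append("lwsync")  # macro-op fusion of the pair
            i += 2
        elif program[i] == HINT:
            i += 1                     # lone hint: still safe to ignore
        else:
            executed.append(program[i])
            i += 1
    return executed


program = ["add", HINT, "sync", "add"]
print(run_old_core(program))  # full sync, more work than necessary
print(run_new_core(program))  # fused into lwsync
```

both cores run the same binary without trapping, which is what makes this attractive inside a performance-critical loop.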

of course now with prefixes in POWER10 / v3.1, prefixes could do the same
job without running out of hint space very rapidly.

and Opcode 1 is of course "illegal instruction", and could be used
extensively for "instruction upgrading"

this would work really well, leaving hint macro-op fusion for those very
special cases similar to lwsync where "behaviour downgrading" is needed.

hint macro-op fusion would also work really well in performance critical
loops, too.

> > bottom line if it is correct that on the PowerISA UNIX Platform
> > reserved bits can be ignored that is cause for some concern, where
> > for Embedded it would be the other way round: cause for concern if
> > the reserved bits could *not* be ignored.
>
> Do you have more specific concerns here ?

sorry, Benjamin, i apologise for saying this: the question appears to be
implying, very subtly, that you've not taken on board what i've said. if
you had summarised what i wrote, in your own words, i would have a clear
indicator that you'd heard me. if that is a misinterpretation i apologise.

the specific concern *is* that reserved bits do not cause illegal
instructions to be raised on the UNIX Platform, because without that in
place, UNIX platform longterm seamless upgradeability and interoperability
between vendors across all their product ranges over time is simply not
possible.

i have not spelled it out explicitly, seeking instead to use logical
reasoning rather than make categorical statements, but this is actually a
really serious problem.

> IE. Actual examples where
> this has been cause of breakage in the past ?

with PowerISA having only a few implementors, even though i am new to
PowerISA i do not expect there to be many such examples, simply because as
i said, for the "Embedded" class processors (Motorola's 32 bit range, and
even the QorIQ 64 bit range can be said to be targeted at the "Embedded"
market of Networking and Routing), these will not encounter this as a
problem (because of the customised firmware that they run).

it is only the *UNIX* Platform where this matters.

that leaves, realistically, only the IBM POWER range.  and when there is
only one vendor, interoperability with other vendors is not important when
there aren't any other vendors!

however if PowerISA expands (as we know it would like to), and this is not
changed, i *guarantee* that when more implementors are involved there will
be problems, here.

in other words the approach of ignoring reserved bits has only worked (for
the past 20 years?) because IBM, realistically, is the only (non-Embedded)
implementor of UNIX Platform processors.

> > LE/BE amazingly seems to work on LibreSOC, it was quite funny having
> > the trap jump into 0x700 when testing against qemu (running
> > singlestep under gdb), only to find that qemu traps change the LE bit
> > and of course in qemu once that's changed gdb can't read registers
> > correctly. sigh.
>
> It can if you manually change endian in gdb no ?

this (for a geek) is why it is amusing, because we did exactly that.

during setup, i call the machine-interface "set endian" command depending
on whether the program being executed is LE or BE.

breakpoints are set on 0x700 and end of program.

all good so far

then the trap occurs, and the MSR changes at the trap breakpoint.

when we read the PC it is 0x0007000000000000 (byte-reversed) not 0x700

when we read the MSR it is *also* byte reversed.

so there is no way to read the MSR and determine from the bits returned if
the bits need to be reversed because the bits that tell you you should
reverse the bits are already reversed.

doh :)
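
a small sketch of the circularity, for illustration only: a plain 64-bit byte swap of 0x700 gives 0x0007000000000000, and the same swap moves the MSR's LE bit (the least-significant bit) out of the position where the reader looks for it. the MSR value below is an invented example, not one captured from qemu.

```python
def byteswap64(value: int) -> int:
    """reverse the byte order of a 64-bit value."""
    return int.from_bytes(value.to_bytes(8, "big"), "little")


MSR_LE = 1 << 0  # MSR[LE] is the least-significant bit of the MSR

# illustrative MSR value (an assumption): top bit set, plus LE set
msr_real = 0x8000000000000001

pc_seen = byteswap64(0x700)       # what a wrong-endian reader sees
msr_seen = byteswap64(msr_real)

print(f"{pc_seen:#018x}")
print(f"{msr_seen:#018x}")
print(bool(msr_seen & MSR_LE))    # the LE bit is no longer where expected
```

with this example value the reversed MSR appears to have LE clear, so testing that bit cannot tell the reader that a swap is needed.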

l.

-- 
---
crowd-funded eco-conscious hardware: https://www.crowdsupply.com/eoma68