[libre-riscv-dev] system call (sc) LEV "reserved field"

Thu Jul 23 00:41:15 BST 2020

On Wednesday, July 22, 2020, Paul Mackerras <paulus at ozlabs.org> wrote:

> On Wed, Jul 22, 2020 at 03:22:43PM +0100, Luke Kenneth Casson Leighton
> wrote:
> > hi, we're just reviewing the behaviour needed when LEV != 0, and are
> > following what microwatt does (which does not have hypervisor support
> > yet)
> >
> > https://bugs.libre-soc.org/show_bug.cgi?id=325#c106
> >
> > so the trail - i am so glad that the PDF has cross-reference linking -
> > jumps from one section to another and after jumping 5 times we
> > eventually ascertain the hypothesis that reserved fields, if set,
> > should raise an "illegal instruction".
>
> The first sentence of Book I section 1.3.3 says "Reserved fields in
> instructions are ignored by the processor."  What led you to confirm
> your hypothesis that reserved bits being set should cause an illegal
> instruction interrupt?

because i did not expect that behaviour: ignoring reserved fields makes it
impossible to trap and emulate.  (it becomes necessary to use JIT analysis
instead)

so, when some bit is added in the future, an older processor (and the
device it is in) basically has to be thrown into landfill.

if however reserved bits being set cause an exception, the "old" processor
stands a chance of emulating the new behaviour (in software, even if that's
slow), giving it a chance of keeping out of landfill for slightly longer.

however it is not appropriate for all systems to raise exceptions on
reserved bits: the cost of having the detection hardware (a full POWER9
decoder and also illegal/unsupported/reserved SPR detection) can be very
high especially for resource and power constrained silicon or FPGAs.

(example: i know someone - yea, you Sam - who implemented RV64 to comply
with the UNIX RISC-V spec rather than the Embedded RISC-V spec: the "CSR
detection" just to support all the zeros and illegal CSRs took a whopping
15% of an ICE40 FPGA!)

in RISC-V they get this right, by having two separate Platforms:

* Embedded which is permitted to ignore reserved bits entirely

* UNIX, which definitely is not.

for Embedded, the vendor customises the firmware entirely, and binary
interoperability as well as legacy software support is completely
unimportant.

for UNIXen, we know very well that interoperability and long-term stability
are critical.

bottom line: if it is correct that on the PowerISA UNIX Platform reserved
bits can be ignored, that is cause for some concern, whereas for Embedded it
would be the other way round: cause for concern if the reserved bits could
*not* be ignored.
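
to make the two behaviours concrete, here is a rough python sketch of an sc
decoder with a selectable policy.  it is illustrative only: ReservedPolicy,
IllegalInstruction and decode_sc are made-up names, and the field positions
assume the sc encoding with primary opcode 17 and LEV in instruction bits
20:26 (MSB0 numbering), which should be checked against the spec.

```python
from enum import Enum

SC_PRIMARY_OPCODE = 17  # assumed: primary opcode of sc


class ReservedPolicy(Enum):
    IGNORE = "ignore"  # "Embedded"-style: reserved values are silently dropped
    TRAP = "trap"      # "UNIX"-style: reserved values raise an exception


class IllegalInstruction(Exception):
    """Stands in for an illegal instruction interrupt (hypothetical name)."""


def decode_sc(insn, policy):
    """Return the effective LEV of a 32-bit sc instruction word."""
    # primary opcode lives in the top 6 bits; bit 30 (MSB0) must be 1
    if (insn >> 26) != SC_PRIMARY_OPCODE or not (insn & 0x2):
        raise ValueError("not an sc instruction")
    # LEV: instruction bits 20:26 in MSB0 numbering = bits 11..5 in LSB0
    lev = (insn >> 5) & 0x7F
    if lev > 1:
        if policy is ReservedPolicy.TRAP:
            # software gets a chance to emulate newer behaviour
            raise IllegalInstruction(f"reserved LEV value {lev}")
        # assumption: "ignoring" means only the low bit of LEV is significant
        lev &= 1
    return lev
```

with IGNORE, an instruction word carrying a reserved LEV value decodes as
if the upper LEV bits were zero, so nothing ever gets a chance to emulate
it.  with TRAP the same word raises, and a handler could step in.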

> > however this is so unclear (because of the referral from one section
> > to another) that i am seeking confirmation.  should we raise an
> > "illegal instruction" when "LEV > 1" on sc?
>
> Section 1.8.2 (Book I) says "any attempt to execute an invalid form of
> an instruction will either cause the system illegal instruction
> handler to be invoked or yield boundedly undefined results".  Putting
> LEV=1 in sc would be an example of an invalid form (on an
> implementation without hypervisor mode).

ok that helps clarify what that means, thank you.

>   A boundedly undefined result
> is one which could be obtained by a sequence of valid instructions,
> so in the case of sc 1, making it do what sc 0 does meets the
> boundedly undefined results requirement.

ok so that... if i am understanding correctly, means, "you can in fact do
something different and OS software has to detect it and sort it out to
yield expected behaviour"

which, if i am being honest, makes me nervous :)

> > secondly, we note that "LEV=1" is for invocation of the hypervisor.
> > what's not clear to us is - given that we are not implementing
> > hypervisor - should this be *also* treated as an illegal instruction?
> > or, should we just leave it to fall through to trap @ addr 0x0c00, and
> > expect the trap *there* to notice and deal with the situation?
>
> That is what I would do.

ok.  we can do that.
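
roughly, for an implementation without hypervisor mode, that fall-through
could be sketched like this in python.  enter_sc_interrupt and the returned
dict layout are hypothetical, and copying MSR unchanged into SRR1 is a
simplification (the ISA zeroes some SRR1 bits).

```python
SC_VECTOR = 0xC00            # system call interrupt vector discussed above
MASK64 = (1 << 64) - 1


def enter_sc_interrupt(pc, msr, lev):
    """Sketch of sc interrupt entry on a core with no hypervisor mode.

    lev is accepted but deliberately has no effect: sc 1 lands at the same
    vector as sc 0, and the handler there is expected to sort it out.
    """
    return {
        "srr0": (pc + 4) & MASK64,  # address of the instruction after sc
        "srr1": msr & MASK64,       # simplification: MSR copied wholesale
        "next_pc": SC_VECTOR,
    }
```

the new MSR value is left out here on purpose: which bits get cleared on
interrupt entry is a separate question.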

> There is one of the variants of KVM on PPC, called KVM-PR, which runs
> the guest entirely in user mode and traps and emulates all privileged
> instructions (thus it doesn't need hypervisor mode and can run inside
> a guest of another hypervisor).  If you are running a KVM guest inside
> that environment and the guest does sc 1, KVM-PR expects that to end
> up at the kernel's 0xc00 handler.  So that is one reason to treat sc 1
> as sc 0.

ahh.  i did wonder :)

>
> > also: if we set the HV bit in MSR (when LEV=1) section 6.5.14 p1077
> > which refers us back to figure 65 on p1064, will this "break" things?
>
> Probably not.  Linux does check whether HV=1 at boot time, but I'm
> pretty sure that's only on certain processors which it knows to be
> HV-capable (either by looking at PVR or the device tree).

ok.  thank you.

>
> > also: in microwatt, i'm not seeing the remaining bits which appear [to
> > need to] be set.
> >
> > https://github.com/antonblanchard/microwatt/blob/master/execute1.vhdl#L479
> >             ctrl_tmp.msr(MSR_SF) <= '1';
> >             ctrl_tmp.msr(MSR_EE) <= '0';
> >             ctrl_tmp.msr(MSR_PR) <= '0';
> >             ctrl_tmp.msr(MSR_IR) <= '0';
> >             ctrl_tmp.msr(MSR_DR) <= '0';
> >             ctrl_tmp.msr(MSR_RI) <= '0';
> >             ctrl_tmp.msr(MSR_LE) <= '1';
> >
> > these appear to be correct as defined according to figure 65 (p1063)
> >
> > however the remaining actions do not seem to be implemented (p1064):
> >
> >      Bits bit 5, TM, VEC, VSX, PR, FP, and PMM are set to 0.
> >      The TE field is set to 0b00.
> >      TM, FP, VEC, VSX, and bit 5 are set to 0.
>
> Right.  We have a to-do list for architecture compliance.  (We haven't
> implemented 32-bit mode or BE mode, for instance.)

yeahh although we have 32-bit op modes (using microwatt decode1.vhdl,
turned into CSV) we have yet to support the MSR 32-bit global mode.

LE/BE amazingly seems to work on LibreSOC.  it was quite funny having the
trap jump into 0x700 when testing against qemu (running singlestep under
gdb), only to find that qemu traps change the LE bit, and of course in qemu
once that's changed gdb can't read registers correctly.  sigh.

> > question: what effect would it have - bear in mind that we are
> > following microwatt - if we implemented these changes to MSR?  bear in
> > mind that we ignore most of them at the moment (MSR.LE being one
> > notable exception), so the question is, in effect: does the Linux
> > kernel *also* ignore them?
>
> The Linux kernel clearly needs PR to be set to zero and it also
> expects FP, VEC, VSX, TM to be cleared.  Setting TE to 0 is necessary
> once you implement the trace interrupt, otherwise you could get a
> trace interrupt inside your first-level interrupt handlers, which
> would be bad.

ah :)

>  Similarly if you have floating-point and you don't set
> FE0 and FE1 to 0 on an interrupt, there is the chance of taking a
> floating-point program interrupt inside a first-level handler.

whoops.  ok appreciate the warning.
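
pulling together the bits mentioned in this thread, a python sketch of the
MSR update on interrupt entry might look like the following.  the bit
positions are given in LSB0 numbering and are an assumption to be checked
against the spec (they match my recollection of the Linux kernel's MSR_*_LG
constants), msr_after_interrupt is a made-up name, and SF/LE are hard-coded
to 1 simply because the quoted microwatt snippet does that.

```python
# MSR bit positions, LSB0 numbering (bit 0 = least significant bit).
MSR_SF, MSR_TM, MSR_VEC, MSR_VSX = 63, 32, 25, 23
MSR_EE, MSR_PR, MSR_FP, MSR_ME = 15, 14, 13, 12
MSR_FE0, MSR_FE1 = 11, 8
MSR_TE_HI, MSR_TE_LO = 10, 9     # two-bit TE field
MSR_IR, MSR_DR, MSR_PMM, MSR_RI, MSR_LE = 5, 4, 2, 1, 0
MSR_BIT5 = 58                    # "bit 5" in the ISA's MSB0 numbering

# everything this thread says should be zeroed on interrupt entry
CLEARED_ON_INTERRUPT = (
    MSR_EE, MSR_PR, MSR_IR, MSR_DR, MSR_RI,       # as in the microwatt snippet
    MSR_BIT5, MSR_TM, MSR_VEC, MSR_VSX, MSR_FP, MSR_PMM,
    MSR_TE_HI, MSR_TE_LO,                         # TE = 0b00
    MSR_FE0, MSR_FE1,                             # no FP program interrupts
)


def msr_after_interrupt(msr):
    """Return the MSR value after interrupt entry (sketch, see assumptions)."""
    for bit in CLEARED_ON_INTERRUPT:
        msr &= ~(1 << bit)
    # mirrors the microwatt snippet: 64-bit mode, little-endian
    msr |= (1 << MSR_SF) | (1 << MSR_LE)
    return msr & 0xFFFF_FFFF_FFFF_FFFF
```

bits not listed (ME for example) are left untouched by this sketch.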

>
> I'm not sure that all this counts as the Linux kernel "ignoring" the
> bits, but in general if you do what the architecture says, the kernel
> will be happier than if you don't.

ha, that makes sense.

i generally found this out when network reverse-engineering, despite not
understanding at all what i was sending to the client or server :)

thank you Paul

l.

-- 
---
crowd-funded eco-conscious hardware: https://www.crowdsupply.com/eoma68