[Libre-soc-isa] [Bug 1056] questions and feedback (v2) on OPF RFC ls010

Thu Jun 1 00:49:28 BST 2023

https://bugs.libre-soc.org/show_bug.cgi?id=1056

--- Comment #41 from Paul Mackerras <paulus at ozlabs.org> ---
(In reply to Luke Kenneth Casson Leighton from comment #39)
> (In reply to Paul Mackerras from comment #24)
> 
> > The ISA as it stands has a property which is extremely useful, which is that
> > with a couple of rare exceptions (see below), it is possible to analyse an
> > instruction word (or doubleword) and know which CPU registers it is going to
> > read and write, without knowing anything about the architected state of the
> > CPU.
> 
> sigh, yes - the exception to that being the contents of MSR.
> (MSR.LE, MSR.SF, and others).

No, MSR.LE and MSR.SF don't affect which registers are read or written. They
affect the value(s) written but not the identity of the registers concerned.

>  SVSTATE has to be similarly
> considered "a peer of MSR and PC" (and SVSHAPE0-3 if REMAP is
> implemented, typically in 3D GPUs, HPC, and high-end A/V DSPs)
> 
> in Libre-SOC's HDL i have a special "regfile" containing
> PC,MSR,SVSTATE,DEC,TB and when REMAP is implemented SVSHAPE0-3
> will have to join them
> 
> > This simplifies the job of anything that wants to translate or emulate
> > instructions, or generally understand what the effect of a block of code
> > could be or the dependencies between instructions. Examples include
> > valgrind, qemu, gdb, etc.
> 
> luckily the register EXTRA information is in the SVP64 24-bit Prefix.
> otherwise we _would_ be in trouble, there.

So every instruction whose behaviour is modified by vectorization has a SVP64
prefix? I haven't seen a clear and unambiguous answer as to whether that is
true or not. (You do seem to say it is true below, except that each such
statement seems to have some sort of caveat on it.)

It did seem like a "bare" addi (without SVP64 prefix) in a vertical-first loop
might be subject to register index modification, element-width overrides,
saturation, etc., from the VF loop. Does that happen, or is it the case that an
addi without SVP64 prefix is never subject to any modification (i.e. it only
ever accesses the GPRs specified by RA and RT in the instruction word)?
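
To make the property concrete, here is a minimal sketch (illustration only, not
code from the ISA document or from Libre-SOC; the function names are
hypothetical) of what a tool such as a debugger or translator does for an
unprefixed addi: it derives the registers read and written purely from the
32-bit instruction word, with no reference to SVSTATE or any other SPR. It
assumes the standard D-form layout (primary opcode 14, then RT, RA and SI).

```python
def sign_extend_16(value):
    """Interpret a 16-bit field as a signed two's-complement integer."""
    return value - 0x10000 if value & 0x8000 else value


def addi_register_footprint(word):
    """Return the GPRs an unprefixed addi reads and writes.

    Only the instruction word is consulted; no architected state is needed.
    """
    opcode = (word >> 26) & 0x3F      # bits 0:5 (IBM numbering)
    if opcode != 14:
        raise ValueError("not an addi instruction word")
    rt = (word >> 21) & 0x1F          # bits 6:10
    ra = (word >> 16) & 0x1F          # bits 11:15
    si = sign_extend_16(word & 0xFFFF)  # bits 16:31
    # For addi, RA=0 means the literal value 0, not the contents of GPR 0.
    reads = [] if ra == 0 else [ra]
    return {"reads": reads, "writes": [rt], "si": si}


# addi r3,r4,1 encodes as 0x38640001
print(addi_register_footprint(0x38640001))
```

If SVSTATE could alter the behaviour of an unprefixed addi, a function like
this would need the CPU state as an extra argument, which is exactly the
concern raised above.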

> (btw heads-up, the concept of "streaming" utterly borks that. ARM SVE
> has "streaming" coming.  https://arxiv.org/pdf/2002.10143.pdf)
> 
> > [The exceptions are the lswx and stswx instructions, which use a byte count
> > in the XER. The byte count controls the number of GPRs read or written. But
> > modern compilers don't use lswx or stswx, and they always cause an alignment
> > interrupt in LE mode.]
> 
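
For contrast, a minimal sketch of why lswx is the exception (illustration only;
the function name is hypothetical): the set of GPRs written cannot be computed
from the instruction word alone, because it depends on the byte count held in
the XER. The sketch assumes the usual definition, where bytes are packed four
per register starting at RT and the register number wraps from 31 back to 0.

```python
def lswx_target_gprs(rt, xer_byte_count):
    """Return the GPRs an lswx would load, given RT and the XER byte count.

    Unlike addi, this needs architected state (the XER) as an input.
    """
    if not 0 <= rt <= 31:
        raise ValueError("RT must be in 0..31")
    if not 0 <= xer_byte_count <= 127:
        raise ValueError("the XER byte count is a 7-bit field")
    n_regs = (xer_byte_count + 3) // 4   # four bytes per GPR, rounded up
    # With a zero count no bytes are loaded (the ISA leaves RT undefined).
    return [(rt + i) % 32 for i in range(n_regs)]


# Same instruction word, different XER contents, different registers written.
print(lswx_target_gprs(30, 9))   # [30, 31, 0]
print(lswx_target_gprs(30, 4))   # [30]
```
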
> they were great when CPUs were 130 mhz and single-issue.  multi-issue it
> all goes to hell-in-a-handbasket and (following comp.arch regularly)
> general consensus is LD/ST-Multi is history.
> 
> of course total irony then that Simple-V would *accidentally* re-introduce
> it:
> 
>     sv.ld/sm=r10/els *RT, 0(RA)
> 
> 
> > If a side effect of adopting Simple-V is that this property no longer holds,
> > then that is a serious problem in my view.
> 
> > If it is the case that a 32-bit
> > instruction without any prefix could in some cases access different
> > registers from those identified in the instruction word, depending on the
> > state of the CPU (for example, depending on what is in the SVSTATE SPR),
> > then you have broken this property.
> 
> well...
>  
> > I had thought there would be a clear and simple way to tell which
> > instructions would be affected by vectorization (i.e., the presence of a
> > SVP64 prefix).
> 
> all of them!  okok - everything-that-makes-sense.  mtmsr makes no sense.
> sync makes no sense.

I was concerned with the case where there is no SVP64 prefix before an
instruction. In that case, is it correct to say that it is guaranteed to behave
exactly in all respects as specified in the current architecture, regardless of
any values in SVSTATE or any other SPR?

> > But it sounds like that is not true, unfortunately.
> 
> it is... but i had to encapsulate it in a program (i sure as hell wasn't
> going to do it by hand).

OK, that's good.

> https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/sv/sv_analysis.py;hb=HEAD
> 
> this program *is* reasonably-obvious, the key function being "create_key()"
> which analyses all instructions (remember i mentioned turning decode1.vhdl
> into CSV files?) and then creates a "Register Profile footprint" (aka key)
> that can be used to decide what bits in the 24-bit Prefix are to be used
> to extend registers RT RS RA RB RC BA BFA BB BT FRT ...
> 
> sv_analysis.py does actually generate markdown tables so you can see what
> it does.
> 
> https://libre-soc.org/openpower/opcode_regs_deduped/
> 
> anyway - back to the registers: a reasonable way to think of the RTL is
> that when Prefixed, all the registers have been "shifted" into a new
> namespace.
> 
> https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=openpower/isa/fixedarith.mdwn;hb=HEAD
> * maddhd RT,RA,RB,RC
> 
> Pseudo-code:
> 
>     prod[0:(XLEN*2)-1] <- MULS((RA), (RB))
>     sum[0:(XLEN*2)-1] <- prod + EXTS(RC)[0:XLEN*2]
>     RT <- sum[0:XLEN-1]
> 
> ===>
> 
>     prod[0:(XLEN*2)-1] <- MULS((SVP64.RA), (SVP64.RB))
>     sum[0:(XLEN*2)-1] <- prod + EXTS(SVP64.RC)[0:XLEN*2]
>     SVP64.RT <- sum[0:XLEN-1]
> 
> which is how i can convince myself that "the instruction meaning
> did not change" - despite being Prefixed.

This is the converse of my concern: it addresses the prefixed case, whereas I
was concerned about the non-prefixed case.

> (will answer about Saturation and Predication in a followup)
