[Libre-soc-isa] [Bug 1056] questions and feedback (v2) on OPF RFC ls010

Thu May 25 18:08:26 BST 2023

https://bugs.libre-soc.org/show_bug.cgi?id=1056

--- Comment #15 from Luke Kenneth Casson Leighton <lkcl at lkcl.net> ---
(In reply to Paul Mackerras from comment #9)
> "Register files, elements, and Element-width Overrides" section:
> 
> I strongly disagree that the register file should be accessed in
> little-endian byte order when the processor is in big-endian mode. Requiring
> that will make Simple-V practically unusable in big-endian mode (just as
> saying that the register file has to be big-endian always would make
> Simple-V unusable in little-endian mode).

(i am assuming you are referring to register-to-register operations
 which in and of itself is a massive "ask")

this in effect is equivalent to asking for the PC to run backwards,
because the sub-loop that normally runs 0..VL-1 would instead have
to run backwards (VL-1 downto 0)

but here's the thing: if you have a src and a destination reg and
you go in reverse, on both, it makes no difference in most circumstances!

    for i in 0..VL-1
         GPR(RT+i) <- GPR(RA+i) + EXTS64(SI)

is no different from:

    for i in VL-1 DOWNTO 0
         GPR(RT+i) <- GPR(RA+i) + EXTS64(SI)

as long as there is no overlap on RT..RT+VL-1 with RA..RA+VL-1
then back-end Hardware will actually not care *at all*.
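
as a rough illustration of that claim, here is a small C model. it is only
a sketch: the 128-entry array, the function names and the fill pattern are
assumptions made for the example, not anything taken from the spec.

```c
#include <stdint.h>
#include <string.h>

#define NUM_GPRS 128

/* forward element loop: for i in 0..VL-1 */
static void addi_forward(uint64_t *gpr, int rt, int ra, int64_t si, int vl)
{
    for (int i = 0; i < vl; i++)
        gpr[rt + i] = gpr[ra + i] + (uint64_t)si;
}

/* reverse element loop: for i in VL-1 downto 0 */
static void addi_reverse(uint64_t *gpr, int rt, int ra, int64_t si, int vl)
{
    for (int i = vl - 1; i >= 0; i--)
        gpr[rt + i] = gpr[ra + i] + (uint64_t)si;
}

/* returns 1 if both loop directions leave the register file identical */
static int same_result(int rt, int ra, int64_t si, int vl)
{
    uint64_t fwd[NUM_GPRS], rev[NUM_GPRS];

    /* arbitrary (hypothetical) initial register contents */
    for (int i = 0; i < NUM_GPRS; i++)
        fwd[i] = rev[i] = (uint64_t)i * 3u + 1u;

    addi_forward(fwd, rt, ra, si, vl);
    addi_reverse(rev, rt, ra, si, vl);
    return memcmp(fwd, rev, sizeof(fwd)) == 0;
}
```

with RT=16, RA=8, VL=4 the two ranges do not overlap and same_result()
returns 1. with RT=9, RA=8, VL=4 they overlap and it returns 0, because
the forward loop reads values that it has itself just written.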

what you are MUCH more likely to be expecting is:

    for i in 0..VL-1
         reversed_i = VL-1-i
         GPR(RT+i) <- GPR(RA+reversed_i) + EXTS64(SI)

and the first most crucial thing is, how the hell is that even possible
to express when there are no spare bits in the Prefix?

we went through this exercise over 2 years ago, it was so complex i
had to put my foot down and say NO.  instead going with REMAP to
perform these types of "optional inversions".

if you really need to load in backwards order, you can always
use LD/Immediate with a negative immediate.

    sv.ld/els *RT, -8(*RA)

or use REMAP to load in element-reversed order but access memory
in positively-incrementing order, by applying REMAP with reverse
to *RT, but not to *RA.
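
a minimal sketch of the index arithmetic involved, in C. this models only
"destination element reversed, memory accessed upwards". it is not the
REMAP machinery itself, and the function name and the flat mem[] array
are assumptions for the example.

```c
#include <stdint.h>

/* Hypothetical model: memory is read at increasing indices while the
 * destination element index is reversed.  Only the index arithmetic is
 * modelled here. */
static void load_dest_reversed(uint64_t *gpr, int rt,
                               const uint64_t *mem, int vl)
{
    for (int i = 0; i < vl; i++) {
        int reversed_i = vl - 1 - i;    /* reversal applied to RT only */
        gpr[rt + reversed_i] = mem[i];  /* memory index still counts up */
    }
}
```

with VL=4 the first doubleword read from memory ends up in GPR(RT+3)
and the last one in GPR(RT+0).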

but this is down to the Programmer.

a reversed order messes with Data-Dependent Fail-First.  clearly
these are not the same:

    for i in VL-1 DOWNTO 0      
         GPR(RT+i) <- GPR(RA+i) + EXTS64(SI)
         CR.field[i] = CCode_test(GPR(RT+i))
         if DDFFirst:
             if FAILED(CR.field[i]):
                 VL=i
                 break

vs:

    for i in 0..VL-1
         GPR(RT+i) <- GPR(RA+i) + EXTS64(SI)
         CR.field[i] = CCode_test(GPR(RT+i))
         if DDFFirst:
             if FAILED(CR.field[i]):
                 VL=i
                 break
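
to make the difference concrete, a small C model of the two loops. this is
a sketch under assumptions: the "fail" test (result equal to zero) and the
function names are made up for the example, and each function returns the
truncated VL rather than writing to SVSTATE.

```c
#include <stdint.h>

/* hypothetical condition test: an element "fails" when its result is zero */
static int failed(uint64_t result)
{
    return result == 0;
}

/* forward order (0..VL-1); returns the new, truncated VL */
static int ddff_forward(uint64_t *gpr, int rt, int ra, int64_t si, int vl)
{
    for (int i = 0; i < vl; i++) {
        gpr[rt + i] = gpr[ra + i] + (uint64_t)si;
        if (failed(gpr[rt + i]))
            return i;               /* VL = i; break */
    }
    return vl;                      /* nothing failed: VL unchanged */
}

/* reverse order (VL-1 downto 0); returns the new, truncated VL */
static int ddff_reverse(uint64_t *gpr, int rt, int ra, int64_t si, int vl)
{
    for (int i = vl - 1; i >= 0; i--) {
        gpr[rt + i] = gpr[ra + i] + (uint64_t)si;
        if (failed(gpr[rt + i]))
            return i;               /* VL = i; break */
    }
    return vl;                      /* nothing failed: VL unchanged */
}
```

with source elements {5, 1, 7, 1}, SI=-1 and VL=4, the forward loop stops
at element 1 (VL becomes 1, elements 0 and 1 written), whereas the reverse
loop stops straight away at element 3 (VL becomes 3, only element 3 written).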

i don't even want to go there in working through that, rewriting
everything, implementing it, making sure i am happy with it.
just... no.  it's far too much, far too late.

> 
> because that means that the relationship between array indices in
> memory and element numbers in the register file is the identity mapping 

regardless of MSR.LE or MSR.BE,  it already is, Paul.  by definition.
see the Canonical definition in c.
which (reminder) is numbered in LSB0 order (because it's c)

   array index 0 === element index 0 === SVSTATE.srcstep=0
   (this only works in LSB0 numbering)

now, what i *don't* have a problem with is *someone else* doing
an independent Research Project into reverse-element ordering.
one tip i would advise them to consider is, experiment with
(new) MSR bits (or other state), don't overload MSR.LE as it
is specifically associated with Memory-to-Register.

(not even VSX has Register-to-Register element-inversion dependent
 on MSR.LE, Brad made that clear to me a few months back)

> Elements are not unbounded arrays - there are only a finite number of them
> that exist. You don't specify what happens if you run off the end of the
> register file. The architecture needs to specify that.

raises illegal trap for emulation (greater than 128 GPRs or for

Embedded which will only have 32).  will make a note in comment #0

i am sure it is written somewhere. probably the appendix.

> The third dot point is not clearly expressed. I think it means that
> element-width overrides cause the register file to be considered as an
> linear array of chunks of that width

yes. think in terms of a byte-addressable SRAM, 64-bit-wide, where
elwidths cause elements to wrap sequentially and contiguously.

(i am getting real fed up of rewording this btw.)

> (but the register number specified in
> the instruction is still interpreted in 64-bit units, right?).

yes.

    uint64_t GPR[128];                        /* "the 64-bit units" */
    uint8_t  *ew8  = (uint8_t*)&(GPR[RT]);    /* linear array of 8bit chunks */
    uint16_t *ew16 = (uint16_t*)&(GPR[RT]);   /* linear 16... */
    uint32_t *ew32 = (uint32_t*)&(GPR[RT]);
    uint64_t *ew64 = (uint64_t*)&(GPR[RT]);   /* ... default 64 bit */
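
the same "linear array of chunks" view can also be sketched without any
pointer casts, using only shifts and masks on the 64-bit units, so that it
does not depend on the byte order of whatever machine runs the model. this
is an illustration only: get_element() and its return convention are
assumptions, with element 0 taken to sit in the least-significant bits
(LSB0 numbering) and elwidth assumed to be one of 8, 16, 32 or 64.

```c
#include <stdint.h>

#define NUM_GPRS 128

/* Read element idx of width elwidth bits (8, 16, 32 or 64), counting from
 * the least-significant end of GPR[rt] and wrapping contiguously into
 * GPR[rt+1], GPR[rt+2], ...
 * Returns 0 on success, or -1 if the element lies beyond the end of the
 * register file (the point at which hardware would trap instead). */
static int get_element(const uint64_t *gpr, int rt, int elwidth,
                       unsigned idx, uint64_t *out)
{
    unsigned per_reg = 64u / (unsigned)elwidth;     /* elements per GPR */
    unsigned reg     = (unsigned)rt + idx / per_reg;
    unsigned shift   = (idx % per_reg) * (unsigned)elwidth;
    uint64_t mask    = (elwidth == 64) ? UINT64_MAX
                                       : ((UINT64_C(1) << elwidth) - 1);

    if (reg >= NUM_GPRS)
        return -1;                  /* ran off the end of the regfile */

    *out = (gpr[reg] >> shift) & mask;
    return 0;
}
```

with GPR[3]=0x0807060504030201 and GPR[4]=0x100f0e0d0c0b0a09, 8-bit
element 8 starting from RT=3 is 0x09, i.e. it has wrapped into GPR[4].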

> 2nd & 3rd paragraphs: in ANSI C, are you sure that indexing beyond the
> bounds of a union is defined behaviour? 

last time i tried it, it worked perfectly. now, linux kernel
had all unbounded arrays *removed* because llvm was bitching,
so it *may* only be supported by gcc.

suggestions on appropriate syntax appreciated.
