[Libre-soc-isa] [Bug 1056] questions and feedback (v2) on OPF RFC ls010

Tue May 30 01:17:28 BST 2023

https://bugs.libre-soc.org/show_bug.cgi?id=1056

--- Comment #23 from Paul Mackerras <paulus at ozlabs.org> ---
(In reply to Luke Kenneth Casson Leighton from comment #15)
> (In reply to Paul Mackerras from comment #9)
> > "Register files, elements, and Element-width Overrides" section:
> > 
> > I strongly disagree that the register file should be accessed in
> > little-endian byte order when the processor is in big-endian mode. Requiring
> > that will make Simple-V practically unusable in big-endian mode (just as
> > saying that the register file has to be big-endian always would make
> > Simple-V unusable in little-endian mode).
> 
> (i am assuming you are referring to register-to-register operations
>  which in and of itself is a massive "ask")
> 
> this in effect is equivalent to asking for the PC to run backwards,
> because of the sub-loop running 0..VL-1 running backwards (VL-1 downto 0)

This is not at all what I meant. The sub-loop would still run 0..VL-1.

> but here's the thing: if you have a src and a destination reg and
> you go in reverse, on both, it makes no difference in most circumstances!
> 
>     for i in 0..VL-1
>          GPR(RT+i) <- GPR(RA+i) + EXTS64(SI)
> 
> is no different from:
> 
>     for i in VL-1 DOWNTO 0
>          GPR(RT+i) <- GPR(RA+i) + EXTS64(SI)
> 
> as long as there is no overlap on RT..RT+VL-1 with RA..RA+VL-1
> then back-end Hardware will actually not care *at all*.
> 
> what you are MUCH more likely to be expecting is:
> 
>     for i in 0..VL-1
>          reversed_i = VL-1-i
>          GPR(RT+i) <- GPR(RA+reversed_i) + EXTS64(SI)

No, that's not what I meant. If the element width is 64 bits then there would
be no difference at all between BE and LE.
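
As a side check of the quoted claim about loop direction (which is separate
from the element-numbering point being made in this reply), here is a minimal
Python sketch. It assumes a toy register file modelled as a list; the helper
`vec_addi` and its parameters are hypothetical names, not part of any
specification.

```python
# Toy model, not the Simple-V specification: the register file is a plain
# list of integers, and vec_addi is a hypothetical helper that mimics
#     GPR(RT+i) <- GPR(RA+i) + SI
# for the element indices given in `order`.
def vec_addi(gpr, rt, ra, si, order):
    regs = list(gpr)              # work on a copy of the register file
    for i in order:
        regs[rt + i] = regs[ra + i] + si
    return regs

VL = 4
gpr = list(range(16))

# No overlap between RT..RT+VL-1 and RA..RA+VL-1: direction is irrelevant.
fwd = vec_addi(gpr, rt=8, ra=0, si=100, order=range(VL))
rev = vec_addi(gpr, rt=8, ra=0, si=100, order=reversed(range(VL)))
same_without_overlap = (fwd == rev)

# Overlap (RT = RA+1): earlier writes feed later reads, so direction matters.
fwd_ov = vec_addi(gpr, rt=1, ra=0, si=100, order=range(VL))
rev_ov = vec_addi(gpr, rt=1, ra=0, si=100, order=reversed(range(VL)))
same_with_overlap = (fwd_ov == rev_ov)
```

In this model the two directions agree only when the source and destination
ranges do not overlap, which matches the condition stated in the quoted text.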

> and the first most crucial thing is, how the hell is that even possible
> to express when there are no spare bits in the Prefix?

It doesn't need any bits in the prefix.

> we went through this exercise over 2 years ago, it was so complex i
> had to put my foot down and say NO.  instead going with REMAP to
> perform these types of "optional inversions".

I would argue it's not an inversion and it's not optional.

> if you really need to load in backwards order, you can always
> use LD/Immediate with a negative immediate.
> 
>     sv.ld/els *RT, -8(*RA)
>
> or use REMAP to load in element-reversed order but access memory
> in positively-incrementing order, by applying REMAP with reverse
> to *RT, but not to *RA.
> 
> but this is down to the Programmer.
> 
> a reversed order messes with Data-Dependent Fail-First.  clearly
> these are not the same:
> 
> 
>     for i in VL-1 DOWNTO 0      
>          GPR(RT+i) <- GPR(RA+i) + EXTS64(SI)
>          CR.field[i] = CCode_test(GPR(RT+i))
>          if DDFFirst:
>              if FAILED(CR.field[i]):
>                  VL=i
>                  break
> 
> vs:
> 
>     for i in 0..VL-1
>          GPR(RT+i) <- GPR(RA+i) + EXTS64(SI)
>          CR.field[i] = CCode_test(GPR(RT+i))
>          if DDFFirst:
>              if FAILED(CR.field[i]):
>                  VL=i
>                  break
> 
> i don't even want to go there in working through that, rewriting
> everything, implementing it, making sure i am happy with it.
> just... no.  it's far too much, far too late.

None of that is what I was suggesting. What you have written above seems to
imply that you regard BE as "backwards". My suggestion was not about doing
anything backwards or in reverse order. All your loops still go in the same
order.

The only difference is the numbering of elements within a register; element 0
in a register would be the left-most element rather than the right-most.

> > because that means that the relationship between array indices in
> > memory and element numbers in the register file is the identity mapping 
> 
> regardless of MSR.LE or MSR.BE,  it already is, Paul.  by definition.
> see the Canonical definition in C.

If I have an array of (say) four 16-bit quantities in memory, and I load that
into a register using an ld instruction, then in BE mode, array index 0 ends up
in the most significant 16 bits of the register, and array index 3 ends up in
the least significant 16 bits. You are insisting that the least significant 16
bits are element 0 from the point of view of the SV iterations; so now if I use
VL=2 I end up working on array indices 3 and 2 rather than 0 and 1.
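
The example above can be sketched in Python as follows. This is only an
illustration under stated assumptions: the function `element` and its
`msb_first` flag are hypothetical names used to contrast the two numbering
conventions, and the 64-bit BE load is modelled with `int.from_bytes`.

```python
# Sketch only: `element` and its `msb_first` flag are hypothetical names used
# to contrast the two element-numbering conventions under discussion.
def element(reg, idx, elwidth, msb_first, reglen=64):
    """Return element `idx` of a `reglen`-bit register value `reg`."""
    per_reg = reglen // elwidth
    # Right-most-first numbering: element 0 is the least significant element.
    # Left-most-first numbering: element 0 is the most significant element.
    pos = (per_reg - 1 - idx) if msb_first else idx
    return (reg >> (pos * elwidth)) & ((1 << elwidth) - 1)

array = [0x1111, 0x2222, 0x3333, 0x4444]      # array indices 0..3

# Memory image of the array in big-endian mode, then a 64-bit BE load (ld).
mem = b"".join(v.to_bytes(2, "big") for v in array)
reg = int.from_bytes(mem, "big")              # 0x1111222233334444

VL = 2
right_most_first = [element(reg, i, 16, msb_first=False) for i in range(VL)]
left_most_first = [element(reg, i, 16, msb_first=True) for i in range(VL)]
```

In this model, `right_most_first` holds array indices 3 and 2, while
`left_most_first` holds array indices 0 and 1. With `elwidth=64` there is one
element per register, so the two conventions coincide.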

> which (reminder) is numbered in LSB0 order (because it's C)

No, C does not inherently assume LSB0 order.

>    array index 0 === element index 0 === SVSTATE.srcstep=0
>    (this only works in LSB0 numbering)
> 
> 
> now, what i *don't* have a problem with is *someone else* doing
> an independent Research Project into reverse-element ordering.
> one tip i would advise them to consider is, experiment with
> (new) MSR bits (or other state), don't overload MSR.LE as it
> is specifically associated with Memory-to-Register.
> 
> (not even VSX has Register-to-Register element-inversion dependent
>  on MSR.LE, Brad made that clear to me a few months back)

No it doesn't; and I am not suggesting element *inversion* either.

At this point, I will concede my failure to make my suggestion sufficiently
clear, and drop it as being not worth the effort to persist with. I still think
it is the correct approach, though.

[snip]

> > 2nd & 3rd paragraphs: in ANSI C, are you sure that indexing beyond the
> > bounds of a union is defined behaviour? 
> 
> last time i tried it, it worked perfectly. now, linux kernel
> had all unbounded arrays *removed* because llvm was bitching,
> so it *may* only be supported by gcc.

Doing something that is not defined behaviour can and often does work
"perfectly" on some (or even most) implementations. I was asking a question
about what the C standard says, not what gcc or llvm or any other
implementation actually does.
