[Libre-soc-isa] [Bug 1056] questions and feedback (v2) on OPF RFC ls010

bugzilla-daemon at libre-soc.org bugzilla-daemon at libre-soc.org
Wed May 31 08:42:09 BST 2023


https://bugs.libre-soc.org/show_bug.cgi?id=1056

--- Comment #30 from Paul Mackerras <paulus at ozlabs.org> ---
(In reply to Luke Kenneth Casson Leighton from comment #28)
> (In reply to Paul Mackerras from comment #9)
> > "Register files, elements, and Element-width Overrides" section:
> > 
> > I strongly disagree that the register file should be accessed in
> > little-endian byte order when the processor is in big-endian mode.
> 
> just to check (1) register-to-register,
> (note deliberate use of "addi" in 3rd step):
> 
> * let us set MAXVL=VL=1
> * let us also use elwidth=16
> * let us set (verilator or "addi 5,0,0x1234") the contents of GPR(5) = 0x1234
>   LSB0 63..........0
>   MSB0 0..........63
>        0000... 12 34
> * perform "sv.addi/elwidth=16 5,0,0x1122"

I think you mean sv.addi/elwidth=16 5,5,0x1122 (not 5,_0_,0x1122). I'll assume
the 0 for RA is a typo caused by posting at 3.27AM.

> * then inspect (verilator) GPR(5) and read its contents
> 
> is the answer you expect, regardless of LE/BE: 0x2356?
> or would it be 
> * 0x2211_0000_0000_1234 (or 0x1122_0000_0000_1234) *or*
> * 0x0000_0000_0000_3456 due to addi being implicitly
>   reversed-byte-order from sv.addi under BE?

I would expect 0x1122_0000_0000_1234 in BE mode, since you have operated on
element 0, elements are 16 bits wide, and in BE mode element 0 is the
most-significant 16 bits of the register.

> now the same thing with *scalar* instructions:
> 
> * let us set (verilator or "addi 5,0,0x1234") the contents of GPR(5) = 0x1234
> * perform "addi 5,0,0x1122"
> * then inspect (verilator) GPR(5) and read its contents
> 
> is it *still* 0x2356 regardless of LE/BE?

It's 0x2356 regardless of LE/BE (again assuming RA=5, i.e. addi 5,5,0x1122).

If you did sv.addi/elwidth=64 5,5,0x1122 then the answer would be 0x2356
regardless of BE/LE.
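
To make the cases above concrete, here is a minimal Python sketch of the
register-to-register behaviour I am describing. The function name and the model
itself are illustrative assumptions, not anything from the spec: it treats a GPR
as a 64-bit integer and assumes that element 0 is the most-significant element
in BE mode and the least-significant element in LE mode.

```python
MASK64 = (1 << 64) - 1

def sv_addi_elem0(reg, imm, elwidth, big_endian):
    """Hypothetical model: add imm to element 0 of a 64-bit GPR value.

    Assumption: in BE mode element 0 is the most-significant `elwidth`
    bits of the register; in LE mode it is the least-significant.
    """
    nelems = 64 // elwidth
    mask = (1 << elwidth) - 1
    # LSB0 bit offset of element 0 within the register
    shift = (nelems - 1) * elwidth if big_endian else 0
    elem = (reg >> shift) & mask
    elem = (elem + imm) & mask          # result wraps at the element width
    # write the element back, leaving the other elements untouched
    return (reg & ~(mask << shift) & MASK64) | (elem << shift)

r5 = 0x1234
be16 = sv_addi_elem0(r5, 0x1122, 16, big_endian=True)
le16 = sv_addi_elem0(r5, 0x1122, 16, big_endian=False)
be64 = sv_addi_elem0(r5, 0x1122, 64, big_endian=True)
le64 = sv_addi_elem0(r5, 0x1122, 64, big_endian=False)
print(hex(be16), hex(le16), hex(be64), hex(le64))
```

Under those assumptions only the elwidth=16 BE case differs from the scalar
result; with elwidth=64 there is a single element covering the whole register,
so the endian mode makes no difference.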

> checking (2) memory-to-register:
> 
> what about the same conditions (MAXVL=VL=1, a half-word load)
> with lhbrx vs lhx?
> 
> * sv.lhbrx vs lhbrx, BE: same value loaded?
> * sv.lhbrx vs lhbrx, LE: same value loaded?

What are you assuming the element size is?

I am not clear at this point on how the element size affects loads and stores.
Does an element size of 16 bits mean that a load does 1/4 of the usual number
of bits, for instance?

If the element size in your example above is 64 bits, then I would expect
sv.lhbrx and lhbrx to give the same value in the destination GPR. If the
element size is some other value, I don't know what to expect.

> if the answer in all cases (m2r&r2r) is "yes", then this is what i mean
> by "instructions must be Orthogonal regardless of Prefix/Non-prefix"

I'm not sure what "yes" would mean in the addi case above. In any case, I would
note that addi will in general give a different result from sv.addi/elwidth=16
in LE mode as well as in BE mode. For example, suppose r5 contains 0xffff
initially.

addi 5,5,1 will give 0x10000 in r5
sv.addi/elwidth=16 5,5,1 will give 0 in r5 (assuming VL=1 and LE mode).
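
The same comparison as a small Python sketch (the helper names are made up for
illustration, and the model assumes VL=1, LE mode, and that an elwidth=16
operation only reads and writes the low 16 bits of the register):

```python
MASK64 = (1 << 64) - 1

def addi(reg, imm):
    """Scalar addi modelled as a plain 64-bit add."""
    return (reg + imm) & MASK64

def sv_addi_elwidth16_le(reg, imm):
    """Hypothetical model of sv.addi/elwidth=16 with VL=1 in LE mode.

    Assumption: only element 0 (the low 16 bits) is operated on, and the
    carry out of bit 15 is discarded rather than propagated.
    """
    elem = ((reg & 0xFFFF) + imm) & 0xFFFF
    return (reg & ~0xFFFF & MASK64) | elem

r5 = 0xFFFF
scalar_result = addi(r5, 1)
vector_result = sv_addi_elwidth16_le(r5, 1)
print(hex(scalar_result), hex(vector_result))
```

The difference comes purely from the element width truncating the carry, so it
shows up in LE mode just as it does in BE mode.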

> if the answer in all cases is "no", then resisting the pressure
> to break Orthogonality, these are some potential options:
> 
> * solution (1) is to add *scalar* instructions that perform the BRev
>   (and then SV-Prefix those)
> * solution (2) is to add *scalar* register-tagging (an SPR that
>   marks a given register as "please reverse me on GPR read *and* write"
> * solution (3) is to completely redesign the 24-bit SVP64 Prefix
>   from scratch, reserving four bits for being able to reverse
>   up to 4(!) operands (coping with FMAC)
> * solution (4) just use in-place sv.brh *RT, *RA (where RT=RA)
>   and go from there

I don't understand what problem these solutions are trying to solve. None of
them seems to me to be necessary or even desirable. You keep introducing byte
reversal, which is never required by my proposal.

In fact, depending on how elwidth affects loads and stores, there may be
another answer to my original concern about loading an array of values into
registers. It's possible that doing sv.ld/elwidth=16 r3,0(r4) with VL=4 will
load four 16-bit elements into r3 in the right order for future operations, but
I don't know for sure.
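
To show what I mean by "the right order", here is a Python sketch. It does not
model sv.ld itself, since how elwidth affects loads is exactly what I don't
know. It only models a plain 64-bit load of an array of four 16-bit values,
followed by element extraction, with the (assumed) rule that element 0 is the
most-significant halfword in BE mode and the least-significant in LE mode. The
helper name and the sample values are made up.

```python
import struct

def element(reg, i, elwidth, big_endian):
    """Extract element i from a 64-bit register value.

    Assumption: BE mode numbers elements from the most-significant end,
    LE mode from the least-significant end.
    """
    nelems = 64 // elwidth
    pos = (nelems - 1 - i) if big_endian else i
    return (reg >> (pos * elwidth)) & ((1 << elwidth) - 1)

values = [0x0102, 0x0304, 0x0506, 0x0708]

# The array as it sits in memory in each mode, then a 64-bit load of it.
be_reg = int.from_bytes(struct.pack(">4H", *values), "big")
le_reg = int.from_bytes(struct.pack("<4H", *values), "little")

be_elems = [element(be_reg, i, 16, big_endian=True) for i in range(4)]
le_elems = [element(le_reg, i, 16, big_endian=False) for i in range(4)]

# The same BE-loaded register read with LE element numbering instead.
be_reg_le_numbering = [element(be_reg, i, 16, big_endian=False) for i in range(4)]
print(be_elems, le_elems, be_reg_le_numbering)
```

With matching numbering, element i of the register is element i of the array in
both modes. Reading the BE-loaded register with LE element numbering gives the
array back to front, which is the situation I was concerned about.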


