[Libre-soc-isa] [Bug 560] big-endian little-endian SV regfile layout idea

Thu Dec 31 21:44:22 GMT 2020

https://bugs.libre-soc.org/show_bug.cgi?id=560

--- Comment #35 from Luke Kenneth Casson Leighton <lkcl at lkcl.net> ---
(In reply to Alexandre Oliva from comment #24)
> > now the underlying order *does* matter.
> 
> exactly!  that's why we're debating the iteration order.
> 
> see, you're talking about an array of uint8_t, so let's take it from there.
> 
> uint8_t foo[16] = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 };
> 
> ; r3 points to foo
>   ld r4, 0(r3)
>   ld r5, 8(r3)

load double.  let us assume LE mode.  this should, i believe (but may get this
wrong) set r4 equal to 0x0807060504030201 and r5 to 0x000f0e0d0c0b0a09

this basically places values as follows:

int_regs[4].actual_bytes[0] = 0x01
int_regs[4].actual_bytes[1] = 0x02
etc.

where the actual_bytes is in LE order and all of the union c-array members are
in LE order.
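
as a minimal sketch of the above in python, assuming a plain LE-mode 64-bit
load (the names foo, load_le64 and r4_actual_bytes are illustrative only, they
are not taken from the simulator):

```python
# hypothetical illustration: foo[] as it sits in memory.
# only 15 initialisers were given, so the 16th byte is implicitly 0.
foo = bytes([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 0])

def load_le64(mem, offset):
    """LE-mode ld: the byte at the lowest address is the least significant."""
    return int.from_bytes(mem[offset:offset + 8], "little")

r4 = load_le64(foo, 0)
r5 = load_le64(foo, 8)

# the register's bytes in LE order: index 0 is the least significant byte
r4_actual_bytes = r4.to_bytes(8, "little")

print(hex(r4))  # 0x807060504030201
print(hex(r5))  # 0xf0e0d0c0b0a09
```

note that python's hex() drops leading zeros.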

>   setvli r0, 16
>   svp64 elwidth_src=8-bit mv r6.s, r4.v  

let me work through this.

* elwidth_src=8 but the dest elwidth has not been specified, therefore it
defaults to 64 bit.
* r6 is the dest, set as a scalar, and it is a 64 bit scalar.

the union of types for reg_t has already had the underlying SRAM store the data
in int_regs[4].actual_bytes as 0x0807060504030201

when the for-loop of the mv reads the src, the operation is:

   result = get_polymorphed_reg(4, 8, 0)
   set_polymorphed_reg(6, 64, 0, result)

the fetch from the regfile of r4, element 0 @ bitwidth 8 goes to this line:

    if bitwidth == 8:
        return int_regfile[4].b[0]

therefore the value 0x01 is returned: accessing b[0] gets byte 0 of
actual_bytes, and everything is in LE order.

that is then zeroextended to 64 bit and stored in

      int_regfile[6].d[0]

this storage, again, is LE SRAM, LE actual_bytes, LE union struct members
therefore the value 0x01 goes into byte zero of regfile 6's actual_bytes.
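
the walkthrough can be sketched as runnable python.  this is only a model under
stated assumptions: the regfile is a flat bytearray of LE-ordered bytes, 8 per
register, and the bodies of get_polymorphed_reg / set_polymorphed_reg are my
illustration here, not the actual simulator code:

```python
# hypothetical model: 32 registers, each 8 bytes of LE-ordered SRAM
int_regfile = bytearray(32 * 8)

WIDTH_BYTES = {8: 1, 16: 2, 32: 4, 64: 8}

def get_polymorphed_reg(reg, bitwidth, offset):
    """read element `offset` of width `bitwidth`, starting at register `reg`"""
    nbytes = WIDTH_BYTES[bitwidth]
    start = reg * 8 + offset * nbytes
    return int.from_bytes(int_regfile[start:start + nbytes], "little")

def set_polymorphed_reg(reg, bitwidth, offset, val):
    """write element `offset` of width `bitwidth`, starting at register `reg`"""
    nbytes = WIDTH_BYTES[bitwidth]
    start = reg * 8 + offset * nbytes
    val &= (1 << bitwidth) - 1  # truncate to the element width
    int_regfile[start:start + nbytes] = val.to_bytes(nbytes, "little")

# r4 as left by the LE-mode ld
set_polymorphed_reg(4, 64, 0, 0x0807060504030201)

# the mv: 8-bit source element 0, zero-extended into a 64-bit dest
result = get_polymorphed_reg(4, 8, 0)
set_polymorphed_reg(6, 64, 0, result)

r6 = get_polymorphed_reg(6, 64, 0)
print(hex(r6))  # 0x1
```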

> should r6 be 1

correct.  as long as i got the original LD correct.  which i cannot guarantee
because i get confused about what LE and BE mean.

>, or 8,

not a chance.  same caveat above though.

> or should it depend on endianness?

if we wish to make things so insanely complex that even the inventor of SV
cannot understand or cope with it... yes.

(answer: no)

> if we're to go by the array layout model in memory, it ought to be 1, which
> means that, in big-endian mode, the vector iteration order within the
> register should go from most to least significant, whereas in little-endian
> mode, it should go the opposite way.

this is so confusing to me i can't even interpret it. here is why:

1)  do you mean that the 0..VL-1 loop should go in the reverse order?
2) do you mean that the actual_bytes of each reg_t should be in reverse order?
3) do you mean that the bytes of each of the c arrays in the union should be in
the reverse order?

or do you mean any permutation of those 8 possible combinations?

all eight possibilities are perfectly valid when it comes to considering a
"meaning" for BE.

adding this confusion on top of something that literally took 5 months to get
right is not a good idea.

>  this would maintain the array layout
> equivalence, and it makes perfect sense when you think of how bytes are laid
> out in memory in each endianness: 

but unfortunately, due to a strange form of dyslexia, i don't know what that
is.  it takes me several minutes to hours to go through something that, to you,
looks "obvious".

> little endian means least significant
> first, big endian means most significant first.  iterating in that order is
> just natural and expected.

yes... but on what? the vector array, the 8 bytes of the SRAM per reg, or the
bytes in the individual elements?

> now, if we were to iterate over sub-register types always from least to most
> significant, then we're effectively reversing the order from the expected
> memory layout.  IOW, we're visiting first 8, then 7, ... then 1, then 16,
> then 15, ..., then 9.  are you sure this is what you want?

the walkthrough of the pseudocode should make it clear that when considering
the underlying SRAM as LE and literally implemented in c, it is not as you've
listed.

> BTW, should the load sequence into r4..r5 above be equivalent to:
> 
>   setvli r0, 2
>   svp64 elwidth=64-bit, elwidth_src=64-bit ld r4,0(r3)

it is not possible to set elwidth=64.  the options are 8, 16, 32 and "default"

this operation will action 2 LDs.  due to elwidths=default it is functionally
directly equivalent to two scalar LDs (with unit strided offsets)

     ld r4, 0(r3)
     ld r5, 8(r3)

i.e. exactly as in the example. 
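
a minimal sketch of that equivalence, assuming LE mode and default elwidths
(MEM, ld and regs are illustrative names, not simulator code):

```python
# hypothetical memory image of foo[], with r3 pointing at its start
MEM = bytes([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 0])

def ld(addr):
    """scalar LE-mode 64-bit load"""
    return int.from_bytes(MEM[addr:addr + 8], "little")

VL = 2
r3 = 0
regs = [0] * 32

# vectorised ld with default elwidths: one scalar ld per element,
# unit-strided by 8 bytes, into consecutive registers
for i in range(VL):
    regs[4 + i] = ld(r3 + 8 * i)

print(hex(regs[4]), hex(regs[5]))
```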

> and should that really get the reversed byte order in the registers that, in
> big-endian mode, 

if the processor mode is BE, the LD above gets a BE LD.  due to the underlying
SRAM being LE this will indeed result in a bytereversal of the data before
being put into actual_bytes as 0x0102030405060708

otherwise, if the processor mode is LE it will be exactly as the walkthrough i
did above.
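
to illustrate the two modes side by side, here is a small sketch (the ld helper
and its big_endian flag are hypothetical, purely for illustration):

```python
# first 8 bytes of the array as they sit in memory: 01 02 ... 08
foo = bytes(range(1, 9))

def ld(mem, offset, big_endian):
    """hypothetical 64-bit LD: returns the register value for the given mode"""
    order = "big" if big_endian else "little"
    return int.from_bytes(mem[offset:offset + 8], order)

r4_le = ld(foo, 0, big_endian=False)
r4_be = ld(foo, 0, big_endian=True)

print(hex(r4_le))  # 0x807060504030201
print(hex(r4_be))  # 0x102030405060708

# the BE-mode value is the LE-mode value with its 8 bytes reversed
r4_le_reversed = int.from_bytes(r4_le.to_bytes(8, "little"), "big")
print(r4_le_reversed == r4_be)  # True
```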

this is just how OpenPOWER works.  it's confusing as hell, took me 5 *months*
to get right, and trying to change it will require some SERIOUS justification
to explain to the OpenPOWER Foundation ISA WG.

now, *when we have time* this can be revisited and added via an MSR bit.

> you say we'd get with:
> 
>   setvli r0, 16
>   svp64 elwidth=8-bit, elwidth_src=8-bit ld r4,0(r3)
> 
> ?

this one is a fascinating degenerate case because you cannot bytereverse a
byte.  therefore regardless of LE or BE mode they both do the same thing.

the use of elwidths=8 has effectively "overridden" the "ld" to make it a "lb"
(load byte) operation.

the loop becomes:

    for i in range(16):
         res = MEM(r3+i) # one byte
         int_regs[4].b[i] = res

which will end up, in both LE and BE mode, storing the data in actual_bytes as
0x0807060504030201 in r4 and 0x000f....09 in r5.

consider each case to be like this:

    r4 = 0
    r5 = 0
    for i in range(8):
        res = MEM(r3+i)       # one byte
        res = res << (8*i)    # byte i goes to bits 8*i..8*i+7
        r4 |= res
        res = MEM(r3+i+8)
        res = res << (8*i)
        r5 |= res

NOT repeat NOT

    for i in range(8):
        res = MEM(r3+15-i)    # walking memory backwards
        res = res << (8*i)
        r4 |= res
        res = MEM(r3+7-i)
        res = res << (8*i)
        r5 |= res

or any other type of loop which is hard to justify and explain.
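
a runnable sketch of the byte-element loop and of its shift-and-OR equivalent,
under the assumption that r4 and r5 are 16 contiguous bytes of LE-ordered
regfile SRAM (MEM and regs here are illustrative stand-ins):

```python
# hypothetical memory image of foo[], with r3 pointing at its start
MEM = bytes([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 0])
r3 = 0

# elwidth=8: the ld becomes 16 byte loads, element i goes to byte i
regs = bytearray(16)  # r4 is bytes 0..7, r5 is bytes 8..15
for i in range(16):
    regs[i] = MEM[r3 + i]

r4 = int.from_bytes(regs[0:8], "little")
r5 = int.from_bytes(regs[8:16], "little")

# the same result expressed as shifts and ORs
r4_alt = 0
r5_alt = 0
for i in range(8):
    r4_alt |= MEM[r3 + i] << (8 * i)
    r5_alt |= MEM[r3 + i + 8] << (8 * i)

print(hex(r4), hex(r5))
print(r4 == r4_alt and r5 == r5_alt)  # True
```

no endianness flag appears anywhere in this loop, because a single byte has
nothing to reverse.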

> 
> the point is, we have to make a choice here. 

already made, 18+ months ago.  captured by the SRAM of the regfile, as
specified by the c data structure, being defined categorically as LE ordered.

now, admittedly it had never occurred to me that anyone would consider
inverting the meanings of the SRAM, hence why i didn't document it (which i
will do)

however over the 2+ years of development of SV my thinking has always been in
LE order as far as that c-based union is concerned.

changing that now when we are so far behind is, again, just not a good idea.

> do we choose
> 
> a) compatibility with the memory/data endianness selected for the system,
> and set the iteration order in sub-register vector elements to match, or 

no.  because the decision was made 18-24 months ago that the SRAM for the
regfile and associated access is in LE.  all documentation has been written
with that in mind.  all code has been written with that in mind.

it's not changing (and can be changed with a separate revision AFTER we have
completed the implementation, and have 3-4 months free to discuss it, document
it and implement it)

the endianness is "removed" by the HDL code and everything goes into data AS
QUICKLY AS POSSIBLE in LE order.

memory is read: the bytes are reversed AS SOON AS POSSIBLE to get the f*** away
from BE as quickly as possible.

the entire HDL is in LE order.

the documentation for the regfile SRAM is LE order

the regfile implementation is in LE order

all simulator source code deals with BE by providing a class that presents BE
bit accesses as... LE order.
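
as a rough illustration only, assuming "BE bit accesses" here means the Power
ISA MSB0 bit numbering (bit 0 is the most significant bit): the MSB0View class
below is a hypothetical sketch and is NOT the simulator's actual class.

```python
class MSB0View:
    """hypothetical helper: index the bits of a fixed-width value using
    MSB0 numbering, while storing it internally as a plain integer."""

    def __init__(self, value, width=64):
        self.width = width
        self.value = value & ((1 << width) - 1)

    def _lsb0(self, msb0_index):
        # MSB0 bit n of a width-bit value is LSB0 bit (width - 1 - n)
        return self.width - 1 - msb0_index

    def get_bit(self, msb0_index):
        return (self.value >> self._lsb0(msb0_index)) & 1

    def set_bit(self, msb0_index, bit):
        pos = self._lsb0(msb0_index)
        self.value = (self.value & ~(1 << pos)) | ((bit & 1) << pos)

v = MSB0View(0x1)
print(v.get_bit(63))  # 1: MSB0 bit 63 is the least significant bit
v.set_bit(0, 1)       # MSB0 bit 0 is the most significant bit
print(hex(v.value))   # 0x8000000000000001
```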

trying to mess with this will literally set us back months.

the decision has already been made, and is not going to change if we want to
succeed.
