[Libre-soc-isa] [Bug 570] New: svp64 vector loads: sub-dword selection before or after byte-reversal

Wed Jan 6 21:58:34 GMT 2021

https://bugs.libre-soc.org/show_bug.cgi?id=570

            Bug ID: 570
           Summary: svp64 vector loads: sub-dword selection before or
                    after byte-reversal
           Product: Libre-SOC's first SoC
           Version: unspecified
          Hardware: PC
                OS: Other
            Status: CONFIRMED
          Severity: enhancement
          Priority: ---
         Component: Specification
          Assignee: lkcl at lkcl.net
          Reporter: oliva at libre-soc.org
                CC: libre-soc-isa at lists.libre-soc.org
            Blocks: 213
   NLnet milestone: ---

Last night, while going over https://libre-soc.org/simple_v_extension/appendix/
with a particular focus on ld's operation with an elwidth override for the
src, I found various details missing from the specification.

- 3.5 depicts loading full words (on ppc64, presumably dwords), and you said (I
suppose it's written somewhere) that BE loads undergo byte-reversal as early
as possible.  Then, as we get on to 4 (not yet its subsections), there's
pseudocode for polymorphism that deals with accessing parts of a whole
register.  This suggested to me that the sub-indexing of the value loaded from
memory would take place after byte-reversal.  That is probably not the case:
it would mess up the order of loading sub-register vector elements in BE mode
even when using an elwidth_src that matched the vector element size (as opposed
to wider loads).

- Even 4.4 doesn't specify when byte-reversal is to take place when accessing
sub-words.  Normally, sub-word offsetting in BE counts from the opposite end
to LE.  If we're departing from such fundamental assumptions about
endianness even when dealing with memory, as we seem to be doing, we have to go
way out of our way to make this abundantly clear: specifying explicitly how
wide memory fetches are (4.4 seems to do that); stating explicitly when
expected and usually-implied BE transformations are NOT to be made (e.g. when
computing the offs modulo in 4.4); and stating at which point the
byte-reversal of loaded dwords or sub-dwords is to take place.

- Likewise, when we use an array of sub-dword types as a model, even if you
state somewhere that the register holds data that has been byte-swapped into LE
mode, there must be explicit warnings that the model's indexing does not meet
the normal expectations of CPU data endianness.  Specifically, even if the CPU
is in BE mode, vector element [0] is to be at the sub-dword holding the bit at
2^0, not the bit at 2^{63} as would normally be the case in BE mode.  E.g.,
loading byte vectors in BE mode in wider-than-byte loads requires undoing the
byte-reversal, so that the first element lands around bit 2^0 rather than
2^{63}.  For sub-dword types wider than byte, there is no simple way to shuffle
the elements into place after a wide BE load; they *have* to be loaded
individually to fall in their place (assuming the previous point doesn't
invalidate even this way to load sub-dword BE vectors).
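
To make the three points above concrete, here is a minimal Python sketch of
the layouts involved.  It is only a model under my own assumptions, not
anything taken from the spec: the memory contents are made up, the helper names
(load_dword, byte_swap, elements, load_individually) are hypothetical, and a
register is modelled as a plain 64-bit integer whose element [0] sits at bit
2^0.

```python
# Eight consecutive bytes in memory, lowest address first (made-up data).
MEM = bytes([0x10, 0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17])


def load_dword(mem, big_endian):
    """Load 8 bytes as a 64-bit register value in the given byte order."""
    return int.from_bytes(mem[:8], "big" if big_endian else "little")


def byte_swap(reg):
    """Reverse the byte order of a 64-bit value."""
    return int.from_bytes(reg.to_bytes(8, "big"), "little")


def elements(reg, elwidth_bits):
    """Split a 64-bit value into elements; element 0 holds bit 2^0."""
    mask = (1 << elwidth_bits) - 1
    count = 64 // elwidth_bits
    return [(reg >> (i * elwidth_bits)) & mask for i in range(count)]


def load_individually(mem, elwidth_bits, big_endian):
    """Load each element separately; element i comes from the i-th slot."""
    n = elwidth_bits // 8
    order = "big" if big_endian else "little"
    return [int.from_bytes(mem[i * n:(i + 1) * n], order)
            for i in range(8 // n)]


be_reg = load_dword(MEM, big_endian=True)    # 0x1011121314151617
le_reg = load_dword(MEM, big_endian=False)   # 0x1716151413121110

# Byte elements: a BE dword load puts the first memory byte at the 2^63 end,
# so indexing from bit 2^0 walks memory backwards.
print([hex(e) for e in elements(be_reg, 8)])             # 0x17 first
# Undoing the byte-reversal restores memory order for byte elements.
print([hex(e) for e in elements(byte_swap(be_reg), 8)])  # 0x10 first

# Halfword elements: compare both wide loads with per-element BE loads.
print([hex(e) for e in load_individually(MEM, 16, True)])
print([hex(e) for e in elements(be_reg, 16)])
print([hex(e) for e in elements(le_reg, 16)])
```

In this model the BE dword load yields the halfwords with the right values but
in the opposite element order, while the LE dword load yields the right order
with the bytes swapped inside each element.  Neither matches the individually
loaded result without some further rearrangement, which is why the point at
which byte-reversal happens needs to be spelled out.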

- The register specifying the address to be loaded from can be scalar or
vector.  It's not clear how the use of the address register and of the memory
location/s named in it relate to elwidth_src.

-- if the load address register is a vector, is it the case that:

--- elwidth_src specifies the address width, and we take consecutive
elwidth_src-wide addresses from the address vector, and load full dwords (or
elwidth[_dest]-sized objects?) from each such (extended) address?  (this
appears to be the case for the pseudo-code given under 4.4)

--- or does elwidth_src specify the width of each load, and we take consecutive
dword-wide addresses from the address vector for each elwidth_src-wide load?

-- if the load address register is a scalar, is it the case that:

--- elwidth_src specifies the width of each load, and we take consecutive
elwidth_src-wide elements starting from the address given by the full address
register?

--- or does elwidth_src narrow the address register, and we load full dwords
starting at that narrowed and re-widened address?
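
The four readings asked about above can be written down side by side.  The
following Python sketch is purely illustrative: the function names, the flat
byte-array memory, the list-of-integers register file, and the use of
little-endian loads (to keep byte order out of this particular question) are
all my assumptions, and none of it is meant to say which reading the
specification intends.

```python
def read_le(mem, addr, nbytes):
    """Read nbytes from mem at addr as a little-endian integer."""
    return int.from_bytes(mem[addr:addr + nbytes], "little")


def packed_elements(regs, elwidth_bits, count):
    """Take `count` elements packed into consecutive 64-bit registers,
    with element 0 at bit 2^0 of the first register."""
    per_reg = 64 // elwidth_bits
    mask = (1 << elwidth_bits) - 1
    return [(regs[i // per_reg] >> ((i % per_reg) * elwidth_bits)) & mask
            for i in range(count)]


def vector_addr_narrow_addresses(mem, regs, elwidth_src, vl):
    """Vector address, first reading: elwidth_src is the address width;
    each zero-extended address is used for a full dword load."""
    addrs = packed_elements(regs, elwidth_src, vl)
    return [read_le(mem, a, 8) for a in addrs]


def vector_addr_narrow_loads(mem, regs, elwidth_src, vl):
    """Vector address, second reading: addresses are full dwords, one per
    register; elwidth_src is the width of each load."""
    return [read_le(mem, regs[i], elwidth_src // 8) for i in range(vl)]


def scalar_addr_narrow_loads(mem, reg, elwidth_src, vl):
    """Scalar address, first reading: consecutive elwidth_src-wide loads
    starting at the address in the full register."""
    n = elwidth_src // 8
    return [read_le(mem, reg + i * n, n) for i in range(vl)]


def scalar_addr_narrow_address(mem, reg, elwidth_src, vl):
    """Scalar address, second reading: the address register is narrowed to
    elwidth_src bits, re-widened, and full dwords are loaded from there."""
    base = reg & ((1 << elwidth_src) - 1)
    return [read_le(mem, base + i * 8, 8) for i in range(vl)]


MEM = bytes(range(64))  # made-up memory: byte at address a holds value a

# Same instruction shape (elwidth_src=16, VL=2), four different outcomes.
print([hex(v) for v in vector_addr_narrow_addresses(MEM, [0x00100008], 16, 2)])
print([hex(v) for v in vector_addr_narrow_loads(MEM, [8, 16], 16, 2)])
print([hex(v) for v in scalar_addr_narrow_loads(MEM, 8, 16, 2)])
print([hex(v) for v in scalar_addr_narrow_address(MEM, 0x10008, 16, 2)])
```

Since the readings disagree on both the addresses used and the amount of data
fetched per element, the specification has to pick one for each of the vector
and scalar cases.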

Referenced Bugs:

https://bugs.libre-soc.org/show_bug.cgi?id=213
[Bug 213] SimpleV Standard writeup needed