[Libre-soc-isa] [Bug 567] New: Allow transparent scalar loads and stores to/from registers allocated as vectors

Tue Jan 5 14:17:06 GMT 2021

https://bugs.libre-soc.org/show_bug.cgi?id=567

            Bug ID: 567
           Summary: Allow transparent scalar loads and stores to/from
                    registers allocated as vectors
           Product: Libre-SOC's first SoC
           Version: unspecified
          Hardware: PC
                OS: Linux
            Status: DEFERRED
          Severity: enhancement
          Priority: ---
         Component: Specification
          Assignee: cestrauss at gmail.com
          Reporter: cestrauss at gmail.com
                CC: libre-soc-isa at lists.libre-soc.org
   NLnet milestone: ---

First, note that I'm preemptively marking this as deferred, as I understand
that we have a kind of feature-freeze right now.

Also, apologies to Alexander: as I see it, this is actually his proposal. It is
orthogonal to Jacob's proposal, which has to do with bitcast.

So, as I see from the NEON article, a 64-bit register can be partitioned into a
certain number of "lanes" of varying widths. For instance:

8 x  8-bit lanes: L[0] L[1] L[2] L[3] L[4] L[5] L[6] L[7]
4 x 16-bit lanes: L[0] L[1] L[2] L[3]
2 x 32-bit lanes: L[0] L[1]
1 x 64-bit lane:  L[0]

There must be a mapping from each lane to a range of bits within the register.

We can choose arbitrarily, as long as we are consistent. One choice is:
8-bit:
L[0] -> bits 0 to 7
L[1] -> bits 8 to 15
L[2] -> bits 16 to 23
L[3] -> bits 24 to 31
L[4] -> bits 32 to 39
L[5] -> bits 40 to 47
L[6] -> bits 48 to 55
L[7] -> bits 56 to 63
16-bit:
L[0] -> bits 0 to 15
L[1] -> bits 16 to 31
L[2] -> bits 32 to 47
L[3] -> bits 48 to 63
32-bit:
L[0] -> bits 0 to 31
L[1] -> bits 32 to 63
64-bit:
L[0] -> bits 0 to 63

Another choice is:
8-bit:
L[0] -> bits 56 to 63
L[1] -> bits 48 to 55
L[2] -> bits 40 to 47
L[3] -> bits 32 to 39
L[4] -> bits 24 to 31
L[5] -> bits 16 to 23
L[6] -> bits 8 to 15
L[7] -> bits 0 to 7
16-bit:
L[0] -> bits 48 to 63
L[1] -> bits 32 to 47
L[2] -> bits 16 to 31
L[3] -> bits 0 to 15
32-bit:
L[0] -> bits 32 to 63
L[1] -> bits 0 to 31
64-bit:
L[0] -> bits 0 to 63
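
To make the two candidate mappings concrete, here is a minimal Python sketch
that computes the bit range of each lane. The function name lane_bits and its
reverse flag are hypothetical, made up for illustration; they are not part of
any Libre-SOC code. It assumes a 64-bit register and lane widths of 8, 16, 32
or 64 bits, as in the tables above.

```python
def lane_bits(lane_width, index, reverse=False):
    """Return (lowest_bit, highest_bit) of lane `index` in a 64-bit register.

    reverse=False gives the first mapping (L[0] at bits 0 upward);
    reverse=True gives the second mapping (L[0] at the top of the register).
    Both names are illustrative assumptions, not an existing API.
    """
    if lane_width not in (8, 16, 32, 64):
        raise ValueError("unsupported lane width")
    lanes = 64 // lane_width
    if not 0 <= index < lanes:
        raise IndexError("lane index out of range")
    # Pick which physical slot of the register this lane label occupies.
    slot = (lanes - 1 - index) if reverse else index
    low = slot * lane_width
    return low, low + lane_width - 1


# Print both mappings side by side for every lane width.
for width in (8, 16, 32, 64):
    for i in range(64 // width):
        print(width, i, lane_bits(width, i), lane_bits(width, i, reverse=True))
```

Either mapping is just a relabeling of the same physical slots, which is why
the 64-bit case comes out identical in both.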

Note that these are just bit allocations. No endianness is involved so far.
The choice only affects the labeling of the "lane write enable" wires for
writing, and the "lane valid" wires for reading.

What Alex is proposing, I think, is dynamically switching between the mappings,
so that when using a 64-bit scalar load instruction, L[0]=V[0], L[1]=V[1], and
so on, irrespective of memory endianness. Correct?

I think this relabeling can be done with a single crossbar on each read port of
just the 8-bit predicate mask register, no need to shuffle the actual 64-bit
register contents. The partitioned ALUs do not care about which lane number is
assigned to each partition number, as long as the predicate mask is correct.
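
As a rough software model of that idea (not actual hardware, and not taken from
any existing Libre-SOC code), the sketch below permutes only the predicate mask
bits and then expands them to per-byte enables. The names remap_predicate and
byte_enables are hypothetical. It assumes one predicate bit per lane, with bit
i of the mask belonging to lane L[i], and a 64-bit register.

```python
def remap_predicate(mask, lane_width, reverse):
    """Permute predicate bits so that output bit p refers to physical partition p.

    With reverse=False, lane i sits in partition i (first mapping).
    With reverse=True, lane i sits in partition (lanes - 1 - i) (second mapping).
    This models the small crossbar on the mask; register data is untouched.
    """
    lanes = 64 // lane_width
    out = 0
    for lane in range(lanes):
        if (mask >> lane) & 1:
            part = (lanes - 1 - lane) if reverse else lane
            out |= 1 << part
    return out


def byte_enables(part_mask, lane_width):
    """Expand a per-partition mask into an 8-bit per-byte enable mask."""
    bytes_per_lane = lane_width // 8
    lanes = 64 // lane_width
    enables = 0
    for part in range(lanes):
        if (part_mask >> part) & 1:
            enables |= ((1 << bytes_per_lane) - 1) << (part * bytes_per_lane)
    return enables


# Example: only lane L[0] of a 4 x 16-bit vector is active.
m = remap_predicate(0b0001, 16, reverse=True)
print(bin(m), bin(byte_enables(m, 16)))
```

The point of the sketch is that switching mappings only changes which mask bit
drives which partition; the 64-bit register contents never move.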

Also, this covers only 64-bit loads and stores. I wonder how 32-bit, 16-bit and
8-bit (or even 24-bit, 40-bit, 48-bit, 56-bit) scalar loads/stores would work.
