[Libre-soc-bugs] [Bug 230] Video opcode development and discussion

Sun Dec 20 12:49:15 GMT 2020

https://bugs.libre-soc.org/show_bug.cgi?id=230

--- Comment #61 from Luke Kenneth Casson Leighton <lkcl at lkcl.net> ---
(In reply to cand from comment #60)
> The most useful interpretation would be for RA to be the giant vector, and
> RB to be a single vec4 (or whatever size is used).

this might be the case already (without needing to add any extra instructions).
here's what the regfiles, conceptually, look like:

    typedef union {
        uint8_t  b[8];
        uint16_t s[4];
        uint32_t i[2];
        uint64_t l[1];
    } reg_t;

    // integer table: assume maximum SV 7-bit regfile size
    reg_t int_regfile[128];
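
the union above can also be modelled as one flat byte array, which makes the "same storage, different element widths" idea easy to poke at. this is only a rough sketch: get_elem and set_elem are hypothetical helper names, little-endian byte order is an assumption, and it assumes that element indices past the end of one register simply continue into the next register, as the VL loops in this comment imply.

```python
import struct

NUM_REGS = 128   # maximum SV 7-bit regfile size, as in the C sketch
REG_BYTES = 8    # each register is 64 bits wide

# element width in bits -> struct format (little-endian is an assumption)
FMT = {8: "<B", 16: "<H", 32: "<I", 64: "<Q"}

regfile = bytearray(NUM_REGS * REG_BYTES)

def get_elem(reg, elwidth, idx):
    """Read element idx of width elwidth, counting from the start of reg."""
    offset = reg * REG_BYTES + idx * (elwidth // 8)
    return struct.unpack_from(FMT[elwidth], regfile, offset)[0]

def set_elem(reg, elwidth, idx, value):
    """Write element idx of width elwidth, counting from the start of reg."""
    offset = reg * REG_BYTES + idx * (elwidth // 8)
    mask = (1 << elwidth) - 1
    struct.pack_into(FMT[elwidth], regfile, offset, value & mask)

# one 64-bit write to r3 is visible through every narrower view
set_elem(3, 64, 0, 0x0807060504030201)
bytes_view = [get_elem(3, 8, i) for i in range(8)]    # the "b" array
shorts_view = [get_elem(3, 16, i) for i in range(4)]  # the "s" array
words_view = [get_elem(3, 32, i) for i in range(2)]   # the "i" array
```

in this model, 32-bit element 2 counted from r3 lands in the low word of r4, which is the behaviour the VL loops rely on.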

let us say that RA is set to elwidth=32 and RB is set to elwidth=8.
then this will be the VL loop:

    for i in range(VL):
         int_regfile[RA].i[i] = some_op(int_regfile[RB].b[i])

where some_op may involve clamping/saturation.  you see here that
elwidth=8 refers to the "b" (byte) array of the union, and that elwidth=32
refers to the "i" (word) array of the union?

the result would be:

RA+0 32 bit:   RB+0 byte     0x00     0x00      0x00
RA+1 32 bit:   RB+1 byte     0x00     0x00      0x00
RA+2 32 bit:   RB+2 byte     0x00     0x00      0x00
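
to make the layout above concrete, here is a small runnable model of that VL loop. it is a sketch under assumptions, not the real semantics: the regfile is a flat little-endian byte array, the register numbers are arbitrary, and some_op is a hypothetical stand-in (add 0x10 with 8-bit saturation) chosen only so that the clamping is visible.

```python
import struct

REG_BYTES = 8
regfile = bytearray(128 * REG_BYTES)

def some_op(x):
    """Hypothetical stand-in: add 0x10, saturating at the 8-bit maximum."""
    return min(x + 0x10, 0xFF)

def widen_bytes_to_words(RA, RB, VL):
    """VL loop with dest elwidth=32 (RA) and src elwidth=8 (RB)."""
    for i in range(VL):
        src = regfile[RB * REG_BYTES + i]          # int_regfile[RB].b[i]
        dest_offset = RA * REG_BYTES + i * 4       # int_regfile[RA].i[i]
        struct.pack_into("<I", regfile, dest_offset, some_op(src))

RA, RB, VL = 8, 16, 3                              # arbitrary register numbers
regfile[RB * REG_BYTES : RB * REG_BYTES + 3] = bytes([0x01, 0x7F, 0xF8])
widen_bytes_to_words(RA, RB, VL)

# three 32-bit elements: the processed byte first, then three zero bytes
result = bytes(regfile[RA * REG_BYTES : RA * REG_BYTES + 4 * VL])
```

each source byte ends up in the lowest byte of its own 32-bit destination element, with the upper three bytes zero.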

where vec4 is involved it's a little more like this (RA elwidth=32,
RB elwidth=8):

    for i in range(VL):
         if predicate_bit_not_set(i): continue
         # take the *address* of the i-th 32-bit destination element
         uint8_t *start_point = (uint8_t*)&(int_regfile[RA].i[i])
         for j in range(SUBVL): # vec4
              start_point[j] = some_op(int_regfile[RB].b[i*SUBVL + j])

this would produce:

RA+0 32 bit:   RB+0 byte     RB+1 byte     RB+2 byte   RB+3 byte
RA+1 32 bit:   RB+4 byte     RB+5 byte     RB+6 byte   RB+7 byte

where each RB+N byte would have been run through "some_op()".

if that were a byte-array-vec3 into a word-array, you end up with the 4th byte
zeroed out in the 32-bit RA destination:

RA+0 32 bit:   RB+0 byte     RB+1 byte     RB+2 byte   0x00
RA+1 32 bit:   RB+3 byte     RB+4 byte     RB+5 byte   0x00
RA+2 32 bit:   RB+6 byte     RB+7 byte     RB+8 byte   0x00
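
the vec4 and vec3 layouts can be checked with one runnable model of the predicated SUBVL loop. again this is a sketch under assumptions: flat little-endian regfile, arbitrary register numbers, some_op as a hypothetical identity operation, and the whole 32-bit destination slot cleared before the sub-elements are written (one possible way to get the zeroed 4th byte, not necessarily how hardware would do it). pack_subvectors and demo are made-up names.

```python
REG_BYTES = 8
regfile = bytearray(128 * REG_BYTES)

def some_op(x):
    """Hypothetical stand-in for the real operation: identity on a byte."""
    return x & 0xFF

def pack_subvectors(RA, RB, VL, SUBVL, predicate):
    """Dest elwidth=32 (RA), src elwidth=8 (RB), SUBVL bytes per element."""
    for i in range(VL):
        if not (predicate >> i) & 1:
            continue                          # masked out: RA slot untouched
        start = RA * REG_BYTES + i * 4        # address of int_regfile[RA].i[i]
        regfile[start:start + 4] = bytes(4)   # assumed: clear the slot first
        for j in range(SUBVL):
            src = regfile[RB * REG_BYTES + i * SUBVL + j]
            regfile[start + j] = some_op(src)

def demo(SUBVL, VL, predicate):
    """Fill RB with 0xA0, 0xA1, ... and RA with 0xEE, then run the loop."""
    RA, RB = 8, 16                            # arbitrary register numbers
    regfile[:] = bytes(len(regfile))
    for n in range(VL * SUBVL):
        regfile[RB * REG_BYTES + n] = 0xA0 + n
    for n in range(VL * 4):
        regfile[RA * REG_BYTES + n] = 0xEE    # marker for "never written"
    pack_subvectors(RA, RB, VL, SUBVL, predicate)
    base = RA * REG_BYTES
    return [list(regfile[base + i * 4 : base + i * 4 + 4]) for i in range(VL)]

vec4 = demo(SUBVL=4, VL=2, predicate=0b11)
vec3 = demo(SUBVL=3, VL=3, predicate=0b111)
masked = demo(SUBVL=4, VL=2, predicate=0b10)  # element 0 is skipped
```

with the 0xEE marker in place, a skipped element is easy to tell apart from a vec3 element whose 4th byte was zeroed.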

is that clear at all?  if not i can write it out or find... i'm sure
i've done this before... ah! here:

   https://libre-soc.org/simple_v_extension/appendix/#load_example

although that's a "LOAD" it's the same principle (because we are
referring to MVs, which are also twin-predicated and have both
a src elwidth override and a dest elwidth override).
