# [Libre-soc-bugs] [Bug 230] Video opcode development and discussion

bugzilla-daemon at libre-soc.org bugzilla-daemon at libre-soc.org
Sun Dec 20 12:49:15 GMT 2020

```https://bugs.libre-soc.org/show_bug.cgi?id=230

--- Comment #61 from Luke Kenneth Casson Leighton <lkcl at lkcl.net> ---
(In reply to cand from comment #60)
> The most useful interpretation would be for RA to be the giant vector, and
> RB to be a single vec4 (or whatever size is used).

this might be the case already (without needing to add any extra instructions).
here's what the regfiles, conceptually, look like:

typedef union {
    uint8_t  b[8];
    uint16_t s[4];
    uint32_t i[2];
    uint64_t l[1];
} reg_t;

// integer table: assume maximum SV 7-bit regfile size
reg_t int_regfile[128];
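
# a minimal python sketch of the same union idea, to make the views concrete.
# assumptions, not from the original comment: the regfile is modelled as one
# flat little-endian bytearray (so element indices past the end of a register
# spill into the next one), and set_elem/get_elem are hypothetical helpers.

REG_BYTES = 8      # one 64-bit register
NUM_REGS = 128     # maximum SV 7-bit regfile size

int_regfile = bytearray(NUM_REGS * REG_BYTES)

def set_elem(reg, elwidth, idx, value):
    """write element idx (elwidth bits wide), counting from register reg"""
    nbytes = elwidth // 8
    off = reg * REG_BYTES + idx * nbytes
    mask = (1 << elwidth) - 1
    int_regfile[off:off + nbytes] = (value & mask).to_bytes(nbytes, "little")

def get_elem(reg, elwidth, idx):
    """read element idx (elwidth bits wide), counting from register reg"""
    nbytes = elwidth // 8
    off = reg * REG_BYTES + idx * nbytes
    return int.from_bytes(int_regfile[off:off + nbytes], "little")

# the same 8 bytes seen through the "l", "i", "s" and "b" views
set_elem(3, 64, 0, 0x1122334455667788)
assert get_elem(3, 32, 0) == 0x55667788   # like .i[0]
assert get_elem(3, 32, 1) == 0x11223344   # like .i[1]
assert get_elem(3, 16, 0) == 0x7788       # like .s[0]
assert get_elem(3, 8, 7) == 0x11          # like .b[7]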

let us say that RA is set to elwidth=32 and RB is set to elwidth=8.
then this will be the VL loop:

for i in range(VL):
    int_regfile[RA].i[i] = some_op(int_regfile[RB].b[i])

where some_op may involve clamping/saturation.  you see here that
elwidth=8 refers to the "b" (byte) array of the union, and that elwidth=32
refers to the "i" (word) array of the union?

the result would be:

RA+0 32 bit:   RB+0 byte     0x00     0x00      0x00
RA+1 32 bit:   RB+1 byte     0x00     0x00      0x00
RA+2 32 bit:   RB+2 byte     0x00     0x00      0x00
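
# the elwidth loop above as a runnable python sketch.
# assumptions, not from the original comment: a flat little-endian bytearray
# stands in for the regfile, some_op is a placeholder that passes the value
# through unchanged, and vl_loop is a hypothetical name.

REG_BYTES = 8
regfile = bytearray(128 * REG_BYTES)

def some_op(x):
    return x          # placeholder: a real op might clamp or saturate

def vl_loop(RA, RB, VL):
    for i in range(VL):
        src = regfile[RB * REG_BYTES + i]      # .b[i], elwidth=8
        dst = RA * REG_BYTES + i * 4           # .i[i], elwidth=32
        regfile[dst:dst + 4] = some_op(src).to_bytes(4, "little")

RA, RB, VL = 16, 32, 3
regfile[RB * REG_BYTES:RB * REG_BYTES + VL] = bytes([0x11, 0x22, 0x33])
vl_loop(RA, RB, VL)
for i in range(VL):
    word = regfile[RA * REG_BYTES + i * 4:RA * REG_BYTES + i * 4 + 4]
    print("RA+%d 32 bit:" % i, " ".join("0x%02x" % b for b in word))
# prints:
# RA+0 32 bit: 0x11 0x00 0x00 0x00
# RA+1 32 bit: 0x22 0x00 0x00 0x00
# RA+2 32 bit: 0x33 0x00 0x00 0x00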

where vec4 is involved it's a little more like this (RA elwidth=32,
RB elwidth=8):

for i in range(VL):
    if predicate_bit_not_set(i): continue
    # take the address of the i-th 32-bit element, then fill it byte by byte
    uint8_t *start_point = (uint8_t*)(&int_regfile[RA].i[i])
    for j in range(SUBVL): # vec4
        start_point[j] = some_op(int_regfile[RB].b[i*SUBVL + j])

this would produce:

RA+0 32 bit:   RB+0 byte     RB+1 byte     RB+2 byte   RB+3 byte
RA+1 32 bit:   RB+4 byte     RB+5 byte     RB+6 byte   RB+7 byte

where each RB+N byte would have been run through "some_op()".

if that were a byte-array-vec3 into a word-array, you end up with the 4th byte
zeroed out in each 32-bit RA destination element:

RA+0 32 bit:   RB+0 byte     RB+1 byte     RB+2 byte   0x00
RA+1 32 bit:   RB+3 byte     RB+4 byte     RB+5 byte   0x00
RA+2 32 bit:   RB+6 byte     RB+7 byte     RB+8 byte   0x00
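
# both packed layouts above (vec4 and vec3) as a runnable python sketch.
# assumptions, not from the original comment: a flat little-endian bytearray
# stands in for the regfile, some_op passes the value through, the predicate
# is a plain bitmask, and the unused bytes of each 32-bit destination are
# zeroed explicitly (the pseudocode loop itself never writes them).
# subvl_loop is a hypothetical name.

REG_BYTES = 8
regfile = bytearray(128 * REG_BYTES)

def some_op(x):
    return x          # placeholder: a real op might clamp or saturate

def subvl_loop(RA, RB, VL, SUBVL, predicate=None):
    for i in range(VL):
        if predicate is not None and not (predicate >> i) & 1:
            continue                              # masked-out element
        start = RA * REG_BYTES + i * 4            # byte offset of .i[i]
        regfile[start:start + 4] = bytes(4)       # assumed zeroing
        for j in range(SUBVL):
            src = regfile[RB * REG_BYTES + i * SUBVL + j]
            regfile[start + j] = some_op(src) & 0xFF

RA, RB = 16, 32
regfile[RB * REG_BYTES:RB * REG_BYTES + 9] = bytes(range(1, 10))

subvl_loop(RA, RB, VL=3, SUBVL=3)                 # vec3: 4th byte stays 0x00
vec3 = bytes(regfile[RA * REG_BYTES:RA * REG_BYTES + 12])

subvl_loop(RA, RB, VL=2, SUBVL=4)                 # vec4: all 4 bytes filled
vec4 = bytes(regfile[RA * REG_BYTES:RA * REG_BYTES + 8])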

is that clear at all?  if not i can write it out or find... i'm sure
i've done this before... ah! here: