[Libre-soc-bugs] [Bug 230] Video opcode development and discussion

Mon Dec 21 07:49:05 GMT 2020

https://bugs.libre-soc.org/show_bug.cgi?id=230

--- Comment #65 from cand at gmx.com ---
(In reply to Luke Kenneth Casson Leighton from comment #64)
> but... you're looking for a *different* elwidth on RA from RB, is that right?
> if so, that's.... yeah, very tough to fit into the SV paradigm, because
> elwidths apply in "arithmetic" cases to the whole operation, and in "MV"
> cases you have a src elwidth and a dest elwidth.

Nono, different VL, or rather mod-VL for RB. But that was only "what would be
the most useful interpretation for when RA != RB"; if it's not worth it, then we
can outlaw them being different in reduce mode.

RA=RB
Just gather-add or gather-mul the elements together. Not twice.

RA!=RB
RA is gathered, RB is added/multiplied on top as a single vec4, not an array of
vec4s like RA. If that's too much trouble, then disallow RA!=RB.

RT+0 = (RA+0 + RA+4 + RA+8 + ... + RA+(VL-4)) + RB+0
RT+1 = (RA+1 + RA+5 + RA+9 + ... + RA+(VL-3)) + RB+1
RT+2 = (RA+2 + RA+6 + RA+10 + ... + RA+(VL-2)) + RB+2
RT+3 = (RA+3 + RA+7 + RA+11 + ... + RA+(VL-1)) + RB+3
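
A minimal Python sketch of the lane-wise reduction described above, for
illustration only. It assumes VL counts scalar elements and is a multiple of
the vector width (4 here), that registers can be modelled as a flat Python
list, and that RB is simply skipped when RA == RB. The function name
gather_add and its parameters are hypothetical, not part of any SV
specification.

```python
def gather_add(regs, rt, ra, rb, vl, width=4):
    """Sum every width-th element starting at RA, lane by lane.

    When rb != ra, the single vec4 at RB is added on top, once.
    Assumption: vl counts scalar elements and is a multiple of width.
    """
    assert vl % width == 0, "sketch assumes VL is a multiple of the vec width"
    results = []
    for lane in range(width):
        # Gather elements lane, lane+width, lane+2*width, ... below vl.
        acc = sum(regs[ra + i] for i in range(lane, vl, width))
        if rb != ra:
            # RB is one vec4, not an array of vec4s like RA.
            acc += regs[rb + lane]
        results.append(acc)
    # Write back only after all lanes are computed, so that an RT
    # overlapping RA or RB cannot corrupt the inputs mid-way.
    for lane, value in enumerate(results):
        regs[rt + lane] = value


regs = [0] * 32
regs[0:8] = [1, 2, 3, 4, 5, 6, 7, 8]   # two vec4s at RA=0
regs[8:12] = [10, 20, 30, 40]          # one vec4 at RB=8
gather_add(regs, rt=16, ra=0, rb=8, vl=8)   # RA != RB case
gather_add(regs, rt=20, ra=0, rb=0, vl=8)   # RA == RB case
```

With these inputs, regs[16:20] holds the RA lanes summed plus RB, and
regs[20:24] holds the RA lanes summed only.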

> would this work?
> 
>      for i in range(VL):
>           iregs[RT+i] = 0
>           for j in range(SUBVL):
>               iregs[RT+i] += iregs[RA+i*SUBVL+j]

Yes, looks like it'd result in near the same op as horz-add.
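
For comparison, a runnable version of the quoted loop, as a sketch only. It
assumes VL counts subvectors (so VL*SUBVL scalar elements are read) and models
the integer register file as a plain list; the function name horizontal_add is
hypothetical. As written, the loop sums the SUBVL elements inside each
subvector and produces one scalar per subvector, which is a different grouping
from the lane-wise RT+0..RT+3 sums listed earlier.

```python
def horizontal_add(iregs, rt, ra, vl, subvl):
    """Sum the subvl elements of each of the vl subvectors starting at RA.

    Assumption: vl counts subvectors, so vl * subvl elements are read.
    """
    for i in range(vl):
        total = 0
        for j in range(subvl):
            total += iregs[ra + i * subvl + j]
        # One scalar result per subvector.
        iregs[rt + i] = total


iregs = [1, 2, 3, 4, 5, 6, 7, 8] + [0] * 8
horizontal_add(iregs, rt=8, ra=0, vl=2, subvl=4)
```

Here iregs[8] receives 1+2+3+4 and iregs[9] receives 5+6+7+8.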
