# [Libre-soc-bugs] [Bug 230] Video opcode development and discussion

bugzilla-daemon at libre-soc.org bugzilla-daemon at libre-soc.org
Mon Dec 21 13:23:54 GMT 2020

```
https://bugs.libre-soc.org/show_bug.cgi?id=230

--- Comment #66 from Luke Kenneth Casson Leighton <lkcl at lkcl.net> ---
(In reply to cand from comment #65)

>
> Nono, different VL, or rather mod-VL for RB. But that was only "what would
> be the most useful interpretation for when RA != RB". If it's not worth it,
> then we can outlaw them being different in reduce mode.

VL definitely cannot be different for RA, RB or RT: it is a global for-loop.
the closest we can get away with (and for mv this needed new instructions) is
allowing one src to be vec2/3/4 and the other to be vec1.

> RA=RB
> Just gather-add or gather-mul the elements together. Not twice.

yehyeh.  res = RA[0] + RA[1] + ... + RA[VL-1]

> RA!=RB
> RA is gathered, RB is added/muled on top as a single vec4, not an array of
> vec4s like RA. If too much trouble, then disallow RA!=RB.
>
> RT+0 = (RA+0 + RA+4 + RA+8 ... RA+(VL-1)) + RB+0
> RT+1 = (RA+1 + RA+5 + RA+9 ... RA+(VL-2)) + RB+1
> RT+2 = (RA+2 + RA+6 + RA+10 ... RA+(VL-3)) + RB+2
> RT+3 = (RA+3 + RA+7 + RA+11 ... RA+(VL-4)) + RB+3

mmm it would be easier just to split this into 2 separate adds: the first
reduces RA, the followup adds RB.  although, because the result of the first is
now a scalar vec4, adding RB.vec4 to RT.vec4 would need an extra instruction to
change VL.  however this is probably needed anyway.

> > would this work?
> >
> >     for i in range(VL):
> >         iregs[RT+i] = 0
> >         for j in range(SUBVL):
> >             iregs[RT+i] += iregs[RA+i*SUBVL+j]
>
> Yes, looks like it'd result in near the same op as horz-add.

ok great.  it's a general purpose way to express that SIMD-horizontal-add
discussed earlier.  horiz-vec3 of course will be a pain but horiz-vec2 and vec4
should fit cleanly.

```
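The two reductions discussed in the comment can be sketched in plain Python. This is only an illustrative model, not the Libre-SOC simulator or any agreed instruction semantics: `iregs` is a hypothetical flat list standing in for the integer register file, `horizontal_add` mirrors the quoted `for i in range(VL)` loop, and `lane_reduce_then_add` is one possible reading of the RA != RB case split into two separate adds. It assumes VL counts sub-vectors (as in the quoted loop) and that the destination registers do not overlap the sources.

```python
def horizontal_add(iregs, RT, RA, VL, SUBVL):
    """Sum each SUBVL-wide sub-vector starting at RA into one element at RT."""
    for i in range(VL):
        iregs[RT + i] = 0
        for j in range(SUBVL):
            iregs[RT + i] += iregs[RA + i * SUBVL + j]


def lane_reduce_then_add(iregs, RT, RA, RB, VL, SUBVL=4):
    """Assumed two-step form of the RA != RB case.

    Step 1 reduces RA lane by lane across all VL sub-vectors.
    Step 2 adds the single sub-vector at RB on top.
    """
    # Step 1: gather-add lane j over every sub-vector of RA.
    acc = [0] * SUBVL
    for i in range(VL):
        for j in range(SUBVL):
            acc[j] += iregs[RA + i * SUBVL + j]
    # Step 2: the follow-up add involving RB (one sub-vector, not an array).
    for j in range(SUBVL):
        iregs[RT + j] = acc[j] + iregs[RB + j]


if __name__ == "__main__":
    # Hypothetical register file in which register n holds the value n.
    regs = list(range(32))
    horizontal_add(regs, RT=16, RA=0, VL=2, SUBVL=4)
    print(regs[16:18])  # [6, 22]

    regs = list(range(32))
    lane_reduce_then_add(regs, RT=16, RA=0, RB=8, VL=2)
    print(regs[16:20])  # [12, 15, 18, 21]
```

In this model the second step operates on a single sub-vector while the first loops over VL of them, which is the point made above about needing to change VL between the two adds.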