[Libre-soc-bugs] [Bug 230] Video opcode development and discussion
bugzilla-daemon at libre-soc.org
Mon Dec 21 13:23:54 GMT 2020
https://bugs.libre-soc.org/show_bug.cgi?id=230
--- Comment #66 from Luke Kenneth Casson Leighton <lkcl at lkcl.net> ---
(In reply to cand from comment #65)
>
> No no, different VL, or rather mod-VL for RB. But that was only "what would
> be the most useful interpretation for when RA != RB"; if it's not worth it,
> then we can outlaw them being different in reduce mode.
VL definitely cannot be different for RA, RB or RT: it is a global for-loop.
The closest we can get away with (and for mv this needed new instructions) is
allowing one src to be vec2/3/4 and the other to be vec1.
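A minimal Python sketch of the "one global for-loop" point, under an assumed interpretation of the vec1 case: RA and RT hold VL elements of SUBVL lanes each, while RB holds VL scalar (vec1) elements, so element i of RB is applied to every lane of element i of RA. The function name `sv_add_mixed_subvl` and the flat `regs` list standing in for the register file are hypothetical, for illustration only.

```python
def sv_add_mixed_subvl(regs, RT, RA, RB, VL, SUBVL):
    """Hypothetical sketch: RT/RA are vec-SUBVL operands, RB is vec1.

    There is only one element loop (VL); it indexes all three operands.
    """
    for i in range(VL):                 # the single, global for-loop
        for j in range(SUBVL):          # lanes within one sub-vector
            regs[RT + i * SUBVL + j] = regs[RA + i * SUBVL + j] + regs[RB + i]


regs = [0] * 32
regs[0:8] = [1, 2, 3, 4, 5, 6, 7, 8]    # RA: two vec4 elements
regs[8:10] = [10, 20]                   # RB: two vec1 elements
sv_add_mixed_subvl(regs, RT=16, RA=0, RB=8, VL=2, SUBVL=4)
print(regs[16:24])  # [11, 12, 13, 14, 25, 26, 27, 28]
```

Note that RB is indexed by `i` alone: there is no separate VL for it, which is why a genuinely different VL per operand cannot be expressed.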
> RA=RB
> Just gather-add or gather-mul the elements together. Not twice.
yeh yeh. res = RA[0] + RA[1] + ... + RA[VL-1]
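The RA=RB case amounts to a plain fold over VL elements, each element used once. A minimal Python sketch, assuming a flat `regs` list standing in for the register file; `sv_reduce` is a hypothetical name, and the left-to-right fold order is an assumption (hardware might use a tree instead, which matters for floating point but not for integers).

```python
import operator


def sv_reduce(regs, RA, VL, op=operator.add):
    """Sketch of reduce mode with RA == RB: fold VL elements exactly once.

    Assumes VL >= 1. op is operator.add for gather-add,
    operator.mul for gather-mul.
    """
    acc = regs[RA]
    for i in range(1, VL):
        acc = op(acc, regs[RA + i])
    return acc


regs = [3, 1, 4, 1, 5]
print(sv_reduce(regs, RA=0, VL=5))                   # 14
print(sv_reduce(regs, RA=0, VL=5, op=operator.mul))  # 60
```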
> RA!=RB
> RA is gathered; RB is added/multiplied on top as a single vec4, not an array
> of vec4s like RA. If that is too much trouble, then disallow RA!=RB.
>
> RT+0 = (RA+0 + RA+4 + RA+8 + ... + RA+4*(VL-1)+0) + RB+0
> RT+1 = (RA+1 + RA+5 + RA+9 + ... + RA+4*(VL-1)+1) + RB+1
> RT+2 = (RA+2 + RA+6 + RA+10 + ... + RA+4*(VL-1)+2) + RB+2
> RT+3 = (RA+3 + RA+7 + RA+11 + ... + RA+4*(VL-1)+3) + RB+3
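As a sketch of the proposed fused semantics, assuming VL counts vec4 elements (so lane j sums RA+j, RA+4+j, ..., RA+4*(VL-1)+j before adding RB+j), the operation could be modelled in Python as below. `reduce_vec4_plus_rb` and the flat `regs` list are illustrative assumptions, not an agreed opcode.

```python
def reduce_vec4_plus_rb(regs, RT, RA, RB, VL, SUBVL=4):
    """Hypothetical sketch of the proposed RA != RB reduce.

    RA holds VL vec-SUBVL elements; RB is a single vec-SUBVL.
    Lane j of RT = RA[j] + RA[SUBVL+j] + ... + RA[(VL-1)*SUBVL+j] + RB[j].
    """
    for j in range(SUBVL):
        acc = 0
        for i in range(VL):
            acc += regs[RA + i * SUBVL + j]
        regs[RT + j] = acc + regs[RB + j]


regs = [0] * 32
regs[0:8] = [1, 2, 3, 4, 5, 6, 7, 8]      # RA: two vec4 elements
regs[8:12] = [100, 200, 300, 400]         # RB: one vec4
reduce_vec4_plus_rb(regs, RT=16, RA=0, RB=8, VL=2)
print(regs[16:20])  # [106, 208, 310, 412]
```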
Mmm, it would be easier just to split this into 2 separate adds: the reduction
first, then a follow-up that involves RB. Although, because the result is now
a scalar vec4, adding RB.vec4 to RT.vec4 would need an extra instruction to
change VL. However, this is probably needed anyway.
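The split approach can be sketched in two steps: a lane-wise reduction with VL set to the number of vec4 elements, then an ordinary vec4 add after changing VL to 1. Both function names below are hypothetical, and the flat `regs` list is an assumed stand-in for the register file; for this example the result matches RT+j = (sum of lane j of RA) + RB+j.

```python
def reduce_lanes(regs, RT, RA, VL, SUBVL):
    """Step 1 (hypothetical): lane-wise sum of VL vec-SUBVL elements into RT."""
    for j in range(SUBVL):
        regs[RT + j] = sum(regs[RA + i * SUBVL + j] for i in range(VL))


def vec_add(regs, RT, RA, RB, VL, SUBVL):
    """Step 2 (hypothetical): ordinary element-wise add of VL vec-SUBVL elements."""
    for k in range(VL * SUBVL):
        regs[RT + k] = regs[RA + k] + regs[RB + k]


regs = [0] * 32
regs[0:8] = [1, 2, 3, 4, 5, 6, 7, 8]             # RA: two vec4 elements
regs[8:12] = [100, 200, 300, 400]                # RB: one vec4
reduce_lanes(regs, RT=16, RA=0, VL=2, SUBVL=4)   # VL = 2 vec4 elements
vec_add(regs, RT=16, RA=16, RB=8, VL=1, SUBVL=4) # VL changed to 1
print(regs[16:20])  # [106, 208, 310, 412]
```

The cost of the split is the VL change between the two steps, which is the extra instruction mentioned above.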
> > would this work?
> >
> > for i in range(VL):
> >     iregs[RT+i] = 0
> >     for j in range(SUBVL):
> >         iregs[RT+i] += iregs[RA+i*SUBVL+j]
>
> Yes, looks like it'd result in nearly the same op as horz-add.
OK, great. It's a general-purpose way to express the SIMD horizontal-add
discussed earlier. horiz-vec3 will of course be a pain, but horiz-vec2 and
horiz-vec4 should fit cleanly.
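A runnable Python version of the quoted `for i in range(VL)` horizontal-add pseudocode. `horiz_add` and the flat `iregs` list are hypothetical names. One deliberate difference from the quoted loop: the sum is built in a local accumulator and written once, so that RT == RA (an in-place reduction) does not zero a source element before it is read. The same function runs unchanged for SUBVL = 2, 3 and 4; this says nothing about how easily each maps onto hardware.

```python
def horiz_add(iregs, RT, RA, VL, SUBVL):
    """Sum the SUBVL lanes of each of VL elements of RA into scalar RT+i."""
    for i in range(VL):
        acc = 0                          # accumulate first, so RT == RA is safe
        for j in range(SUBVL):
            acc += iregs[RA + i * SUBVL + j]
        iregs[RT + i] = acc              # single write per element


iregs = [1, 2, 3, 4, 5, 6, 7, 8] + [0] * 8
horiz_add(iregs, RT=8, RA=0, VL=2, SUBVL=4)
print(iregs[8:10])   # [10, 26]
```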