[Libre-soc-bugs] [Bug 230] Video opcode development and discussion
bugzilla-daemon at libre-soc.org
Mon Dec 21 07:49:05 GMT 2020
https://bugs.libre-soc.org/show_bug.cgi?id=230
--- Comment #65 from cand at gmx.com ---
(In reply to Luke Kenneth Casson Leighton from comment #64)
> but... you're looking for a *different* elwidth on RA from RB, is that right?
> if so, that's.... yeah, very tough to fit into the SV paradigm, because
> elwidths apply in "arithmetic" cases to the whole operation, and in "MV"
> cases you have a src elwidth and a dest elwidth.
No, no: a different VL, or rather a mod-VL, for RB. But that was only "what would be
the most useful interpretation when RA != RB". If it's not worth it, then we
can outlaw them being different in reduce mode.
RA=RB
Just gather-add or gather-mul the elements together, once (RB is not applied a second time).
RA!=RB
RA is gathered; RB is added/multiplied on top as a single vec4, not as an array
of vec4s like RA. If that is too much trouble, then disallow RA!=RB.
RT+0 = (RA+0 + RA+4 + RA+8  + ... + RA+(VL-4)) + RB+0
RT+1 = (RA+1 + RA+5 + RA+9  + ... + RA+(VL-3)) + RB+1
RT+2 = (RA+2 + RA+6 + RA+10 + ... + RA+(VL-2)) + RB+2
RT+3 = (RA+3 + RA+7 + RA+11 + ... + RA+(VL-1)) + RB+3
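The RA=RB / RA!=RB semantics above can be modelled as a small Python sketch. This is not a Libre-SOC spec. The register file is assumed to be a flat list, VL is assumed to count scalar elements and to be a multiple of 4, and the name `gather_reduce` is hypothetical. Lane j combines RA+j, RA+j+4, ... with stride 4. When RB differs from RA, RB+j is combined on top once.

```python
import operator
from functools import reduce

SUBVL = 4  # assumed: vec4 sub-vectors


def gather_reduce(iregs, RT, RA, VL, RB=None, op=operator.add):
    """Hypothetical model of the proposed vec4 gather-reduce.

    iregs is a flat list standing in for the integer register file.
    VL is assumed to count scalar elements and be a multiple of SUBVL.
    Lane j combines RA+j, RA+j+SUBVL, ..., RA+VL-SUBVL+j.
    If RB is given and differs from RA, RB+j is combined on top once
    (RB is a single vec4, not an array of vec4s like RA).
    """
    if VL <= 0 or VL % SUBVL != 0:
        raise ValueError("VL must be a positive multiple of SUBVL here")
    results = []
    for j in range(SUBVL):
        lane = [iregs[RA + k] for k in range(j, VL, SUBVL)]
        acc = reduce(op, lane)
        if RB is not None and RB != RA:  # RA == RB: gather only, not twice
            acc = op(acc, iregs[RB + j])
        results.append(acc)
    # All reads finish before any write, so RT may overlap the sources.
    for j, value in enumerate(results):
        iregs[RT + j] = value


# RA vector at 0..7 (two vec4s), RB vec4 at 8..11, RT at 12..15
regs = list(range(8)) + [100, 200, 300, 400] + [0] * 4
gather_reduce(regs, RT=12, RA=0, VL=8, RB=8)
print(regs[12:16])  # [104, 206, 308, 410]
```

Passing `op=operator.mul` gives the gather-mul variant.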
> would this work?
>
> for i in range(VL):
>     iregs[RT+i] = 0
>     for j in range(SUBVL):
>         iregs[RT+i] += iregs[RA+i*SUBVL+j]
Yes, it looks like that would result in nearly the same op as horz-add.
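The quoted loop is pseudocode. A runnable Python sketch of it follows, again assuming the register file is a flat list; the name `horiz_reduce` is hypothetical. The loop adds the SUBVL components inside each sub-vector and produces VL scalars, which is why it resembles horz-add. This version accumulates in a local variable before writing. Zeroing `iregs[RT+i]` first would destroy RA+0 when RT == RA.

```python
def horiz_reduce(iregs, RT, RA, VL, SUBVL=4):
    """Runnable form of the quoted loop: for each of the VL sub-vectors
    starting at RA, add its SUBVL components into the scalar RT+i."""
    for i in range(VL):
        acc = 0
        for j in range(SUBVL):
            acc += iregs[RA + i * SUBVL + j]
        # Accumulate locally, then write: unlike zeroing iregs[RT+i]
        # first, this stays correct when RT == RA.
        iregs[RT + i] = acc


regs = [1, 2, 3, 4, 5, 6, 7, 8]
horiz_reduce(regs, RT=0, RA=0, VL=2)
print(regs[:2])  # [10, 26]
```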