[Libre-soc-dev] twin predication and svp64

Fri Dec 11 20:45:58 GMT 2020

On Fri, Dec 11, 2020, 12:19 Luke Kenneth Casson Leighton <lkcl at lkcl.net>
wrote:

> On 12/11/20, Jacob Lifshay <programmerjake at gmail.com> wrote:
> should we start stripping out potential combinations of
> twin-predicated mv just because there are single-predicated pseudo-ops
> that do exactly the same job?
>

no, I'm just saying don't attribute to twin predication the things that the
ops could do anyway.

>
> >> * the src pred is set to "all 1s"
> >> * the src is set to "scalar"
> >> * the dest pred is set to "1<<r3"
> >> * the dest is set to "vector"
> >>
> >
> > this happens for any scalar -> vector with a mask of 1<<r3, twin
> > predication is not necessary for it to work.
>
> i hear what you are saying: it is good to have identified that there
> are redundant ways to achieve the same thing (like there are multiple
> ways to do mr, such as addi r3, r4, 0 and ori as well). however i do
> not believe that any action should be taken.
>

agreed.

>
>
> >>      ireg[rs] = ireg[rd+ireg[r3]]
> >>
> >
> > no, you get:
> > ireg[rd+ireg[r3]] = ireg[rs]
>
> sigh.  logic dyslexia kicking in.  well spotted.
>
> >>
> > Ok, I think the issue is that when I was saying mv.x, I meant the
> > vectorized version:
> > for i in 0..VL {
> >     let idx = reg[ra + i];
> >     if idx >= VL {
> >         trap();
> >     }
> >     reg[rd + i] = reg[rb + idx];
> > }
>
> ahh riiight.   yes.  this one.  i don't even know what to call it. i
> do remember we last discussed it... a year ago?
>
> yes absolutely, twin predication cannot, "on its own" cover this case
> [ it can however augment it, in very interesting - read completely
> loopy mind-corkscrewing - ways].
>
> twin predication is like an ordered sequenced back-to-back VGATHER
> VSCATTER.
>

if those ops are called vgather/vscatter then that is a misnomer since
vgather is traditionally a vector load that acts kinda like vector mv.x and
vscatter is the inverse store operation. What you're probably thinking of
is more correctly termed compress/expand:
compress takes a vector with elements with gaps and compacts the elements
to the beginning of the vector in a consecutive range in the same order
with no duplication/deletion.
expand takes a consecutive range and inserts gaps -- the inverse of
compress.
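
a rough sketch of what I mean by compress/expand, in Python. The names `compress` and `expand`, the integer bitmask, and the `fill` placeholder for the gaps are all just assumptions for illustration (real hardware would more likely leave the gap elements untouched), so treat it as a model, not a spec:

```python
def compress(src, mask):
    """Pack the elements of src whose mask bit is set to the front, in order."""
    return [x for i, x in enumerate(src) if (mask >> i) & 1]


def expand(packed, mask, length, fill=0):
    """Inverse of compress: place packed elements at the set mask positions.

    Gap positions get `fill`, which is a placeholder chosen for this sketch.
    """
    out = [fill] * length
    it = iter(packed)
    for i in range(length):
        if (mask >> i) & 1:
            out[i] = next(it)
    return out


src = [10, 11, 12, 13, 14, 15]
mask = 0b101010                      # elements 1, 3 and 5 are active
packed = compress(src, mask)         # consecutive range, same order
spread = expand(packed, mask, len(src))
```

neither function can duplicate or reorder elements, which is the difference from the indexed mv.x.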

> > also, by setting ra[0..VL] to [5, 5, 3, 3, 4, 4, 4]
> > you can get in 1 vector mv.x instruction:
> > dest = [src[5], src[5], src[3], src[3], src[4], src[4], src[4]];
> > which isn't possible if the mv.x adds idx to rd instead of rb.
>
> ok.  so turning 1<<r3 into a mv.x cannot be achieved in some cases.  i
> can live with that for a first implementation.
>
> optimise later.
>

yup.
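
for reference, here is the vector mv.x pseudo-code from above as runnable Python, with the [5, 5, 3, 3, 4, 4, 4] example. The flat 32-entry register list, the register numbers, and the `Trap` exception are assumptions made up for the sketch:

```python
class Trap(Exception):
    """Stands in for a hardware trap in this model."""


def vec_mv_x(reg, rd, ra, rb, vl):
    """Vector mv.x: reg[rd + i] = reg[rb + reg[ra + i]] for i in 0..vl.

    Register values are assumed to be unsigned, so only the upper
    bound of each index is checked.
    """
    for i in range(vl):
        idx = reg[ra + i]
        if idx >= vl:
            raise Trap(f"index {idx} out of range for VL={vl}")
        reg[rd + i] = reg[rb + idx]


VL = 7
reg = [0] * 32
reg[8:8 + VL] = [5, 5, 3, 3, 4, 4, 4]                  # index vector at ra=8
reg[16:16 + VL] = [100, 101, 102, 103, 104, 105, 106]  # source vector at rb=16
vec_mv_x(reg, rd=24, ra=8, rb=16, vl=VL)
dest = reg[24:24 + VL]
```

the index is added to rb, the source side, which is what lets one source element show up in several dest elements.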

>
>
> >>     ireg[rs+ireg[r3]] = ireg[rd]
> >>
> >
> > no, you get:
> > ireg[rd] = ireg[rs+ireg[r3]]
>
> sigh.  thank you for spotting this.
>
> > which is scalar mv.x except it doesn't trap if r3 is out of range and just
> > doesn't write rd
>
> excessive ranges should have already been checked.  yes that involves
> a pre-analysis of the predicate bits, or it is simply the case that
> the exception is thrown back at the issue phase:
>
>     if reg# + VL >= Len(regfile) trap()
>
> this ensures that even when predicate is 1<<r3 there is no possibility
> for trying to access beyond end of regfile.
>
> question: if r3 is greater than VL, should a trap be thrown?
>

in the case of the mask being 1<<r3 -- no. just like a mask with stray high
bits set won't change anything. Though we should mask off the upper bits of
r3 in order to match the semantics of shift-left: 1 << 0x102 == 4 since
0x102 wraps around to 2.
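
a simplified model of the 1<<r3 case, in Python. It assumes a 64-bit mask with a 6-bit shift amount (my assumption; the wrap width is not settled in this thread), and `one_hot_mask` and `scalar_to_vector_mv` are hypothetical names. It is not the full SV loop, just the scalar src -> vector dest case with a dest predicate:

```python
XLEN = 64  # assumed predicate width; only the low 6 bits of r3 are used


def one_hot_mask(r3):
    """1 << r3 with the upper bits of r3 masked off, like a shift-left."""
    return 1 << (r3 & (XLEN - 1))


def scalar_to_vector_mv(reg, rd, rs, dest_mask, vl):
    """Write reg[rs] to every dest element below vl whose mask bit is set.

    Mask bits at or above vl are never looked at, so they cannot trap.
    """
    for i in range(vl):
        if (dest_mask >> i) & 1:
            reg[rd + i] = reg[rs]


reg = [0] * 32
reg[3] = 42
scalar_to_vector_mv(reg, rd=8, rs=3, dest_mask=one_hot_mask(2), vl=4)
in_range = reg[8:12]          # only element 2 is written

reg2 = [0] * 32
reg2[3] = 42
scalar_to_vector_mv(reg2, rd=8, rs=3, dest_mask=one_hot_mask(5), vl=4)
out_of_range = reg2[8:12]     # r3 >= VL: nothing written, no trap
```

with r3 in range this is ireg[rd+ireg[r3]] = ireg[rs], and with r3 >= VL the mask bit is just a stray high bit.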

in the case of mv.x, yes, we trap. We could define idx==-1 to mean that
that element gets set to 0 or something, though I think we should allow
disabling that using an immediate bit.

>
> >>
> >> which is a different *type* of mv.x operation, but it is still a mv.x
> >> operation.
> >>
> >> it gets exceptionally weird if we apply twin-predication *to* mv.x.  i'm
> >> not going to go there quite just yet :)
> >>
> >>
> >>
> >> > only for that specific mask, I was talking about the fully general
> >> > vector case.
> >> >
> >>
> >> you've lost me,
> >
> >
> > what I meant was the pseudo-code I wrote earlier which is the vector
> > mv.x.
>
> yehyeh got the context now.
>
> > You can't replace the fully general vector mv.x with a single
> > twin-predicated vector mv no matter how hard you try,
>
> i don't (never have).
>
> tpred however fits into the Dep Matrices nicely with less blocking
> resources, where vectorised mv.x is veeery heavy.
>
>
>
> > Replacing a scalar mv.x with twin predicated vector mv is possible, but
> > seems less efficient unless we have the special hw support for reading r3
> > then the selected input I mentioned earlier.
>
> ... jacob i really really do not wish to get into discussions of
> alternative designs of the level of complexity and associated time
> that is involved.
>
> we are at least 8 months behind schedule and simply do not have the
> funding available to cover it.
>

yeah, ok. save optimization for v2.0.

>
> 1<<r3 is a dead simple idea that the Predication Unit can do with a
> Binary to Unary Encoder, taking r3, turning it to an unary mask, and
> chucking it at the Shadow Cancel/Success wires that lead to the FUs
> under Shadow Conditions.
>
> in the case where r3 is a straight INT predicate, the *binary* bits
> (as-is) are chucked straight at the same Shadow wires.  where
> inversion is applied, r3 is simply bitinverted before being chucked at
> Shadow Wires.
>
> this is brain dead simple.   suboptimal in the case of 1<<r3, but
> trivial to add.
>
> *later* we can add macro-op detection and optimisation.
>
> but not right now.  we simply don't have time.
>
>
> > and because you're not familiar with SV and
> >> twin-predication, can you come back to this once it's clear?
> >>
> >
> > AFAIK I am familiar with SV and twin predication...
>
> not at the hardware level.  Predication Units need to be separated
> from the Arithmetic Units that they cover, with Shadows.
>
> this is the only sane way to do it.
>

for masks that are fully general, yes. for 1-hot masks, not necessarily.
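
to make sure we're talking about the same thing, here is how I read the three predicate sources described above, as a Python sketch. `shadow_mask` and its parameters are hypothetical names, and trimming the result to VL bits and the 6-bit wrap on r3 are my assumptions:

```python
def shadow_mask(r3, vl, one_hot=False, invert=False):
    """Turn the value of r3 into the per-element mask sent to the shadow wires.

    one_hot=True  : binary-to-unary encode, i.e. 1 << r3
    one_hot=False : r3 is used as-is as an integer predicate,
                    bit-inverted first when invert=True
    """
    if one_hot:
        mask = 1 << (r3 & 63)
    else:
        mask = ~r3 if invert else r3
    # Only the low vl bits correspond to elements (assumption of this sketch).
    return mask & ((1 << vl) - 1)


unary = shadow_mask(2, vl=8, one_hot=True)
plain = shadow_mask(0b1010, vl=8)
inverted = shadow_mask(0b1010, vl=8, invert=True)
```

the 1-hot result enables at most one element, which is why I think that case doesn't necessarily need the fully general path.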

>
> nuts.  i managed to accidentally delete something on this phone and
> can't undo edit.  will reread.
>

Use Hacker's Keyboard and type Ctrl+Z which usually works on Android.

Jacob