[libre-riscv-dev] Instruction sorta-prefixes for easier high-register access

Sat Jan 26 02:37:13 GMT 2019

On Fri, Jan 25, 2019, 18:17 Luke Kenneth Casson Leighton <lkcl at lkcl.net>
wrote:

> On Saturday, January 26, 2019, Jacob Lifshay <programmerjake at gmail.com>
> wrote:
> >
> > > Really, I prefer 2 bits for 32-bit int ops and all C ops; that way it's
> > > always possible to specify 8/16/32/default.
> > >
> > 32-bit int ops can have a single bit that switches from 32/default
> > (OP-32/OP) to 8/16.
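A minimal sketch comparing the two width encodings, assuming a hypothetical 2-bit prefix field whose value-to-width mapping is made up for illustration. For the 1-bit variant it assumes the positional reading of "32/default (OP-32/OP) to 8/16", i.e. OP-32 becomes 8-bit and OP becomes 16-bit when the bit is set. Neither the field layout nor that pairing is specified in this thread.

```rust
/// Operand widths discussed in the thread; `Default` means XLEN.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Width {
    W8,
    W16,
    W32,
    Default,
}

/// Major opcode of a 32-bit integer op (OP = 0b0110011, OP-32 = 0b0111011).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MajorOp {
    Op,
    Op32,
}

/// Option A: a 2-bit prefix field names any of the four widths directly.
/// The mapping of field values to widths is hypothetical.
pub fn width_from_2bit(field: u8) -> Width {
    match field & 0b11 {
        0b00 => Width::W8,
        0b01 => Width::W16,
        0b10 => Width::W32,
        _ => Width::Default,
    }
}

/// Option B: one prefix bit reinterprets OP-32/OP as 8/16.
/// The pairing OP-32 -> 8 and OP -> 16 is an assumption.
pub fn width_from_1bit(op: MajorOp, narrow: bool) -> Width {
    match (op, narrow) {
        (MajorOp::Op32, false) => Width::W32,
        (MajorOp::Op, false) => Width::Default,
        (MajorOp::Op32, true) => Width::W8,
        (MajorOp::Op, true) => Width::W16,
    }
}
```

Both options reach all four widths; the 1-bit form simply borrows its second bit from the OP/OP-32 major-opcode choice, which is why it only works for ops that come in both flavours.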
>
>
> I know there is something odd with the OP-32 stuff in RISC-V; it was
> discussed a year ago. The opportunity to have RV32 executables run
> unmodified on an RV64 host (in RV64 mode) was lost because the opcodes
> actually change meaning depending on whether the RV32 or RV64 mode CSR is set.
>
> I would only feel comfortable using a single bit to switch OP-32/OP to 8/16
> after doing a full walkthrough.
>
seems reasonable.

>
>
> > We can have a different prefix encoding for C ops to have the required 2
> > bits, or, since they should expand 1:1 to 32-bit ops, we can just have
> > combinations for the most common prefixes, requiring full instructions
> > for uncommon cases.
> >
> >
> Makes sense
>
>
> > > A crucial strategic op that is missing is MVX:
> > > regs[rd] = regs[regs[rs1]]
> > >
> > we could modify the definition slightly:
> > for i in 0..VL {
> >     let offset = regs[rs1 + i];
> >     // we could also limit on out-of-range
> >     assert!(offset < VL); // trap on fail
> >     regs[rd + i] = regs[rs2 + offset];
> > }
> >
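A runnable sketch of both MVX forms over a flat `u64` register array. The names `mvx_scalar`, `mvx_vector` and `MvxTrap` are hypothetical. The trap is modelled as an `Err`, and the scalar form returns `None` when the indirect index falls outside the register file (an assumption, since the one-line definition says nothing about range). The vector form follows the proposed loop exactly, including writing each element before checking the next offset.

```rust
/// Hypothetical trap raised when an MVX offset is out of range.
#[derive(Debug, PartialEq, Eq)]
pub struct MvxTrap {
    pub element: usize,
    pub offset: u64,
}

/// Original scalar MVX: regs[rd] = regs[regs[rs1]].
pub fn mvx_scalar(regs: &mut [u64], rd: usize, rs1: usize) -> Option<()> {
    let index = regs[rs1];
    if index >= regs.len() as u64 {
        return None; // indirect index points outside the register file
    }
    regs[rd] = regs[index as usize];
    Some(())
}

/// Relative, vectorised MVX: offsets come from regs[rs1 + i] and must be
/// below vl, so only rs2..rs2 + vl can ever be read.
pub fn mvx_vector(
    regs: &mut [u64],
    rd: usize,
    rs1: usize,
    rs2: usize,
    vl: usize,
) -> Result<(), MvxTrap> {
    for i in 0..vl {
        let offset = regs[rs1 + i];
        // Limit on out-of-range offsets: trap instead of reading past rs2 + vl.
        if offset >= vl as u64 {
            return Err(MvxTrap { element: i, offset });
        }
        regs[rd + i] = regs[rs2 + offset as usize];
    }
    Ok(())
}
```

Because every offset is bounded by `vl`, the read set of the vector form is known from register numbers alone, which is what lets the dependency matrix treat `rs2..rs2 + vl` as the instruction's sources.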
> > The dependency matrix would have the instruction depend on everything from
> > rs2 to rs2 + VL, and we let the execution unit figure it out.
>
>
> Oh yuk! :) ok, so if the following instructions use registers that are
> outside the bounds of rs2..rs2+VL, the instruction issue phase may
> proceed...
>
> And since that calculation does not need to read the regfile...
>
> Smart!
>
>
> > For simplicity, we could extend the dependencies to a power of 2 or
> > something.
> >
> >
> Yes.
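A minimal sketch of the issue-time check being agreed on here, assuming registers are tracked as half-open ranges. "Extend the dependencies to a power of 2" is read here as rounding the length of the rs2 range up with `usize::next_power_of_two`; that interpretation and the function names are assumptions. Note the check uses only register numbers and VL, never regfile contents.

```rust
use std::ops::Range;

/// Registers an MVX may read through its offsets: rs2..rs2 + vl,
/// optionally widened so the length is a power of two.
pub fn mvx_source_range(rs2: usize, vl: usize, round_pow2: bool) -> Range<usize> {
    let len = if round_pow2 { vl.next_power_of_two() } else { vl };
    rs2..rs2 + len
}

fn overlaps(a: &Range<usize>, b: &Range<usize>) -> bool {
    a.start < b.end && b.start < a.end
}

/// Conservative hazard check between a pending MVX and a later instruction:
/// WAR (later writes vs MVX reads) and RAW/WAW (later accesses vs MVX writes).
/// If nothing overlaps, issue may proceed past the MVX.
pub fn can_issue_past(
    mvx_reads: &[Range<usize>],
    mvx_writes: &Range<usize>,
    later_reads: &[Range<usize>],
    later_writes: &[Range<usize>],
) -> bool {
    let war_free = later_writes
        .iter()
        .all(|w| mvx_reads.iter().all(|r| !overlaps(w, r)));
    let raw_waw_free = later_reads
        .iter()
        .chain(later_writes.iter())
        .all(|x| !overlaps(x, mvx_writes));
    war_free && raw_waw_free
}
```

For example, with rs2 = 8 and VL = 3 the widened source range is 8..12, so a later write to register 11 must wait while one touching only registers 30 and 31 may issue.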
>
>
> > >
> > > However, this is a pig to implement in hw, and even more so when it
> > > becomes parallel. I did, however, come up with a Schroedinger scheme for
> > > predication: the predicated ops are allocated to ALUs, which depend on a
> > > special predication FU and hold a write hazard.
> > >
> > > When the predicate is free to be read by the special PrFU, it either
> > > sends "die" or releases the write hazard line.
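A behavioural sketch of the predication scheme described above, assuming one predicate bit per element and an ALU that has already computed its result. The types and field names are illustrative only: the op holds a write hazard on `rd` until the PrFU reads the predicate, then either dies (no write) or has the hazard released so the result commits.

```rust
/// Signal sent by the predication FU once the predicate can be read.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PredSignal {
    Die,          // predicate bit clear: cancel, never write rd
    ReleaseWrite, // predicate bit set: allow the result to be written
}

/// A predicated op already allocated to an ALU. Its result exists,
/// but the write hazard on rd is held until the PrFU decides.
#[derive(Debug)]
pub struct PendingPredicatedOp {
    pub rd: usize,
    pub result: u64,
    pub write_hazard_held: bool,
}

/// PrFU side: read the predicate bit for this element.
pub fn prfu_decide(predicate: u64, element: u32) -> PredSignal {
    if (predicate >> element) & 1 == 1 {
        PredSignal::ReleaseWrite
    } else {
        PredSignal::Die
    }
}

/// ALU side: commit only on ReleaseWrite; either way drop the hazard.
pub fn resolve(op: &mut PendingPredicatedOp, signal: PredSignal, regs: &mut [u64]) {
    if signal == PredSignal::ReleaseWrite {
        regs[op.rd] = op.result;
    }
    op.write_hazard_held = false;
}
```

The point of the design is that the ALU work is not delayed by the predicate; only the write is.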
> > >
> > > I think the same thing can be done for MVX. Split it into 2 phases
> > > (2 FUs): one which reads the regfile and ANDs the value with 0x7f (or
> > > whatever), then passes that through to the 2nd phase to look up in the
> > > regfile.
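A minimal sketch of the two-phase split, assuming a 128-entry register file so that the example mask 0x7f from the text keeps every index in range. The function names are hypothetical. Phase 2's source register is only known once phase 1 has produced its masked index, which is exactly the dependency problem raised next.

```rust
/// Mask applied in phase 1; 0x7f (128 registers) is the example value.
pub const INDEX_MASK: u64 = 0x7f;

/// Phase 1 FU: read regs[rs1] and mask it into a register index.
pub fn mvx_phase1(regs: &[u64], rs1: usize) -> usize {
    (regs[rs1] & INDEX_MASK) as usize
}

/// Phase 2 FU: use the now-known index to read the regfile and write rd.
pub fn mvx_phase2(regs: &mut [u64], rd: usize, index: usize) {
    regs[rd] = regs[index];
}
```

Masking removes any need for a trap in phase 2, at the cost of silently wrapping large indices.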
> > >
> > > The only thing is, damn, it messes up the dependencies. You can't
> > > proceed further with instruction issue (not to an OoO engine) until all
> > > of those 2nd-phase regfile lookups are known.
> > >
> > mvx is a last-resort instruction. We definitely need it because we can
> > implement it in HW to be up to several times faster than the fallback
> > (a bunch of st/ld or a bunch of scalar mv), using much less instruction
> > issue bandwidth and energy than the fallback.
> >
>
> agreed. i don't like it, but the constrained/relative one is... tolerable
> (the hardware design is going to be a dog's dinner mess.... *sigh*)
>
> > We should add some constrained swizzle instructions for the more
> > pipeline-friendly cases. One that will be important is:
> > for i in 0..VL {
> >     let i = i * 4;
> >     let mut s1 = [0; 4];
> >     for j in 0..4 {
> >         s1[j] = regs[rs1 + i + j];
> >     }
> >     for j in 0..4 {
> >         regs[rd + i + j] = s1[((imm >> (j * 2)) & 0x3) as usize];
> >     }
> > }
> >
>
> i take it 0..4 actually means 0,1,2,3? and 0..VL means 0,1,2,...,VL-1?
>
Yes. It's Rust's range syntax:
https://doc.rust-lang.org/reference/expressions/range-expr.html
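A runnable version of the swizzle loop, using those exclusive ranges. It assumes `regs` is a flat element array and `imm` packs four 2-bit selectors, selector j choosing which source element of each group of 4 lands in destination element j. `swizzle4` is a hypothetical name.

```rust
/// Swizzle each group of 4 elements: dest[j] = src[(imm >> 2j) & 3].
pub fn swizzle4(regs: &mut [u64], rd: usize, rs1: usize, vl: usize, imm: u8) {
    for group in 0..vl {
        let base = group * 4;
        // Copy the source group first so rd may overlap rs1.
        let mut s1 = [0u64; 4];
        s1.copy_from_slice(&regs[rs1 + base..rs1 + base + 4]);
        for j in 0..4 {
            let sel = ((imm >> (j * 2)) & 0x3) as usize;
            regs[rd + base + j] = s1[sel];
        }
    }
}
```

For example, `imm = 0b00_01_10_11` reverses each group (selectors 3, 2, 1, 0), and `imm = 0` broadcasts element 0 of each group.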

>
>
> > Another is matrix transpose for (2-4)x(2-4) matrices, which we can
> > implement similarly to a strided ld/st, except on registers.
> >
> >
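A minimal sketch of a register transpose, assuming the matrix is stored row-major as consecutive elements starting at `rs`. The name `transpose_regs` is hypothetical. The destination is written sequentially while the source is walked with a stride of `cols`, which is the sense in which it resembles a strided ld/st.

```rust
/// Transpose a rows x cols matrix (2..=4 each) from rs into rd.
pub fn transpose_regs(regs: &mut [u64], rd: usize, rs: usize, rows: usize, cols: usize) {
    assert!((2..=4).contains(&rows) && (2..=4).contains(&cols));
    let mut tmp = [0u64; 16];
    for c in 0..cols {
        for r in 0..rows {
            // Source read with stride `cols`; destination written in order.
            tmp[c * rows + r] = regs[rs + r * cols + c];
        }
    }
    regs[rd..rd + rows * cols].copy_from_slice(&tmp[..rows * cols]);
}
```

Buffering through `tmp` lets `rd` overlap `rs`, at the cost of holding up to 16 elements in flight.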
> recorded in the microarchitecture notes so we don't lose track.
>
>
> > Note that all of the above operations should be operating on elements,
> > not registers.
> >
> >
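A minimal sketch of why elements and registers differ, assuming 64-bit registers with narrower elements packed from the low bits upward with no gaps (the packing scheme is an assumption, not stated here). Under that assumption an index of 5 for 16-bit elements lands in the second register, not the sixth.

```rust
/// Locate element i of a vector packed into 64-bit registers from `base`.
/// Returns (register number, bit offset within that register).
pub fn element_location(base: usize, i: usize, elwidth_bits: usize) -> (usize, usize) {
    assert!(matches!(elwidth_bits, 8 | 16 | 32 | 64));
    let bit = i * elwidth_bits;
    (base + bit / 64, bit % 64)
}

/// Read element i under the same packing assumption.
pub fn read_element(regs: &[u64], base: usize, i: usize, elwidth_bits: usize) -> u64 {
    let (reg, shift) = element_location(base, i, elwidth_bits);
    let mask = if elwidth_bits == 64 {
        u64::MAX
    } else {
        (1u64 << elwidth_bits) - 1
    };
    (regs[reg] >> shift) & mask
}
```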
> understood / agree.
>
> l.
> _______________________________________________
> libre-riscv-dev mailing list
> libre-riscv-dev at lists.libre-riscv.org
> http://lists.libre-riscv.org/mailman/listinfo/libre-riscv-dev
>