[Libre-soc-dev] svp64

Jacob Lifshay programmerjake at gmail.com
Sun Dec 20 05:59:45 GMT 2020


On Fri, Dec 18, 2020, 16:28 Luke Kenneth Casson Leighton <lkcl at lkcl.net>
wrote:

> On Friday, December 18, 2020, Jacob Lifshay <programmerjake at gmail.com>
> wrote:
>
> >>
> >
> > we can just use .long in an assembler macro for now...no binutils
> > modification needed.
>
> aw fer goodness sake, why on earth...  sigh ok yes i love it, why didn't i
> think of that earlier :)
>
>
> >> * still do not know what the best arrangement for CRs is.
> >>
> >
> > I'm for the arrangement that mirrors the register layout I picked for
> > FP/Int registers.
>
> the problem is that it's not that simple.  it's been established that the
> layout you picked doesn't allow scalar default behaviour.
>

That is false: I specifically designed the register layout to work just
fine with scalars and to be backwards and forwards compatible.

>
> also what i came up with is problematic as well.
>
> also: starting from CR[6] isn't going to work from a hardware perspective;
> moving the CRs to align at the hardware level to CR[6] is not going to fly
> either.
>

In the CR layout scheme I proposed, starting at CR[6] means starting at
SVCR6_000 which is 0b110000 (48) in the register order. All OpenPower CR
registers are mapped to multiples of 8, so there won't be alignment issues.
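
To make the numbering concrete, here is a minimal Python sketch of the
mapping. The helper name svcr_index and the split into a 3-bit CR number
plus a 3-bit sub-index are my assumptions, inferred only from SVCR6_000
being 0b110000; treat it as an illustration, not a spec.

```python
# Sketch only: svcr_index and the 3+3 bit split are assumptions inferred
# from "SVCR6_000 is 0b110000 (48)"; this is not a finalised encoding.

def svcr_index(cr, sub=0):
    """Return the position of SVCR<cr>_<sub> in the 6-bit register order."""
    if not 0 <= cr < 8:
        raise ValueError("cr must be in 0..7")
    if not 0 <= sub < 8:
        raise ValueError("sub must be in 0..7")
    # Upper 3 bits: the OpenPower CR number; lower 3 bits: the sub-index.
    return (cr << 3) | sub

# The eight scalar CRs (CR0..CR7) all use sub-index 0, so each one lands
# on a multiple of 8 in the register order.
SCALAR_CR_SLOTS = [svcr_index(cr) for cr in range(8)]
```

With this numbering svcr_index(6) is 48, matching SVCR6_000 above.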

>
> we have to start by looking at some example assembler, and seeing if the
> vectorisation of CRs can be accessed cleanly without too many CR mv
> operations jamming up the works.  and if those vectorised CR ops are not
> themselves hugely problematic.
>
> for example if counting sequentially from CR[offset+i] in a VL for loop
> near 100% overwrites CR0 thru 3 this will screw scalar operations.
>

It only wraps around to CR0 once VL > 16 (that limit expands to 32, 64, and
beyond with the future additional register-file expansion), which is big
enough to not be an issue for the vast majority of code (GPU code would run
out of int/fp regs first).
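
A quick Python sketch of that wrap-around, assuming a 64-entry register
order (implied by the 6-bit value 0b110000) and simple modulo wrapping. The
names TOTAL_SLOTS and cr_slots are hypothetical, chosen for illustration.

```python
# Sketch only: TOTAL_SLOTS = 64 and modulo wrapping are assumptions based
# on the 6-bit register order; cr_slots is a hypothetical helper name.

TOTAL_SLOTS = 64          # assumed size of the CR register order
START = 0b110000          # SVCR6_000, i.e. slot 48

def cr_slots(start, vl, total=TOTAL_SLOTS):
    """Slots touched by a VL-long loop counting sequentially from start."""
    return [(start + i) % total for i in range(vl)]

def first_wrapping_vl(start, total=TOTAL_SLOTS):
    """Smallest VL at which the loop reaches slot 0 (CR0)."""
    return (total - start) + 1

# VL = 16 covers slots 48..63 and never reaches slot 0 (CR0).
touches_cr0_at_16 = 0 in cr_slots(START, 16)
# VL = 17 is the first length whose last element lands on slot 0.
touches_cr0_at_17 = 0 in cr_slots(START, 17)
```

So under those assumptions CR0 is only overwritten from VL = 17 upwards.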

>
> the numbering cannot be arbitrarily picked and declared "final" without
> justification is what i am saying.
>

yup, I didn't declare it final, I just declared it good enough for an
educated guess.

>
> so if you prefer a particular scheme, it needs to be accompanied by example
> assembler code showing how it is efficient and effective.
>
> then also i suspect that the same reasoning that went into the creation of
> the original encoding may also apply to CRs.  but, again, this needs to be
> verified by creating examples.
>
> this may take time sigh.
>
> >> the choices here are: abandon data-dependent ffirst or spend a huge
> >> amount of time reviewing the SPR layout.  neither are good choices when
> >> we are under time pressure.
> >>
> >
> > I think we should just go with the option that SPRs are relatively cheap,
> > so we can just allocate 1 spr completely to VL (supporting 0 through 64
> > inclusive for now), since on future versions we'd want to support VL>64
> > anyway. This would also improve semantics for when we want to read VL
> > after a load ffirst or other non-setvl VL-adjustment instruction, avoiding
> > the need to mask/shift to extract the VL value.
>
> fortunately it is not unreasonable to have multiple ways to get at the same
> underlying register.
>
> in the original version of SV i therefore provided *two* ways to set and
> get VL, MAXVL and SUBVL:
>
> * STATE CSR which contained them all.
> * separate MAXVL CSR
> * separate VL CSR
>
> *all* of these were read/write, and the immediate CSR write instructions
> could write up to 5 bits into a CSR using only a 32 bit instruction (values
> 1 to 32 for VL and MAXVL).
>
> so that is no problem.  it's that the encoding for STATE was carefully
> jammed into only 32 bits and that took about a *MONTH* to design and write
> up.
>

remember, storing 64 bits (or 256 bits) instead of 32 bits on a context
switch is basically nothing. also, SPRs are 64 bits wide, so there is no
need to cram it all into 32 bits. We don't need to support RV32 anymore, and
PPC32 can still access 64-bit registers iirc.

>
> we don't have a month to spend, abandoning that work and redoing it.
>

SPR layout is much less performance-critical: we can just use more SPRs if
we need to, with no need to squeeze every last bit of space out.
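
To illustrate the mask/shift point, here is a rough Python sketch comparing
the two ways of reading VL. The packed field position (VL_SHIFT) is purely
hypothetical and is NOT the real STATE encoding; the only hard fact used is
that 0 through 64 inclusive needs 7 bits.

```python
# Sketch only: VL_SHIFT and the packed layout are made up for illustration
# and do not reflect the actual STATE encoding.

VL_SHIFT = 6              # hypothetical bit offset of VL in a packed word
VL_MASK = 0x7F            # 7 bits: enough for VL values 0..64 inclusive

def pack_vl(state, vl):
    """Write vl into the hypothetical packed STATE word."""
    if not 0 <= vl <= 64:
        raise ValueError("VL must be in 0..64")
    return (state & ~(VL_MASK << VL_SHIFT)) | (vl << VL_SHIFT)

def vl_from_packed(state):
    """Packed layout: reading VL costs a shift and a mask."""
    return (state >> VL_SHIFT) & VL_MASK

def vl_from_dedicated(spr_vl):
    """Dedicated SPR: the register value is VL, no extraction needed."""
    return spr_vl
```

With a dedicated SPR the value read back after a load ffirst is usable
directly, whereas the packed form needs the extra shift and mask each time.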

Jacob

