Luke Kenneth Casson Leighton
lkcl at lkcl.net
Sun Dec 20 11:16:11 GMT 2020
crowd-funded eco-conscious hardware: https://www.crowdsupply.com/eoma68
On Sun, Dec 20, 2020 at 6:00 AM Jacob Lifshay <programmerjake at gmail.com> wrote:
> > the problem is that it's not that simple. it's been established that the
> > layout you picked doesn't allow scalar default behaviour.
> That is false, I specifically designed the register layout to work just
> fine with scalars and to be backwards and forwards compatible.
... jacob: then you need to provide an algorithm that shows that! you
can't leave it to me to "derive" because i don't understand it! and
even if you did, the fact that it's different from the one that we
came up with and agreed on 18 months ago means that we have to spend
the time going over it and evaluating it.
> > also what i came up with is problematic as well.
> > also: starting from CR isn't going to work from a hardware perspective;
> > moving the CRs to align at the hardware level to CR is not going to fly
> > either.
> In the CR layout scheme I proposed, starting at CR means starting at
> SVCR6_000 which is 0b110000 (48) in the register order.
which isn't clear at all. is that CR in the OpenPOWER notation? is
that CR6? are you redefining CR6 in OpenPOWER v3.0B? is there a
different CR to CR from v3.0B?
you see how utterly confusing that is? you're the *only one* who
understands this new naming scheme, and because you're the only one
who understands it, we can't even begin to evaluate it.
i must have asked for clarification about 8 times now. because it's
different, and because you haven't been able to provide that clarity
quickly enough for evaluation even to take place, i've decided:
"this is taking far too long, it has to go. we return to what's
understandable, clear, and already agreed 18 months ago."
now, if that turns out not to work, *then* we come back and revisit
it. but because you haven't been able to provide the information
needed quickly, it has to go.
> All OpenPower CR
> registers are mapped to multiples of 8, so there won't be alignment issues.
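The quoted mapping can be made concrete with a small sketch. It assumes, from the figures Jacob gives (SVCR6_000 = 0b110000 = 48, and every OpenPOWER CR at a multiple of 8), that scalar CRn sits at index 8*n in a 6-bit, 64-entry CR numbering, with the low 3 bits being the "_000" suffix. The function name `scalar_cr_to_sv_index` is hypothetical and is not taken from any Libre-SOC code.

```python
# Hypothetical sketch of the CR numbering described in the quote above.
# Assumption: a 6-bit (64-entry) SV CR index space, with scalar
# OpenPOWER CRn placed at index 8*n ("SVCRn_000").
NUM_SV_CRS = 64


def scalar_cr_to_sv_index(n: int) -> int:
    """Map scalar CR0..CR7 to its (assumed) SV CR index."""
    if not 0 <= n <= 7:
        raise ValueError("OpenPOWER v3.0B defines CR0..CR7 only")
    return n << 3  # same as 8 * n; low 3 bits are the "_000" suffix


for n in range(8):
    idx = scalar_cr_to_sv_index(n)
    # '#08b' gives '0b' plus 6 binary digits, e.g. 0b110000 for CR6
    print(f"CR{n} -> SVCR{n}_000 = {idx:#08b} ({idx})")
```

Under this assumption CR6 comes out as 0b110000 (48), matching the figure in the quote, and every scalar CR lands on a multiple of 8.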
> > we have to start by looking at some example assembler, and seeing if
> > vectorised CRs can be accessed cleanly without too many CR mv
> > operations jamming up the works, and if those vectorised CR ops are
> > not themselves hugely problematic.
> > for example, if counting sequentially from CR[offset+i] in a VL for
> > loop almost always overwrites CR0 through CR3, this will break scalar
> > operations.
> It only wraps around to CR0 after VL > 16 (expands to 32, 64, and beyond
> with the future additional register-file expansion), which is big enough to
> not be an issue for the vast majority of code (gpu would run out of int/fp
> regs first).
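The wrap-around figure quoted above (CR0 is only reached once VL > 16) can be checked with a short sketch. It assumes a vector loop writes CR[start + i] for i in range(VL), that the index simply wraps modulo a 64-entry CR space, and that scalar CRn lives at index 8*n; all of these are assumptions read from the quoted description, and `vector_cr_indices` and `scalar_crs_clobbered` are hypothetical helpers.

```python
# Hypothetical sketch: which SV CR indices a vectorised CR op touches,
# and which scalar CRs (assumed at index 8*n) it would overwrite.
NUM_SV_CRS = 64  # assumption: 6-bit SV CR index space


def vector_cr_indices(start: int, vl: int) -> list:
    """Indices written by CR[start + i] for i in range(vl),
    assuming plain modulo-64 wrap-around."""
    return [(start + i) % NUM_SV_CRS for i in range(vl)]


def scalar_crs_clobbered(start: int, vl: int) -> list:
    """Scalar CR numbers whose (assumed) index 8*n is hit by the loop."""
    hits = {idx // 8 for idx in vector_cr_indices(start, vl) if idx % 8 == 0}
    return sorted(hits)


start = 0b110000  # SVCR6_000 = 48, the starting point from the quote
print(vector_cr_indices(start, 17)[-2:])  # last two indices: 63 then 0
print(scalar_crs_clobbered(start, 16))
print(scalar_crs_clobbered(start, 17))
```

Under these assumptions a vector starting at SVCR6_000 first reaches CR0's index at VL = 17, which agrees with the "VL > 16" figure in the quote.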
future expansion we can evaluate in 4-5 years' time. we're on the
clock *right now*, at least 8-12 months behind.
focus, focus, focus.
> > so that is no problem. it's that the encoding for STATE was carefully
> > jammed into only 32 bits and that took about a *MONTH* to design and write
> > up.
> remember, storing 64-bits (or 256-bits) instead of 32-bits on a context
> switch is basically nothing. also, SPRs are 64-bits wide, no need to cram
> it all into 32-bits. We don't need to support RV32 anymore -- PPC32 still
> can access 64-bit registers iirc.
there are enough spare bits. i went over this 18 months ago, and the
decisions and design are not disrupted, so it's a resolved issue (the
size did not change). that said: it's not about RV32, it's about
reducing context-switching latency for high performance. however, when
it comes to actually allocating them (where to put the SPRs), that will need