[Libre-soc-bugs] [Bug 1135] add FPSCR and Rounding classes to ieee754fpu

Fri Aug 11 15:13:29 BST 2023

https://bugs.libre-soc.org/show_bug.cgi?id=1135

--- Comment #15 from Jacob Lifshay <programmerjake at gmail.com> ---
(In reply to Luke Kenneth Casson Leighton from comment #12)
> (In reply to Jacob Lifshay from comment #6)
> TestIssuer is the sole primary focus of this task.

no, ieee754fpu is.

> at the regfile an ORing may be performed instead of a "set".

that doesn't solve the problem that the special accumulator solves, namely that
future ops are delayed waiting to see whether FPSCR is updated.
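Here is a toy Python model of the dependency argument. It is not ieee754fpu or TestIssuer code, and `Op`, `raw_dependencies`, `fp_op` and the `fpscr_*` register names are made up for illustration. It assumes two things. With a single FPSCR, every fp op reads it (for the rounding mode and to OR in sticky bits) and then writes it back. With a split FPSCR, an ordinary fp op only reads the control part, and it writes the other parts without reading them first:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    name: str
    reads: frozenset
    writes: frozenset


def raw_dependencies(ops):
    """(earlier, later) index pairs where `later` reads a register that
    `earlier` writes, i.e. read-after-write hazards in program order."""
    deps = []
    for j, later in enumerate(ops):
        for i in range(j):
            if ops[i].writes & later.reads:
                deps.append((i, j))
    return deps


def fp_op(name, split):
    if split:
        # assumed split: read only the control part (rounding mode etc.);
        # sticky bits are OR-ed in at the regfile and status is overwritten,
        # so neither needs to be read first
        return Op(name, frozenset({"fpscr_control"}),
                  frozenset({"fpscr_sticky", "fpscr_status"}))
    # single FPSCR: read it (rounding mode, old sticky bits), write it back
    return Op(name, frozenset({"fpscr"}), frozenset({"fpscr"}))


unsplit = [fp_op(f"fadd{i}", split=False) for i in range(4)]
split = [fp_op(f"fadd{i}", split=True) for i in range(4)]
print(len(raw_dependencies(unsplit)))  # 6
print(len(raw_dependencies(split)))    # 0
```

Under those assumptions, four back-to-back fp ops form a chain of read-after-write pairs through the unsplit FPSCR and none through the split one. The remaining write-after-write ordering on the sticky part doesn't matter, because OR is commutative.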

> this is NOT your problem to immediately deal with.

yes, as I have stated multiple times, this task only splits FPSCR into 3 parts
and includes *no other work* related to speculative execution.
> 
> i repeat again: we need a WORKING implementation NOT repeat NOT
> i repeat again NOT a FAST implementation.

this task will take an additional 15 minutes or so beyond that, for splitting
FPSCR into three parts, which I think is an acceptable amount of time to spend
on planning ahead. Our CPU is in no way obligated to implement any of the
speculative speedups we discussed; those are all for much later.
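Here is a rough sketch of how the three FPSCR parts could travel through pipeline stages as plain data. Everything in it is an assumption for illustration: the control/sticky/status grouping, the names `FpscrParts` and `commit_fp_result`, and the bit values. It is not the actual split from comment #0, and it does not use the architected FPSCR bit numbering:

```python
from dataclasses import dataclass, replace

# Hypothetical bit assignments for illustration only; they do not match
# the architected FPSCR bit numbering.
OX, UX, ZX, XX = 1 << 0, 1 << 1, 1 << 2, 1 << 3


@dataclass(frozen=True)
class FpscrParts:
    """One possible three-way split of FPSCR (an assumption)."""
    control: int  # rounding mode, NI, exception enables: read by fp ops
    sticky: int   # exception flags: fp ops only ever OR bits in
    status: int   # FR/FI/FPRF-style result info: overwritten by each op


def commit_fp_result(parts, sticky_out, status_out):
    """Merge one fp op's outputs: control passes through untouched,
    sticky bits accumulate, status is replaced."""
    return replace(parts, sticky=parts.sticky | sticky_out, status=status_out)


state = FpscrParts(control=0b01, sticky=0, status=0)
state = commit_fp_result(state, sticky_out=XX, status_out=0b00100)
state = commit_fp_result(state, sticky_out=OX | XX, status_out=0b01000)
print(bin(state.sticky), bin(state.status))  # 0b1001 0b1000
```

Each stage only ever ORs into `sticky` and replaces `status`, so no stage needs to read back the previous op's sticky bits.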

> 
> > can be done for many insns per clock, the slow part is *reading* from that
> > special accumulation hardware, because the cpu has to wait for *all* prior
> > fp ops to complete first, often doing a full cpu flush.
> 
> again please LISTEN. other CPUs are NOT IMPORTANT. you keep doing
> this.  it is TOO COMPLEX for the size of the Grant work to do
> both "first implementation" **AND** "FAST OPTIMAL implementation".

please listen: I have said several times that that's merely why I think FPSCR
should be split into 3 parts. Other than that, all fast fp methods discussed here
*are not part of this task or any other follow-on tasks*.
> 
> STOP IT!!!

I am *not expanding the scope* by more than the amount of time it takes to type
this reply, which is IMHO a trivial amount of time. I've said explicitly from
the beginning:

> the above is all that will be implemented as part of this task, all of the below is just explaining why we should split FPSCR into parts:

this means that implementing any of the cpu optimizations beyond merely
splitting FPSCR *IS OUT OF SCOPE*.

> 
> 
> > that's exactly what we need much later, though for now we can just use
> > dependency chains and just have really slow fp.
> 
> correct.  get results, get paid, make NLnet look good, EU is happy to
> give them more money, we apply and get it.
> 
> 
> > that doesn't really work
> 
> it works perfectly: yet again however you are attempting to change
> the scope from "A Working First Implementation" to "A massively
> complex heavily-optimised implementation".
> 
> *please stop doing that*.
> 
> > because, unlike OE=1 which is usually switched off,
> > *all* fp computation ops *always* generate sticky bits outputs, that need to
> > be or-ed into FPSCR.
> 
> that can be done LATER.  **NOT NOW**. it is a complex task on its own
> and if we do not have WORKING code first it is TOO MUCH.
> 
> > >   - however if set then you pass through the copy of the FPSCR bits
> > >     right the way through all pipeline stages.
> > 
> > I'm planning on just passing the FPSCR parts through the pipeline stages,
> > modifying the parts as needed.
> 
> perfect.
> 
> please REMOVE all and ANY mention of "speculative execution" from
> comment #0.

no, they are explicitly labeled as out-of-scope in comment #0 (which you
somehow missed all this time), and they are there to document why splitting
should occur at all.
> 
> that is a FOLLOWON task that will require its own special extremely
> LARGE budget.

yes, that follow-on task will be much later, not part of this grant. That said,
I wouldn't expect it to be hugely complex to implement, since we can just reuse
the branch-misprediction machinery, which we'll need anyway.

we can plan exactly what we want to do then.
> 
> 
> > I'm simplifying slightly since I don't want to write 10 pages of text:
> > SO/CA[32]/OV[32] are passed as inputs from registers/dependency-tracking to
> > all relevant ALUs, those ALUs check OE=1, which if set, then they or-in
> > their overflow output and signal that SO/OV[32] need to be written.
> 
> yes. i think you missed that every output is a "Data Record" which has
> a data member and an ok member....
> 
> > dependency tracking then checks if the output is set as written 
> 
> the "ok" flag, yes. you didn't miss it, awesome.
> 
> > and if so
> > delays until the output is computed, then writes that output to the
> > registers and/or other insn inputs as necessary. if the output is not set as
> > written (computable at decode time,
> 
> it isn't.  it's computable *whether* it *could* be set.

for CA[32]/OV[32] it should be computable at decode time, even if we don't
currently compute it there.
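Here is a minimal sketch of why this is decode-time computable. It assumes XO-form instructions: primary opcode 31, OE at MSB0 bit 21, XO at MSB0 bits 22:30. Whether OV/OV32/SO are written depends only on the OE bit, and whether CA/CA32 are written depends only on which instruction it is. `XO_TABLE` and `xer_writes` are hypothetical, and the table covers only `add` and `addc`:

```python
def decode_xo_form(word):
    """Extract fields of a 32-bit XO-form instruction (LSB0 shifts)."""
    primary = word >> 26        # MSB0 bits 0:5
    oe = (word >> 10) & 1       # MSB0 bit 21
    xo = (word >> 1) & 0x1FF    # MSB0 bits 22:30
    return primary, oe, xo


# Hypothetical, deliberately tiny table: XO value -> (mnemonic, writes CA)
XO_TABLE = {266: ("add", False), 10: ("addc", True)}


def xer_writes(word):
    """Which XER fields an insn *writes*, known purely from its encoding."""
    primary, oe, xo = decode_xo_form(word)
    if primary != 31 or xo not in XO_TABLE:
        raise ValueError("only add/addc are modelled in this sketch")
    mnemonic, writes_ca = XO_TABLE[xo]
    return {
        "mnemonic": mnemonic,
        "writes_ca_ca32": writes_ca,
        "writes_ov_ov32_so": bool(oe),
    }


print(xer_writes(0x7C642A14))  # add  r3,r4,r5
print(xer_writes(0x7C642E14))  # addo r3,r4,r5 (OE=1)
print(xer_writes(0x7C642814))  # addc r3,r4,r5
```

Whether the *value* of SO actually changes still depends on the data, but whether the output is written (the "ok" flag) does not.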

> 
> > but i think we delay for some insns),
> > then the dependency tracking uses the old SO/OV[32]/CA[32] and forwards that
> > from registers/etc. to later insns.
> 
> that's way into the future.

yes, but that's irrelevant to how the ALUs are designed. As far as they're
concerned, they produce their outputs and the cpu takes care of them; how it
does that is irrelevant.
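Here is a toy Python model of that split of responsibilities, loosely following the "Data Record" with data and ok members mentioned above. `DataRecord`, `add64_outputs` and `commit` are hypothetical stand-ins, and OV32/CA are left out for brevity. The ALU computes `data` and decides `ok` from OE. The "cpu" side writes a register only when `ok` is set; otherwise later insns keep seeing the old value:

```python
from dataclasses import dataclass

MASK64 = (1 << 64) - 1
SIGN64 = 1 << 63


@dataclass
class DataRecord:
    """Hypothetical stand-in for a "Data Record": `data` only means
    something when `ok` is set, i.e. when the register is to be written."""
    data: int = 0
    ok: bool = False


def add64_outputs(a, b, so_in, oe):
    """ALU side: compute RT, and only flag OV/SO as written when OE=1."""
    result = (a + b) & MASK64
    # signed overflow: operands share a sign bit that the result doesn't
    ov = bool(~(a ^ b) & (a ^ result) & SIGN64)
    rt = DataRecord(result, True)
    if oe:
        ov_out = DataRecord(int(ov), True)
        so_out = DataRecord(int(so_in or ov), True)  # SO is sticky: OR in OV
    else:
        ov_out, so_out = DataRecord(), DataRecord()  # ok=False: not written
    return rt, ov_out, so_out


def commit(regs, name, rec):
    """CPU side: write only when ok is set; otherwise the old value stays
    and is what later insns see."""
    if rec.ok:
        regs[name] = rec.data


regs = {"GPR3": 0, "OV": 0, "SO": 0}
outputs = add64_outputs(0x7FFF_FFFF_FFFF_FFFF, 1, so_in=bool(regs["SO"]), oe=True)
for name, rec in zip(("GPR3", "OV", "SO"), outputs):
    commit(regs, name, rec)
print(regs)  # {'GPR3': 9223372036854775808, 'OV': 1, 'SO': 1}
```

The ALU never needs to know how the CPU forwards or delays these values. It only has to get `ok` right.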

> one trick you missed above for FPSCR "sticky" bits (and i have not had
> time to put this into XER.SO yet, either): if the XER.SO flag is
> already set YOU DO NOT NEED TO WRITE IT.

yes, but unlike SO there are many sticky bits, and usually a few of them are
still zero (because Linux initializes them to zero and some fp exceptions are
extremely rare or impossible in important programs, e.g. if you never use sqrt,
the invalid-sqrt flag can never be set), so relying only on that optimization
is unwise.
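Here is a small sketch of the skip-the-write check generalised to a multi-bit sticky field; the bit values are hypothetical. The write can be skipped only when none of the op's sticky outputs go from 0 to 1. Any op that raises a still-zero flag, such as invalid-sqrt, needs a real write:

```python
# Hypothetical bit positions for three sticky exception flags.
XX, ZX, VXSQRT = 1 << 0, 1 << 1, 1 << 2


def sticky_update(old, new):
    """Merge an op's sticky outputs into the old sticky bits.

    Returns (merged value, write_needed). The write can be skipped only
    when no bit goes from 0 to 1, which generalises "if SO is already set,
    don't write it" to a multi-bit field."""
    rising = new & ~old
    return old | new, rising != 0


old = XX  # inexact already raised; divide-by-zero and invalid-sqrt still 0
print(sticky_update(old, XX))       # (1, False): write can be skipped
print(sticky_update(old, XX | ZX))  # (3, True): ZX goes 0 -> 1
print(sticky_update(old, VXSQRT))   # (5, True): VXSQRT goes 0 -> 1
```

Incidentally, the Power ISA defines the FPSCR[FX] summary bit in terms of the same 0-to-1 transition of exception bits.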

> under NO CIRCUMSTANCES attempt to implement that right now.

I'm not (beyond merely splitting FPSCR), I was never planning to, and I have
stated that many times.
> 
> get everything working under TestIssuer only, and please remove
> all mention of "speculation" from this task.  it is however
> good that you understand the problem and the future direction.

i'm not removing "speculation" from the description: it's there to document
why we're doing what we're doing, and it is explicitly labeled *OUT-OF-SCOPE*
for this task.
