[Libre-soc-bugs] [Bug 1135] add FPSCR and Rounding classes to ieee754fpu

bugzilla-daemon at libre-soc.org bugzilla-daemon at libre-soc.org
Fri Aug 11 13:05:49 BST 2023


https://bugs.libre-soc.org/show_bug.cgi?id=1135

--- Comment #12 from Luke Kenneth Casson Leighton <lkcl at lkcl.net> ---
(In reply to Jacob Lifshay from comment #6)

> will take at least *192 cycles*, no matter *how wide the SIMD units

not a problem at all for TestIssuer which only issues one element at a time

TestIssuer is the sole primary focus of this task.


> basically every other OoO cpu has the sticky bits handled specially for
> exactly the reason that I explained above.

at the regfile an ORing may be performed instead of a "set".
this is NOT your problem to immediately deal with.
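
a minimal sketch of the difference between a "set" and an ORing write,
in plain Python rather than HDL. the StickyRegfile class and the bit
masks are hypothetical, for illustration only, and are not the real
FPSCR layout or the real regfile API:

```python
# Illustrative bit masks only; NOT the real FPSCR bit positions.
OX, UX, ZX, XX = 0b1000, 0b0100, 0b0010, 0b0001


class StickyRegfile:
    """Hypothetical one-entry register file holding sticky flags."""

    def __init__(self):
        self.value = 0

    def write_set(self, new):
        # plain "set": overwrites, so earlier sticky bits can be lost
        self.value = new

    def write_or(self, new):
        # ORing write: bits only ever accumulate
        self.value |= new


rf_set, rf_or = StickyRegfile(), StickyRegfile()
for flags in (OX, XX, 0):  # flags from three results arriving in turn
    rf_set.write_set(flags)
    rf_or.write_or(flags)
```

with the ORing write the writer never needs to read the old value
first, which is the point of doing it at the regfile.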

i repeat again: we need a WORKING implementation NOT repeat NOT
i repeat again NOT a FAST implementation.


> can be done for many insns per clock, the slow part is *reading* from that
> special accumulation hardware, because the cpu has to wait for *all* prior
> fp ops to complete first, often doing a full cpu flush.

again please LISTEN. other CPUs are NOT IMPORTANT. you keep doing
this.  it is TOO COMPLEX for the size of the Grant work to do
both "first implementation" **AND** "FAST OPTIMAL implementation".

please LISTEN, follow my directions and guidance, and do not argue.

i have told you many many times that you are unable to properly do time
and budget scoping, last time was *literally* 36 hours ago and you
are literally within 36 hours attempting to massively expand
the scope far beyond what the available budget can handle.

STOP IT!!!


> that's exactly what we need much later, though for now we can just use
> dependency chains and just have really slow fp.

correct.  get results, get paid, make NLnet look good, EU is happy to
give them more money, we apply and get it.


> that doesn't really work

it works perfectly: yet again however you are attempting to change
the scope from "A Working First Implementation" to "A massively
complex heavily-optimised implementation".

*please stop doing that*.

> because, unlike OE=1 which is usually switched off,
> *all* fp computation ops *always* generate sticky bits outputs, that need to
> be or-ed into FPSCR.

that can be done LATER.  **NOT NOW**. it is a complex task on its own
and if we do not have WORKING code first it is TOO MUCH.

> >   - however if set then you pass through the copy of the FPSCR bits
> >     right the way through all pipeline stages.
> 
> I'm planning on just passing the FPSCR parts through the pipeline stages,
> modifying the parts as needed.

perfect.
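
as a rough sketch of "pass the FPSCR parts through the pipeline stages,
modifying as needed", in plain Python rather than HDL. StageData, the
two stage functions and the bit masks are all hypothetical names made
up for illustration; they are not taken from ieee754fpu:

```python
from dataclasses import dataclass, replace

# Illustrative masks only; NOT the real FPSCR bit positions.
ZX, XX = 0b10, 0b01


@dataclass(frozen=True)
class StageData:
    value: float  # stand-in for the real intermediate FP data
    fpscr: int    # copy of the relevant FPSCR bits, carried along


def stage_scale(d):
    # this stage raises no flags: the FPSCR copy passes through untouched
    return replace(d, value=d.value * 0.5)


def stage_round(d):
    # this stage ORs in "inexact" if rounding changed the value
    rounded = float(round(d.value))
    flags = XX if rounded != d.value else 0
    return replace(d, value=rounded, fpscr=d.fpscr | flags)


# FPSCR copy enters with ZX already set and leaves with XX added.
out = stage_round(stage_scale(StageData(value=3.0, fpscr=ZX)))
```

each stage only ever sees and returns its own copy, so nothing outside
the pipeline needs to be consulted until the result pops out.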

please REMOVE all and ANY mention of "speculative execution" from
comment #0.

that is a FOLLOWON task that will require its own special extremely
LARGE budget.


> I'm simplifying slightly since I don't want to write 10 pages of text:
> SO/CA[32]/OV[32] are passed as inputs from registers/dependency-tracking to
> all relevant ALUs, those ALUs check OE=1, which if set, then they or-in
> their overflow output and signal that SO/OV[32] need to be written.

yes. i think you missed that every output is a "Data Record" which has
a data member and an ok member....

> dependency tracking then checks if the output is set as written 

the "ok" flag, yes. you didn't miss it, awesome.

> and if so
> delays until the output is computed, then writes that output to the
> registers and/or other insn inputs as necessary. if the output is not set as
> written (computable at decode time,

it isn't.  what is computable at decode time is *whether* it *could* be set.

> but i think we delay for some insns),
> then the dependency tracking uses the old SO/OV[32]/CA[32] and forwards that
> from registers/etc. to later insns.

that's way into the future.

TestIssuer simply goes, when the result pops out (NextControl ready flag)
"erm was the ok flag set, if so i'll just request a write to the regfile"
and if there are no outstanding writes left, TestIssuer is ONLY THEN
permitted to even FETCH the next instruction, let alone decode it.

and that gets us the money and it was a lot less work.
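
the serial behaviour described above can be sketched in plain Python
as follows. this is a behavioural model only, under stated assumptions:
the Data class, issue_serially() and the two toy instructions are
hypothetical stand-ins, not the real TestIssuer FSM or Data Record:

```python
from dataclasses import dataclass


@dataclass
class Data:
    data: int = 0
    ok: bool = False  # True means "this output needs writing back"


def issue_serially(program, regfile):
    """program: list of callables, each returning {regname: Data}."""
    log = []
    for pc, insn in enumerate(program):
        log.append(f"fetch {pc}")
        outputs = insn(regfile)  # result "pops out" of the pipeline
        # was the ok flag set? if so, request a regfile write
        pending = [(n, o.data) for n, o in outputs.items() if o.ok]
        while pending:
            name, value = pending.pop(0)
            regfile[name] = value
            log.append(f"write {name}")
        # no outstanding writes left: only now may the loop fetch again
    return log


def add_no_overflow(rf):
    return {"RT": Data(rf["RA"] + 1, True), "SO": Data(0, False)}


def add_overflow(rf):
    return {"RT": Data(0, True), "SO": Data(1, True)}


regfile = {"RA": 1, "RT": 0, "SO": 0}
log = issue_serially([add_no_overflow, add_overflow], regfile)
```

the first instruction leaves SO.ok clear so only RT is written; the
second sets SO.ok and both are written before anything else is fetched.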

----

(future work ONLY, thoroughly out of scope for this task):

one trick you missed above for FPSCR "sticky" bits (and i have not had
time to put this into XER.SO yet, either): if the XER.SO flag is
already set YOU DO NOT NEED TO WRITE IT.

therefore you can REMOVE that Write-Hazard entirely.

the "problem" you describe about how sticky bits would slow things
down is *only* the case if that bit is clear at the time of issue,
and some ORing at periodic intervals (defined in part by the maximum
size of the Shadow Matrix) takes care of the other cases.
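
purely to illustrate the future-work idea (not something to implement
in this task): a sticky bit that is already set cannot be changed by
another "set", so only bits still clear at issue time need a
Write-Hazard. needs_write_hazard() and the masks are hypothetical
names for illustration, and the periodic ORing is not modelled:

```python
# Illustrative masks only; NOT the real FPSCR or XER bit positions.
OX, XX = 0b10, 0b01


def needs_write_hazard(current_sticky, may_set_mask):
    """True if the insn could set a sticky bit that is still clear."""
    return (may_set_mask & ~current_sticky) != 0


clear_at_issue = needs_write_hazard(0b00, XX)      # XX clear: hazard
already_set = needs_write_hazard(XX, XX)           # XX set: no hazard
partly_set = needs_write_hazard(XX, OX | XX)       # OX clear: hazard
```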

but again i repeat again i repeat AGAIN:

under NO CIRCUMSTANCES attempt to implement that right now.

get everything working under TestIssuer only, and please remove
all mention of "speculation" from this task.  it is however
good that you understand the problem and the future direction.

-- 
You are receiving this mail because:
You are on the CC list for the bug.

