[Libre-soc-bugs] [Bug 1135] add FPSCR and Rounding classes to ieee754fpu
bugzilla-daemon at libre-soc.org
Thu Aug 10 20:52:16 BST 2023
https://bugs.libre-soc.org/show_bug.cgi?id=1135
Luke Kenneth Casson Leighton <lkcl at lkcl.net> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |lkcl at lkcl.net
--- Comment #1 from Luke Kenneth Casson Leighton <lkcl at lkcl.net> ---
(In reply to Jacob Lifshay from comment #0)
> to allow FP ops to compute in parallel despite each fp op semantically
> reading the FPSCR output from the previous op, the FPSCR will be split into
> 3 parts (I picked names that aren't necessarily standard names):
> * volatile part: written nearly every insn but is rarely read
> FR, FI, FPRF
> * sticky part: generally doesn't change but is read and written by nearly
> all insns:
> all the sticky exception bits
> * control part: generally doesn't change and is only read by nearly all
> insns:
> all the other bits
>
> the above is all that will be implemented as part of this task, all of the
> below is just explaining why we should split FPSCR into parts:
>
> the idea is that the cpu will have all three parts in separate registers and
> will speculatively execute fp insns with the current value of the sticky
> part register (not the one from the previous instruction, but the one from
> the register, avoiding needing a dependency chain),
please do not do that. if there is a dependency chain it is just tough luck.
the programmer is already warned in the spec "some things might be slower"
and surprise, that's what they get.
> and then will cancel and
> retry all later insns if it turns out that the insn changed the sticky part
> (which is rare).
no, you REALLY do not want to be doing that.
follow EXACTLY how XER works, please, starting with adding FPSCR as
"its own register file".
do NOT attempt, repeat DO NOT attempt, to add "speculation" of ANY KIND.
do NOT attempt, repeat DO NOT attempt, to make drastic modifications to
the existing design.
do NOT, repeat DO NOT, assume that "the first implementation has to be
fastest bestest most amazingest most brilliant most highest performance".
we need WORKING, first.
please follow this procedure:
* split the FPSCR-regfile into the three (or more) parts that you advocated
* pass in the parts of FPSCR that *might* be written to, as "read operands"
(these will be written-out *if* needed)
* pass in an immediate operand (in the Record)
"fp_overflow_just_like_xer_so_overflow"
- if this is clear, you know that the copy of FPSCR will not be
read, and consequently will not be written to
- however if it is set, you pass the copy of the FPSCR bits
right the way through all pipeline stages.
* and EXACTLY as is done with XER.SO when overflow is enabled,
have the final stage of the pipeline set or clear the "data.ok"
bit.
this "data.ok" bit will indicate to the register file, which will
have been waiting for that result, that "actually, a write is not needed".
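
the procedure above can be sketched as a plain-Python behavioural model.
this is only an illustration, not nmigen and not the soc source: Data, Op,
fp_sticky_en, stage, final_stage and run_pipeline are all made-up names,
and the rule used in the final stage (request a write only when the flag is
set and the sticky bits changed) is an assumption of the sketch, not
something taken from the existing XER.SO code.

```python
from dataclasses import dataclass


@dataclass
class Data:
    """Hypothetical data/ok pair: a value plus a 'write is needed' bit."""
    data: int = 0
    ok: bool = False


@dataclass
class Op:
    """Hypothetical operation Record; the flag name is made up for this sketch."""
    fp_sticky_en: bool = False


def stage(op, piped, raised):
    """One pipeline stage: copy the sticky bits from input to output.

    When the flag is clear the copy is passed on untouched. When it is set,
    any newly raised bits are OR-ed in. The ok bit is always copied as-is.
    """
    if not op.fp_sticky_en:
        return Data(piped.data, piped.ok)
    return Data(piped.data | raised, piped.ok)


def final_stage(op, sticky_in, piped):
    """Final stage: decide whether the register file needs a write.

    Assumption: a write is requested only if the flag is set AND the
    sticky bits differ from what was read in.
    """
    changed = piped.data != sticky_in
    return Data(piped.data, op.fp_sticky_en and changed)


def run_pipeline(op, sticky_in, raised_per_stage):
    """Pass the sticky bits through every stage, then through the final stage."""
    piped = Data(sticky_in, False)
    for raised in raised_per_stage:
        piped = stage(op, piped, raised)
    return final_stage(op, sticky_in, piped)


# flag clear: no write is requested
print(run_pipeline(Op(False), 0b0001, [0b0010, 0b0000]))
# flag set and a new sticky bit raised: a write is requested
print(run_pipeline(Op(True), 0b0001, [0b0010, 0b0000]))
# flag set but nothing new raised: no write is requested
print(run_pipeline(Op(True), 0b0001, [0b0001, 0b0000]))
```

the point of the sketch is that no stage ever looks at the result of a
later instruction: the sticky bits come in as an ordinary read operand and
the only decision left to the end is whether ok is set.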
i repeat DO NOT deviate from the existing micro-architectural design
IN ANY WAY.
BEFORE BEGINNING please can you describe in your own words precisely and
exactly how XER.SO, XER.CA/32 and XER.OV/32 work, and how they are part
of a special "regfile".
you will need to analyse the ALU CompUnits and pipelines, as well as the
reg data structures, and observe how the Records have an "overflow" entry
that is passed all the way down through the "stages", and how
each pipeline stage manually copies (and occasionally modifies) the
inputs through to the outputs, *and* how SPECIAL ATTENTION is paid to
copying the "ok" bit from input through to output:
https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/fu/common_output_stage.py;hb=HEAD
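
as a rough illustration of the register-file side, here is a plain-Python
sketch of a regfile whose entries are separately addressable parts and
whose write is skipped when ok is clear. PartRegfile and Data are made-up
names; the entry names only mirror the XER parts mentioned above, and the
layout and widths are assumptions, not taken from the soc source.

```python
from dataclasses import dataclass, field


@dataclass
class Data:
    """Hypothetical data/ok pair: a value plus a 'write is needed' bit."""
    data: int = 0
    ok: bool = False


@dataclass
class PartRegfile:
    """Hypothetical regfile holding each part as its own entry."""
    regs: dict = field(default_factory=lambda: {"SO": 0, "CA": 0, "OV": 0})
    writes: int = 0  # counts writes that actually happened

    def read(self, name):
        """Read one part, exactly like any other read operand."""
        return self.regs[name]

    def write(self, name, result):
        """Write one part, but only if the pipeline set ok."""
        if not result.ok:
            return False  # ok clear: the write is not needed
        self.regs[name] = result.data
        self.writes += 1
        return True


rf = PartRegfile()
rf.write("SO", Data(1, ok=False))    # dropped, ok is clear
rf.write("CA", Data(0b11, ok=True))  # performed, ok is set
print(rf.regs, rf.writes)
```

an FPSCR regfile following the same pattern would simply use the FPSCR
parts as entry names instead.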
--
You are receiving this mail because:
You are on the CC list for the bug.