[Libre-soc-bugs] [Bug 751] idea for reducing dependency matrixes in 6600-derived architecture with register renaming

Fri Dec 3 00:57:56 GMT 2021

https://bugs.libre-soc.org/show_bug.cgi?id=751

--- Comment #12 from Luke Kenneth Casson Leighton <lkcl at lkcl.net> ---
(In reply to Jacob Lifshay from comment #11)
> (In reply to Luke Kenneth Casson Leighton from comment #9)
> > now, it so happens that only the Function Unit itself can determine,
> > itself, whether things such as XER.SO actually need to be written,
> > because writing to XER.SO is determined from the *input*, which
> > is, clearly, NOT YET EVEN AVAILABLE at the time that the instruction is
> > actually issued.
> 
> well, I'm approaching it from the perspective of: the instruction is fully
> known at decode time, if the instruction is an addi, then it never writes
> SO, and any successive instructions that read SO ignore the addi, not
> waiting for it.

that's what the PowerDecoder2 does.  it's always done that, because we got
that trick from Microwatt.

https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/power_decoder2.py;h=edf2893b3dec4749822db7d926efb4eaa0eea9b2;hb=HEAD#l966

        # rc and oe out
        comb += self.do_copy("rc", dec_rc.rc_out)
        if self.svp64_en:
            # OE only enabled when SVP64 not active
            with m.If(~self.is_svp64_mode):
                comb += self.do_copy("oe", dec_oe.oe_out)
        else:
            comb += self.do_copy("oe", dec_oe.oe_out)

(this is where the rule that OE is entirely ignored in SVP64 is
implemented).

"listening" to Rc and OE comes from the CSV files, which originally come
from microwatt decode1.vhdl.

therefore, addi *DOES NOT* require XER.SO writing.  here - source code
which i have already referred you to and you clearly haven't read or
asked questions about, just made arbitrary fundamental assumptions:

https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/power_regspec_map.py;h=7c066d7dbd691c089a13d981b3851a02ee1f1f89;hb=HEAD#l88

        if name == 'xer_so':
            # SO needs to be read for overflow *and* for creation
            # of CR0 and also for MFSPR
            rd = RegDecodeInfo(((e.do.oe.oe[0] & e.do.oe.ok) |
                                (e.xer_in & SO == SO) |
                                (e.do.rc.rc & e.do.rc.ok)), SO, 3)

note there that if the PowerDecoder2 is instructed to ignore XER.SO, it
does so, and does not even request the Read Port.  there is a corresponding
piece of code - which you clearly haven't looked at - which likewise
performs the same check on the XER.SO write port.

        if name == 'xer_so':
            wr = RegDecodeInfo(e.xer_out | (e.do.oe.oe[0] & e.do.oe.ok),
                               SO, 3) # hmmm

thus: addi DOES NOT request XER.SO, read or write.  only those instructions
which MIGHT require XER.SO actually request it.
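
as a rough illustration of the two decode-time checks above, here is a
plain-Python model (not the actual nmigen code). the `Decoded` class, its
field names, and the two helper functions are hypothetical stand-ins for
the decoder signals (e.do.oe, e.do.rc, e.xer_in, e.xer_out), and the
XER bitmask test is simplified to a single boolean.

```python
# plain-Python model of the decode-time XER.SO port requests.
# all names here are hypothetical; the real logic is nmigen HDL
# in power_regspec_map.py, operating on decoder signals.
from dataclasses import dataclass


@dataclass
class Decoded:
    oe: bool = False         # OE bit of the instruction
    oe_ok: bool = False      # decoder marks the OE field as valid
    rc: bool = False         # Rc bit of the instruction
    rc_ok: bool = False      # decoder marks the Rc field as valid
    xer_in_so: bool = False  # stand-in for (e.xer_in & SO == SO)
    xer_out: bool = False    # stand-in for e.xer_out


def so_read_requested(d: Decoded) -> bool:
    # SO is read for overflow, for CR0 creation, or for explicit XER reads
    return (d.oe and d.oe_ok) or d.xer_in_so or (d.rc and d.rc_ok)


def so_write_requested(d: Decoded) -> bool:
    # the SO write port is requested for XER writes or when OE is set
    return d.xer_out or (d.oe and d.oe_ok)


addi = Decoded()                       # no OE, no Rc, no XER access
addo = Decoded(oe=True, oe_ok=True)    # OE=1 form of add
add_rc = Decoded(rc=True, rc_ok=True)  # Rc=1 form: SO is read to build CR0
```

in this model addi requests neither port, addo requests both, and the
Rc=1 form requests only the read port.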

and therefore there is no "major design flaw", just an alarming lack of
working knowledge on your part as to the internals of the design, even
after two years, leading you to create "alternative designs that you think
are better", rather than talking about it months ago, and resulting in
you spending significant time *not* helping us fulfil our goals and
obligations.

classic "Not Invented Here" syndrome, i'm sorry to have to point out.

> if it's addo, then it *always* writes SO, writing 0 if
> necessary, and any successive instructions that need SO will *always* wait
> for the addo. it never *maybe* writes anything, cuz that has questionable
> benefits and requires additional logic, it's always completely determined at
> decode time.

no, it's not.  you've failed to listen to what i wrote.  it is *not possible*
to determine entirely at instruction decode time whether XER.SO needs to
be written to.  yes it is an optimisation, but an important one.

you are correct in that actually, with XER.SO not being popularly used,
it's not that important (for XER.SO).  however the code itself has this
capability (to "drop" write-port-requests, based on information that only
becomes available **AFTER** the instruction decode phase), which will become critically
important later on when predication is added.  at that point it will matter
a hell of a lot, hence why i went to the trouble of putting the infrastructure
in place even at this early stage, because it will be too damn difficult
to add later [the "ok" flags on ALU regspecs, which is part of MultiCompUnit
and part of the entire pipeline data specifications]

and it's not, honestly, that difficult to detect [but is a lot of design
work right throughout the entire pipelines]

this is the one line of code needed to identify when the condition
occurs:

        with m.If(fu.alu_done_o & latch_wrflag & ~fu_wrok):

that's:

* "the ALU is done i.e. it is requesting that its output (which
  will include XER.SO) be 'latched'"
* at that exact moment, the dest "ok" flag may (or may not) be set
* latch_wrflag captures those registers that were requested back
  at ISSUE time

the combination of these tells you which registers were REQUESTED
to be written to, but the pipeline is telling you NO, i do NOT
need to write to them.

because the pipeline is saying "no write needed", there is never
going to be a corresponding write-request to the regfile, and
consequently the Write-Hazard may be dropped.

if it isn't dropped, all hell breaks loose, the entire Engine
will lock up permanently, because the write-port was requested
at ISSUE time but is never cleared.
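
the drop condition, and the lock-up if it is left out, can be sketched
with a toy bitmask model in plain Python. this is not the code in core.py:
the function names, the `drop_enabled` switch, and the RT / XER_SO bit
assignments are made up for illustration, and the regfile is assumed to
accept every real write immediately.

```python
# toy model of dropping a write-hazard. names are hypothetical and only
# loosely mirror fu.alu_done_o / latch_wrflag / fu_wrok.

def hazards_to_drop(alu_done: bool, latch_wrflag: int, wr_ok: int) -> int:
    """bitmask of write-hazards that may be dropped.

    latch_wrflag: one bit per register requested for write at ISSUE time
    wr_ok:        one bit per register the pipeline says it will write
    """
    if not alu_done:
        return 0  # result not ready yet: nothing is known
    return latch_wrflag & ~wr_ok


def step(pending: int, alu_done: bool, latch_wrflag: int, wr_ok: int,
         drop_enabled: bool = True) -> int:
    """return the write-hazards still pending once the ALU completes."""
    if not alu_done:
        return pending
    written = latch_wrflag & wr_ok  # real writes clear their own hazard
    pending &= ~written
    if drop_enabled:
        pending &= ~hazards_to_drop(alu_done, latch_wrflag, wr_ok)
    return pending


RT, XER_SO = 0b01, 0b10
requested = RT | XER_SO  # both write ports requested at issue
wr_ok = RT               # pipeline: RT is written, SO is not
```

with the drop enabled, `step(requested, True, requested, wr_ok)` leaves no
pending hazards. with `drop_enabled=False` the XER_SO bit stays set and
nothing ever clears it, which is the lock-up described above.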

it's solved with a one-line test [but required a LOT of careful
advance planning to get to that point, and a hell of a lot of
work on MultiCompUnit]

https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/simple/core.py;h=bb7f8ce9e9b7b8454bc583b7fa2363f99c6e62a7;hb=56d6f9114733a20015df85da59c5d2ce694a465b#l731
