[Libre-soc-dev] WIP demo of deficiency of 6600-derived architecture compared to register renaming

Luke Kenneth Casson Leighton lkcl at lkcl.net
Tue Oct 27 21:23:36 GMT 2020


On Tue, Oct 27, 2020 at 7:52 PM Jacob Lifshay <programmerjake at gmail.com> wrote:
>
> On Tue, Oct 27, 2020 at 10:10 AM Jacob Lifshay <programmerjake at gmail.com> wrote:
> > I am, however, going to finish the demo diagram since I think the issue is a little more complex than just a WaW hazard.

good, because it needs investigating properly.

> Completed and pushed! It won't show up until the ikiwiki errors are
> fixed,

sorted

> however it is in the git repo:
>
> https://git.libre-soc.org/?p=libreriscv.git;a=blob;f=3d_gpu/architecture/compared_to_register_renaming.mdwn;h=8b4e2e4c78dd889985736ea60968f768bcb3a122;hb=d4489f6f651f5d88541ca056636e5870d5a01d3f
>
> https://libre-soc.org/3d_gpu/architecture/compared_to_register_renaming/

comments.

1) "Notice how the WaR Waits on `r9` cause 2 instructions to finish
per cycle (5 micro-ops per 2 cycles)"

right.  this isn't necessarily the case.  once an FU has read from the
regfile into its in-flight latches it drops the read dependency
entirely.  thus if the new instruction is issued after that point
there will only be the one WaR wait, not two.
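
a minimal sketch of that point, as a toy model rather than the actual
Libre-SOC dependency matrix code.  the names `FunctionUnit`,
`pending_reads`, `latch_operands` and `war_waits` are hypothetical,
made up for illustration only:

```python
from dataclasses import dataclass, field


@dataclass
class FunctionUnit:
    """Toy FU: tracks only the registers it still has to read (assumption)."""
    name: str
    pending_reads: set = field(default_factory=set)

    def latch_operands(self):
        # operands are now held in-flight inside the FU, so the
        # read dependencies on the regfile are dropped entirely
        self.pending_reads.clear()


def war_waits(dest_reg, units):
    """Names of the FUs that a new writer of dest_reg must wait for (WaR)."""
    return [fu.name for fu in units if dest_reg in fu.pending_reads]


# two earlier instructions both still need to read r9
fu_a = FunctionUnit("A", {"r9"})
fu_b = FunctionUnit("B", {"r9"})
units = [fu_a, fu_b]

before = war_waits("r9", units)   # issued early: waits on both readers
fu_a.latch_operands()             # FU A has now read r9 from the regfile
after = war_waits("r9", units)    # issued later: only one WaR wait remains
```

how many WaR waits show up therefore depends on when the writer is
issued relative to the readers latching their operands.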

2) in column 3 i'm not seeing an INT reg write.  so the delay "Av r3"
is unnecessary.

in the design that we are doing, the different regfiles are completely
independent.  CTR is *not* in the same regfile as INT regs, neither is
XER, and CR is entirely independent as well.  the full list is here:

https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/regfile/regfiles.py;hb=HEAD

note that the smaller regfiles (State, Fast, CR, XER) *do* have
multiple write ports.  these will be done entirely as DFFs and they
are tiny.

it is only the absolutely massive ones (32x64 INT, 100+ SPRs) that
have only the one write port.  INT takes up 1/4 of the 180nm ASIC size
(as big as the 64-bit multiplier: 15,000 gates), SPR takes up...
mm.... 10%.  CR is about 1% despite it having *3* write ports and *5*
read ports.

3) in the (ridiculously complex) WaW detection system it becomes
possible to eliminate WaW entirely, by detecting the condition that a
WaW register has been overwritten.  it then becomes a "pure" in-flight
nameless register and sits exclusively in the output latch of the FU.

once the last reader of that FU latch has got the result (which goes
by the Op-Fwd bus only), and there are no more read dependencies on
that result, then because of the earlier detection that it was an
overwritten WaW it may be *DROPPED* on the floor.

this saves a write to the regfile port, and if things are particularly
busy there will be no free regfile write slot anyway.  however
interestingly if a write slot _does_ become available then it can be
written to the regfile, the FU is freed up, and from that point
onwards the value is treated as an ordinary reg-read.
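
the decision described above can be sketched per-cycle like this.
it is a toy model under stated assumptions, not the real WaW
detection logic: `OutputLatch`, `readers_pending`, `overwritten_waw`
and `step` are hypothetical names, and the priority order (drop first,
then write, else hold) is an assumption made for the illustration:

```python
from dataclasses import dataclass


@dataclass
class OutputLatch:
    """Result sitting in an FU output latch (illustrative model only)."""
    dest_reg: str
    value: int
    readers_pending: int     # readers still to be served over the Op-Fwd bus
    overwritten_waw: bool    # a later write to dest_reg has been detected


def step(latch, regfile, write_slot_free):
    """Decide what happens to the latched result this cycle.

    Returns 'drop', 'write' or 'hold'.
    """
    if latch.overwritten_waw and latch.readers_pending == 0:
        # nobody needs the value any more: no regfile write at all
        return "drop"
    if write_slot_free:
        # a slot turned up: write it, free the FU, and any remaining
        # readers get the value as an ordinary reg-read
        regfile[latch.dest_reg] = latch.value
        return "write"
    # still needed, and no write slot: stays in the FU output latch
    return "hold"


regfile = {}
latch = OutputLatch("r3", 42, readers_pending=1, overwritten_waw=True)
first = step(latch, regfile, write_slot_free=False)    # reader outstanding
latch.readers_pending = 0                              # last reader served
second = step(latch, regfile, write_slot_free=False)   # dropped on the floor

other = OutputLatch("r5", 7, readers_pending=1, overwritten_waw=True)
third = step(other, regfile, write_slot_free=True)     # slot free: written
```

in the first case the regfile is never touched, which is the saved
write; in the second the value goes to the regfile and the FU is freed
even though it was flagged as an overwritten WaW.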

welcome to one of the most mind-bendingly complex areas of computer
architecture :)

l.


