[Libre-soc-dev] effect of more decode pipe stages on hardware requirements for execution resources for OoO processors

Wed Feb 16 02:10:28 GMT 2022

On Tue, Feb 15, 2022, 18:02 Luke Kenneth Casson Leighton <lkcl at lkcl.net>
wrote:

> On Wed, Feb 16, 2022 at 1:09 AM Jacob Lifshay <programmerjake at gmail.com>
> wrote:
>
> > making it 8-wide exposes the loop-carried dependencies on ctr and the
> > address in r3, making the loop max out at 4 instructions per cycle despite
> > the larger fetch bandwidth.
>
> assume that the LDs and STs are independent such that there is no such
> limit [like in the score6600_multi.py LDST address hazard detector]
>

the limit comes from the ldu writing the address register and the next
loop iteration's ldu then reading it, and from the branch-decrement-ctr
both reading and writing the ctr register. those are all register
Read-After-Write dependencies. they have nothing to do with actually
loading/storing to memory, so improving memory hazard detection won't help.
in fact, in those tables on the wiki, i assume that all memory accesses
always take exactly 1 cycle.
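
to illustrate the register-dependency limit, here is a minimal sketch of an
idealised dataflow model. everything in it is an assumption made for
illustration: the 4-instruction loop body (LOOP_BODY), the helper name
simulate_ipc, perfect register renaming (so only RAW hazards matter),
unlimited execution units, and a 1-cycle latency for every instruction
including memory accesses. it is not the score6600_multi.py code and not the
exact loop from the wiki tables.

```python
# HYPOTHETICAL loop body: (mnemonic, registers read, registers written).
# ldu updates the address register r3; bdnz decrements ctr.
LOOP_BODY = [
    ("ldu",  {"r3"},       {"r4", "r3"}),
    ("addi", {"r4"},       {"r4"}),
    ("std",  {"r4", "r3"}, set()),
    ("bdnz", {"ctr"},      {"ctr"}),
]


def simulate_ipc(fetch_width, iterations=1000, latency=1):
    """Return instructions per cycle for an idealised OoO core.

    Assumptions: in-order fetch of `fetch_width` instructions per cycle,
    perfect renaming (only Read-After-Write dependencies stall issue),
    unlimited execution units, and `latency` cycles for every instruction.
    """
    ready = {}        # register -> cycle at which its newest value is available
    last_issue = 0
    total = iterations * len(LOOP_BODY)
    for i in range(total):
        _name, reads, writes = LOOP_BODY[i % len(LOOP_BODY)]
        fetch_cycle = i // fetch_width
        # issue once fetched and once every source register is ready
        issue = max([fetch_cycle] + [ready.get(r, 0) for r in reads])
        for w in writes:
            ready[w] = issue + latency
        last_issue = max(last_issue, issue)
    total_cycles = last_issue + latency
    return total / total_cycles


if __name__ == "__main__":
    for width in (2, 4, 8):
        print(width, round(simulate_ipc(width), 3))
```

under these assumptions the 4-wide and 8-wide cases both come out just under
4 instructions per cycle: each iteration's ldu has to wait for the previous
iteration's ldu to write r3, so at most one iteration can start per cycle no
matter how wide the fetch is. no memory hazard appears anywhere in the model.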

Jacob