[Libre-soc-dev] effect of more decode pipe stages on hardware requirements for execution resources for OoO processors

Jacob Lifshay programmerjake at gmail.com
Wed Feb 16 06:01:51 GMT 2022

On Tue, Feb 15, 2022, 19:01 lkcl <luke.leighton at gmail.com> wrote:

> it should be pretty obvious that even in a single issue scenario if you
> have an instruction that requires 128 cycles to complete (e.g. a DIV) you
> clearly need more than 128 Reservation Stations in order to avoid an issue
> stall.

yup, that's obviously correct.

> it should also be obvious that if the decode phase increases by say 10
> cycles, that now more than 10+128 Reservation Stations are required to
> prevent an issue stall.

yup, that's obviously wrong imho, since those reservation stations aren't
required by instructions in the decode pipe, so having more instructions in
the decode pipe doesn't require more reservation stations.

I modified power-cpu-sim to have different demo code (hopefully closer to
what you wanted), switching it to:

    addi r3, r3, -1
    cmpdi r3, 0
    bne .L2

now it runs until it runs out of hardware registers in the rename stage
(equivalent of running out of reservation stations), since I only gave it
32 registers. it puts the stalled instructions into a FIFO queue between
the decode and rename stages (quirk of my program, I don't want to take the
few hours to fix it to stall fetch/decode) -- if you like, you can mentally
adjust for what would happen if fetch/decode stalled instead of filling the
queue.


notice that the 1 and 8 decode stage versions both stall the same
instructions by the exact same amount, in the exact same pattern, just
offset by 7 clock cycles.
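
for anyone who wants to check the "same stalls, just offset" claim without
running power-cpu-sim, here is a minimal toy model in Python. it is not
power-cpu-sim and not how that program works internally; it is a sketch
under these assumptions: one instruction is fetched per cycle, issue is
single and in-order, every instruction holds a station (or rename register)
for exactly `latency` cycles, and stalled instructions wait in an unbounded
FIFO after decode. the function name `simulate` and the numbers (4 stations,
latency 10, 12 instructions) are made up for illustration.

```python
def simulate(decode_stages, stations, latency, count):
    """Toy model: return (issue_cycle, stall_cycles) per instruction."""
    issue = []
    result = []
    for i in range(count):
        # Fetched at cycle i, then spends decode_stages cycles in decode.
        arrive = i + decode_stages
        earliest = arrive
        if i >= 1:
            # Single issue, in order: at most one instruction per cycle.
            earliest = max(earliest, issue[i - 1] + 1)
        if i >= stations:
            # All stations busy until instruction i - stations frees its one.
            earliest = max(earliest, issue[i - stations] + latency)
        issue.append(earliest)
        result.append((earliest, earliest - arrive))
    return result


# Hypothetical parameters, chosen small so the pattern is easy to read.
shallow = simulate(decode_stages=1, stations=4, latency=10, count=12)
deep = simulate(decode_stages=8, stations=4, latency=10, count=12)

for n, ((c1, s1), (c8, s8)) in enumerate(zip(shallow, deep)):
    print(f"insn {n:2}: 1-stage issue@{c1:3} stall {s1:2} | "
          f"8-stage issue@{c8:3} stall {s8:2}")
```

in this model, changing `stations` or `latency` changes the stall pattern,
while changing `decode_stages` only shifts every issue cycle by the same
amount (7 cycles when going from 1 to 8 stages). the decode depth never
appears in the term that waits for a free station.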


