[Libre-soc-dev] effect of more decode pipe stages on hardware requirements for execution resources for OoO processors

Jacob Lifshay programmerjake at gmail.com
Wed Feb 16 06:01:51 GMT 2022


On Tue, Feb 15, 2022, 19:01 lkcl <luke.leighton at gmail.com> wrote:

> it should be pretty obvious that even in a single-issue scenario, if you
> have an instruction that requires 128 cycles to complete (e.g. a DIV), you
> clearly need more than 128 Reservation Stations in order to avoid an issue
> stall.
>

yup, that's obviously correct.
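A quick way to see where the 128 comes from: by Little's law, the number of stations in use is roughly the issue rate times the number of cycles each one is held. Below is a toy Python sketch under hypothetical assumptions: every instruction is a 128-cycle op like the DIV, it holds its reservation station from issue until completion, issue never stalls, and a station freed in a cycle can be reused in that same cycle. `peak_rs_in_use` is an illustrative name, not anything from power-cpu-sim.

```python
from collections import deque


def peak_rs_in_use(latency, n_instrs, issue_width=1):
    """Peak number of reservation stations occupied when every instruction
    (hypothetically) holds its RS from issue until it completes `latency`
    cycles later, and issue never stalls."""
    held = deque()  # completion cycles of instructions holding an RS, oldest first
    peak = issued = cycle = 0
    while issued < n_instrs:
        # an RS whose instruction completes this cycle is reusable this cycle
        while held and held[0] <= cycle:
            held.popleft()
        # issue up to `issue_width` new instructions, each taking one RS
        for _ in range(min(issue_width, n_instrs - issued)):
            held.append(cycle + latency)
            issued += 1
        peak = max(peak, len(held))
        cycle += 1
    return peak


print(peak_rs_in_use(latency=128, n_instrs=1000))                 # 128
print(peak_rs_in_use(latency=128, n_instrs=1000, issue_width=2))  # 256
```

With this reuse convention the answer comes out at exactly latency times issue width; whether a real design needs 128 or 129 stations depends on when a freed station can be reallocated.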

>
> it should also be obvious that if the decode phase increases by, say, 10
> cycles, then more than 10+128 Reservation Stations are required to
> prevent an issue stall.
>

yup, that's obviously wrong imho: instructions still in the decode pipe
don't hold reservation stations (they only get one once they issue), so
having more instructions in the decode pipe doesn't require more
reservation stations.
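The same reasoning as a short derivation, assuming (hypothetically) that instruction $i$ is fetched at cycle $f_i$ regardless of decode depth, spends $D$ cycles in decode without a reservation station, and then holds one for $L$ cycles from issue to completion. With $N_D(t)$ the number of stations in use at cycle $t$ for a decode depth of $D$:

```latex
% assumption: the fetch cycle f_i of instruction i does not depend on D
N_D(t) = \bigl|\{\, i : f_i + D \le t < f_i + D + L \,\}\bigr| = N_0(t - D)

\max_t N_D(t) = \max_t N_0(t - D) = \max_{t'} N_0(t')
```

Every occupancy curve is the $D = 0$ curve shifted right by $D$ cycles, so the peak, and hence the number of stations needed, is the same for any decode depth.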

I modified power-cpu-sim to use a different demo program (hopefully closer
to what you wanted), switching it to:
.L2:
    addi  r3, r3, -1
    cmpdi r3, 0
    bne   .L2
...

now it runs until the rename stage runs out of hardware registers (the
equivalent of running out of reservation stations), since I only gave it
32 registers. stalled instructions go into a FIFO queue between the decode
and rename stages (a quirk of my program; I don't want to spend the few
hours it would take to make it stall fetch/decode instead). if you like,
you can mentally adjust for what would happen if fetch/decode stalled
instead of filling the queue.

https://libre-soc.org/openpower/openpower/sv/effect-of-more-decode-stages-on-reg-renaming/#index6h1

notice that the 1-stage and 8-stage decode versions stall exactly the same
instructions by exactly the same amount, in exactly the same pattern, just
offset by 7 clock cycles (one cycle per extra decode stage).
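A toy Python model of that behaviour, with hypothetical parameters: 4-wide fetch/decode/rename, 32 physical registers, every instruction needing one (a simplification, since e.g. `bne` writes no GPR), and each register freed a fixed 20 cycles after rename as a stand-in for retirement. It is not power-cpu-sim and does not model the loop itself; `simulate` and its parameters are illustrative names. Decoded instructions wait in a FIFO for rename, and since nothing in the model depends on absolute time, extra decode stages only shift every event later.

```python
import heapq
from collections import deque


def simulate(decode_stages, n_instrs=200, width=4, phys_regs=32, hold=20):
    """Hypothetical toy model: returns, per instruction,
    (cycle it left decode, cycle it was renamed)."""
    # fetch delivers `width` instructions per cycle; decode adds a fixed delay
    arrival = [i // width + decode_stages for i in range(n_instrs)]
    fifo = deque()   # decoded instructions waiting for a free register
    releases = []    # min-heap: cycles at which held registers are freed
    free = phys_regs
    renamed = [0] * n_instrs
    nxt, cycle = 0, 0
    while nxt < n_instrs or fifo:
        # free registers whose (assumed fixed) hold time has expired
        while releases and releases[0] <= cycle:
            heapq.heappop(releases)
            free += 1
        # instructions leaving decode this cycle join the FIFO in program order
        while nxt < n_instrs and arrival[nxt] <= cycle:
            fifo.append(nxt)
            nxt += 1
        # rename up to `width` instructions in order; stall when none are free
        for _ in range(width):
            if not fifo or free == 0:
                break
            i = fifo.popleft()
            renamed[i] = cycle
            free -= 1
            heapq.heappush(releases, cycle + hold)
        cycle += 1
    return list(zip(arrival, renamed))


one = simulate(decode_stages=1)
eight = simulate(decode_stages=8)
stalls_one = [r - a for a, r in one]
stalls_eight = [r - a for a, r in eight]
print(stalls_one == stalls_eight)  # same stalls...
print(all(r8 == r1 + 7 for (_, r1), (_, r8) in zip(one, eight)))  # ...7 cycles later
```

Both lines print `True`: the register shortage stalls the same instructions for the same number of cycles, and the 8-stage run simply does everything 7 cycles later.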

Jacob

