[Libre-soc-dev] effect of more decode pipe stages on hardware requirements for execution resources for OoO processors

Wed Feb 16 19:01:47 GMT 2022

On Wed, Feb 16, 2022 at 6:20 PM Jacob Lifshay <programmerjake at gmail.com> wrote:

> yup, that's exactly my point, that adding more fetch/decode stages before allocating RSes doesn't require more RSes or other execution resources if you ignore branch prediction.

i note - and am quite concerned - that you did not acknowledge what i
said.  i leave it to you to acknowledge the difference.

now, that aside, the big downside of dropping instructions into a FIFO
(rather than into RSes) is that you absolutely cannot issue anything
from that FIFO out of order: you absolutely *must* drop instructions
in-order from the FIFO into the Dependency Matrices.  and, once the
FIFO is full, you absolutely have to stall the front-end: there's
nothing else you can do.

[a 10-entry post-Decode FIFO plus a 40-entry RS Suite will *not* have
the same capacity for OoO execution as a 50-entry RS Suite]

a 10-entry FIFO plus 40-entry RSes *will* have a reduced IPC when long
dependency-chains occur, i.e. when the 40 RSes are full and everything
behind them is stuck in-order in the FIFO, whereas 50-entry RSes will
still accept up to 40-long dependency-chains and still be able to
execute up to 10 additional independent instructions, even when those
instructions take longer to decode (or execute).
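
to make the comparison above concrete, here is a toy cycle-level sketch
in python.  it is not a model of any real core: the 1-instruction-per-cycle
decode width, the 100-cycle latency on each chain instruction, the rule
that an RS is held until its instruction completes, and all the names
(simulate, program, rs_entries, fifo_entries) are assumptions made purely
for illustration.  it feeds a 40-long dependency-chain followed by 10
independent instructions through both configurations and counts how many
instructions have completed after 90 cycles.

```python
from collections import deque

def simulate(program, rs_entries, fifo_entries, cycles):
    """Toy model: number of instructions completed after `cycles` cycles.

    program is a list of (dep, latency): dep is the index of the producer
    instruction (or None), latency is the execute time in cycles (>= 1).
    """
    fifo = deque()    # decoded but not yet in an RS: strictly in-order
    in_rs = set()     # instruction indices currently holding an RS
    done_at = {}      # index -> cycle at which its result is available
    next_decode = 0
    completed = 0
    for cycle in range(cycles):
        # 1. free the RS of every instruction whose result is now available
        finished = [i for i in in_rs if i in done_at and done_at[i] <= cycle]
        for i in finished:
            in_rs.remove(i)
            completed += 1
        # 2. in-order drain: only the FIFO head may move into a free RS
        while fifo and len(in_rs) < rs_entries:
            in_rs.add(fifo.popleft())
        # 3. decode one instruction per cycle (assumed front-end width)
        if next_decode < len(program):
            if not fifo and len(in_rs) < rs_entries:
                in_rs.add(next_decode)
                next_decode += 1
            elif len(fifo) < fifo_entries:
                fifo.append(next_decode)
                next_decode += 1
            # otherwise the front-end stalls this cycle
        # 4. out-of-order issue: any RS entry whose producer is done may start
        for i in in_rs:
            if i not in done_at:
                dep, latency = program[i]
                if dep is None or (dep in done_at and done_at[dep] <= cycle):
                    done_at[i] = cycle + latency
    return completed

# 40-long dependency chain of slow instructions (assumed 100 cycles each)...
program = [(None, 100)] + [(k - 1, 100) for k in range(1, 40)]
# ...followed by 10 independent single-cycle instructions
program += [(None, 1)] * 10

unified = simulate(program, rs_entries=50, fifo_entries=0, cycles=90)
split = simulate(program, rs_entries=40, fifo_entries=10, cycles=90)
print("50 RSes:            ", unified)   # -> 10
print("10 FIFO + 40 RSes:  ", split)     # -> 0
```

in this sketch the 50-RS configuration has completed all 10 independent
instructions while the chain is still outstanding, whereas the FIFO
configuration has completed none: they sit in the FIFO until the head of
the chain finishes and frees an RS.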

l.