[Libre-soc-dev] effect of more decode pipe stages on hardware requirements for execution resources for OoO processors

Wed Feb 16 02:48:58 GMT 2022

On February 16, 2022 2:34:15 AM UTC, lkcl <luke.leighton at gmail.com> wrote:
>
>
>On February 16, 2022 2:10:28 AM UTC, Jacob Lifshay
><programmerjake at gmail.com> wrote:
>
>>the limit comes from the ldu writing the address register then the next
>>loop's ldu reading it, 
>
>assume there is no such link or that operand forwarding exists to solve
>it.

actually there is no problem even with ldu: a hazard dependency chain is created that spans LDs, STs and registers (including CTR), to any depth of looping, until the available Reservation Stations are maxed out.

the object of the exercise here is to demonstrate to you that when the pipeline is deeper, more Reservation Stations are required in order to keep running just as far ahead.  this should be pretty obvious, but for some reason (and also because you keep putting barriers in place) it is not obvious to you.

now it may be the case that the number of Reservation Stations needed is simply N (the number needed without the extra stages) plus extra_pipeline_depth, but i have a sneaking suspicion it is N plus (num_multi_issue times extra_pipeline_depth).
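
to make the two guesses concrete, here is a minimal sketch that just evaluates both formulas side by side. it is arithmetic only, not a simulation: the function names, the value N=8 and the choice of 2 extra stages are made up for illustration, and neither formula has been checked against a real design.

```python
# Two candidate estimates for how many Reservation Stations are needed
# once extra decode pipeline stages are added. Both are hypotheses from
# the discussion, not measured results. All names here are illustrative.

def rs_needed_additive(n_base, extra_pipeline_depth):
    """Guess 1: each extra stage costs one extra Reservation Station."""
    return n_base + extra_pipeline_depth

def rs_needed_scaled(n_base, num_multi_issue, extra_pipeline_depth):
    """Guess 2: each extra stage costs one extra RS per issue slot."""
    return n_base + num_multi_issue * extra_pipeline_depth

if __name__ == "__main__":
    n_base = 8   # assumed RS count that suffices without the extra stages
    extra = 2    # assumed number of additional pipeline stages
    for width in (1, 2, 4):
        a = rs_needed_additive(n_base, extra)
        s = rs_needed_scaled(n_base, width, extra)
        print(f"issue width {width}: additive={a} scaled={s}")
```

the two guesses agree at single issue and diverge as the issue width grows, which is why the distinction only matters for a multi-issue design.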

if you are unable to think this through with LD/STs involved then remove them and use arithmetic operations instead, where, yes, there are deliberate RaW and WaR hazards linking the loop iterations together.
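
as a rough illustration of what such a loop looks like, the sketch below unrolls a two-instruction arithmetic loop body and lists every RaW and WaR pair, including the ones that cross an iteration boundary. the loop body (an add feeding a mul that feeds the next add) and the register names are hypothetical, and the checker is deliberately naive: it reports all pairs, not just the nearest producer or consumer.

```python
# Naive hazard lister for a straight-line instruction sequence.
# Each instruction is (dest_register, (source_registers...)).

def find_hazards(instrs):
    """Return (kind, earlier_index, later_index, register) tuples."""
    hazards = []
    for j, (dest_j, srcs_j) in enumerate(instrs):
        for i in range(j):
            dest_i, srcs_i = instrs[i]
            if dest_i in srcs_j:
                # later instruction reads what an earlier one wrote
                hazards.append(("RaW", i, j, dest_i))
            if dest_j in srcs_i:
                # later instruction overwrites what an earlier one read
                hazards.append(("WaR", i, j, dest_j))
    return hazards

# Hypothetical loop body:  add r1, r1, r2 ; mul r2, r1, r3
LOOP_BODY = [("r1", ("r1", "r2")), ("r2", ("r1", "r3"))]

def unroll(body, iterations):
    """Repeat the loop body, as the issue engine sees it when running ahead."""
    return list(body) * iterations

if __name__ == "__main__":
    body_len = len(LOOP_BODY)
    for kind, i, j, reg in find_hazards(unroll(LOOP_BODY, 2)):
        crosses = (i // body_len) != (j // body_len)
        note = " (crosses iterations)" if crosses else ""
        print(f"{kind} on {reg}: instr {i} -> instr {j}{note}")
```

because the mul in one iteration writes r2 and the add in the next iteration reads it, the chain never breaks, so every iteration issued ahead has to sit in a Reservation Station waiting on the previous one.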

the exercise only demonstrates the point under the specific circumstances where there are long hazard dependency chains.

l.