[Libre-soc-dev] Reservation Stations. Was [Libre-soc-bugs] [Bug 782] add galois field bitmanip instructions

Wed Mar 9 07:42:01 GMT 2022

---
crowd-funded eco-conscious hardware: https://www.crowdsupply.com/eoma68

On Wed, Mar 9, 2022 at 5:50 AM Jacob Lifshay <programmerjake at gmail.com> wrote:
>
> Ok, it turns out that I likely got my terms mixed up: in the bugzilla
> messages I kept using FU when I actually meant RS.
>
> On Tue, Mar 8, 2022, 04:47 lkcl <luke.leighton at gmail.com> wrote:
>
> >
> > FSMs that take say 30 cycles (and therefore require a minimum
> > of 30 Reservation Stations)
>
>
> why in the world would you need 30 RSes?

this is the absolute, inviolate rule: i will keep repeating it until you have
accepted it.

you CANNOT have un-managed data.

i will repeat it again.

you CANNOT have un-managed dependencies.

any un-managed dependencies absolutely have to be met
with a stall at issue time.

therefore the moment you run out of RSes, you MUST stall.

therefore, wherever you expect tight loops to run without
stalling, you MUST have sufficient RSes.
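to make that rule concrete, here is a minimal Python sketch of the issue-side check, assuming a single pool of RSes for one operation type. the class and method names are hypothetical (not taken from the Libre-SOC code), and write-back is reduced to a single `complete()` call. the point is only that an RS is held from issue until write-back, and a failed allocation can have exactly one outcome: stall.

```python
class ReservationStations:
    """Hypothetical pool of Reservation Stations (RSes) for one operation type.

    Each issued operation holds one RS from issue until its result is
    written back; the RS is what tracks (manages) its dependencies.
    """

    def __init__(self, count: int) -> None:
        self.count = count  # total RSes available
        self.busy = 0       # RSes currently holding an in-flight operation

    def try_issue(self) -> bool:
        """Allocate an RS; returning False means issue MUST stall this cycle."""
        if self.busy == self.count:
            # no RS free: the dependency could not be tracked, so stall
            return False
        self.busy += 1
        return True

    def complete(self) -> None:
        """Result written back: release the operation's RS."""
        if self.busy == 0:
            raise RuntimeError("no in-flight operation to complete")
        self.busy -= 1


rs = ReservationStations(2)
print(rs.try_issue(), rs.try_issue(), rs.try_issue())  # True True False
rs.complete()
print(rs.try_issue())  # True
```

nothing in the model asks whether the unit behind the RS is an FSM or a pipeline: all that matters is how long each RS stays busy.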

> if the FSM starts 1 instruction,
> executes for 30 cycles,

[which is no different from having a pipeline of depth 30...]

> and then can start the next instruction (i'm
> assuming it can't run multiple instructions simultaneously in the FSM),
> there only needs to be enough RSes to ensure that it can always start
> executing the next instruction immediately when it finishes the previous
> instruction.

uh-huh? yes? think it through. how many operations handled
by the FSM do you want to be executing simultaneously?

if each takes 30 cycles in the FSM, and you want 30 such operations
to be in flight, then there ABSOLUTELY must be a minimum of
30 RSes.
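the arithmetic here is essentially Little's law (that framing is mine, not from the thread): if a new operation of this type arrives every `issue_interval_cycles` and each one holds its RS for `latency_cycles`, then ceil(latency / interval) of them are in flight at once, and each needs its own RS. a small sketch, assuming enough execution resources (pipeline stages or parallel FSMs) that every operation completes exactly `latency_cycles` after issue; the function name is hypothetical.

```python
def min_reservation_stations(latency_cycles: int, issue_interval_cycles: int = 1) -> int:
    """Minimum RSes so that issue never stalls for lack of an RS.

    Assumes one new operation every `issue_interval_cycles` cycles, each
    holding its RS for `latency_cycles`.  The unit type (FSM or pipeline)
    is deliberately not a parameter: only latency and issue rate matter.
    """
    # integer ceiling division, avoids floating-point rounding
    return -(-latency_cycles // issue_interval_cycles)


print(min_reservation_stations(30))        # 30: 30-cycle op issued every cycle
print(min_reservation_stations(32))        # 32: the 32-cycle FPDIV example
print(min_reservation_stations(500))       # 500: 500-cycle op in a tight loop
print(min_reservation_stations(500, 500))  # 1: 500-cycle op issued once per 500 cycles
```

the last line is the case where a very slow operation is used rarely: one RS is enough, which is why the answer depends on the loop, not on FSM-vs-pipeline.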

note i said "if you want 30 such operations to be in-flight",
which for e.g. FPDIV in tight inner loops you already explained
to me 18 months ago is absolutely critical.

> for an FSM that slow, you could probably get away with only 2
> RSes,

then the 3rd such operation issued to those FSMs MUST
stall issue for the entire processor. if three such instructions are issued
in quick succession, that's an entire *28* cycles of stall: the 3rd
instruction waits from cycle 2 until the 1st releases its RS at cycle 30.
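a cycle-level sketch that checks the 28-cycle figure, under simplifying assumptions: in-order issue of at most one operation per cycle, a single operation type, and each operation holding its RS for exactly `latency` cycles after issue. the function is hypothetical, not the Libre-SOC scheduler. for this three-instruction case a single non-pipelined FSM gives the same answer, because the 3rd instruction only has to wait for the 1st RS, which is released at cycle 30 either way.

```python
def issue_cycles(num_rs: int, latency: int, wanted: list[int]) -> list[int]:
    """Return the cycle at which each operation actually issues.

    Simplified, hypothetical model: in-order issue, at most one operation
    per cycle, each operation holds an RS from issue until `latency`
    cycles later, and an operation that finds every RS busy stalls issue
    until the earliest busy RS is released.
    """
    releases = []  # release cycle of each busy RS
    actual = []
    last = -1
    for want in wanted:
        t = max(want, last + 1)                    # in-order, one per cycle
        releases = [r for r in releases if r > t]  # drop RSes already free
        if len(releases) == num_rs:
            t = min(releases)                      # stall until one is released
            releases = [r for r in releases if r > t]
        releases.append(t + latency)               # allocate an RS
        actual.append(t)
        last = t
    return actual


# three back-to-back 30-cycle ops, only 2 RSes
print(issue_cycles(2, 30, [0, 1, 2]))   # [0, 1, 30]: 28 cycles of stall
# the same three ops with 30 RSes
print(issue_cycles(30, 30, [0, 1, 2]))  # [0, 1, 2]: no stall
```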

to prevent that from happening, every in-flight operation *MUST* be
allocated an entry in the Dependency Matrices, it *MUST* be allocated
an RS, and therefore you MUST have 30 RSes.

again, i repeat: it is absolutely no different from having a pipeline
of depth 30.

if the pipeline is of depth 30, you *MUST* have 30 RSes to cover
all 30 outstanding in-flight operations (and their result registers),
otherwise you have to stall issue.

you need to get over the misapprehension that "FSMs are better".
as far as RSes and DMs are concerned, FSMs HAVE to be treated
exactly the same as pipelines.

> need enough RSes to keep it fed, even if it takes 500 cycles to execute
> that instruction. at least one RS must be ready to execute immediately when
> the FSM

or a 500-stage pipeline.

> can start executing again, at any other time you can have 0 ready
> RSes. You do not need 500 RSes there.

correct... as long as there are no "loops" involving that instruction
that are shorter than 499 instructions, i.e. loops that re-issue it
before the previous instance has completed.

loop:
         500op RT, RA, RB  ; does not matter if it is an FSM or a pipeline,
                           ; it takes 500 cycles to complete
         bc loop

if you want that loop not to stall, the processor had better have
500+ RSes.
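the same kind of model applied to the 500op loop, as a hypothetical sketch: only the 500op itself is counted (the branch is assumed free), and `interval=1` models the most aggressive case where one 500op is wanted every cycle, which is the case behind the 500+ figure. that issue rate is an assumption; a loop issuing one 500op every 2 cycles would need about half as many RSes. the function name and parameters are illustrative only.

```python
from collections import deque


def total_stall_cycles(num_rs: int, latency: int, n_ops: int, interval: int = 1) -> int:
    """Total cycles issue is stalled waiting for a free RS.

    Hypothetical model: op k is wanted at cycle k * interval, issue is in
    order with at most one op per cycle, and each op holds its RS for
    `latency` cycles after issue.
    """
    # fixed latency + in-order issue means release cycles are strictly
    # increasing, so a FIFO is enough to find the earliest release
    busy = deque()
    t = 0       # earliest cycle the next op may issue
    stalls = 0
    for k in range(n_ops):
        t = max(t, k * interval)
        while busy and busy[0] <= t:   # release RSes whose op has completed
            busy.popleft()
        if len(busy) == num_rs:        # every RS busy: stall issue
            stalls += busy[0] - t
            t = busy.popleft()
        busy.append(t + latency)       # allocate an RS to this op
        t += 1
    return stalls


print(total_stall_cycles(500, 500, 2000))            # 0: one RS per in-flight op
print(total_stall_cycles(1, 500, 10, interval=500))  # 0: rarely-issued op, 1 RS suffices
print(total_stall_cycles(2, 500, 2000))              # stalls heavily with only 2 RSes
```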

i recall that you mentioned that in 3D, FPDIV is often critical for inner
loops.  if that takes 32 cycles to complete, it had better be matched
by 32 RSes (minimum)... REGARDLESS of whether it is an FSM
or a Pipeline.

if FPDIV takes 500 cycles, and is inside an inner loop, and you want
that loop not to stall, you had better have 500 RSes REGARDLESS
of whether FPDIV is an FSM or a pipeline.

> I trimmed the rest of the email since I read it but have no particular
> response...mentioning this to avoid you feeling like i ignored you.

appreciated.

> I agree that pipelines vs. FSMs requires analysis before you decide what to
> use. I wasn't disagreeing with that...I was pointing out a case where FSMs
> aren't always better.

this masks the fact that they are no different.

as far as DMs and RSes are concerned, whatever applies to FSMs
absolutely has to apply to pipelines as well.

therefore, the exact same argument that applies to the example
of a 500-cycle FSM *MUST* apply to a 500-cycle Pipeline as well.

this is not negotiable, it is just a fact.

l.