[Libre-soc-dev] Reservation Stations. Was [Libre-soc-bugs] [Bug 782] add galois field bitmanip instructions

lkcl luke.leighton at gmail.com
Tue Mar 8 16:16:25 GMT 2022


here is an explanatory diagram:
https://libre-soc.org/3d_gpu/pipeline_vs_fsms.jpg

on the left, a 10-stage pipeline and (barely enough)
10x ReservationStations.

at the top are 4x Read-Priority-Pickers and 4x Common Data
Buses which, together with (remember) the operand, bring in a
whopping but perfectly reasonable and justifiable THREE HUNDRED
data wires into each and every one of the 10 Reservation
Stations.

why 300?

because 4x 64-bit registers, RA, RB, RC and an SPR
(for the Galois Field Reduced Polynomial) equals a
total of 256 wires (4x64) and it is not unreasonable to have
another 45 or so wires from the operand decode phase.

that's 300 wires out of each RS into a 10 (TEN) way
Mux, totaling...

three

THOUSAND

wires

into the single-pipeline mux.

"fortunately" at the other end, it would be of the order of 64x10,
i.e. "only" 640 wires out, to farm-out the results from the pipeline
back to the corresponding (10) RSes, pending the Write-Priority-Picker
giving each of the (10) RSes one single opportunity to write to the
(one) write-port available on the INT Register File.
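
as a quick sanity-check of the arithmetic above, here is a minimal
sketch in python. the constant and function names are made up purely
for illustration, and the 45 decode wires is the "45 or so" estimate,
so the totals are approximate (301 and 3010 rather than a round 300
and 3000):

```python
# figures from the post; all names here are hypothetical, for illustration
REG_WIDTH = 64      # bits per register operand
NUM_REGS = 4        # RA, RB, RC and one SPR
DECODE_WIRES = 45   # "45 or so" wires from operand decode (an estimate)
NUM_RS = 10         # Reservation Stations feeding the single pipeline


def wires_per_rs(num_regs=NUM_REGS, width=REG_WIDTH, decode=DECODE_WIRES):
    """data wires coming out of one Reservation Station."""
    return num_regs * width + decode


def mux_input_wires(num_rs=NUM_RS):
    """total wires entering the N-way mux in front of the pipeline."""
    return num_rs * wires_per_rs()


def result_fanout_wires(num_rs=NUM_RS, width=REG_WIDTH):
    """wires needed to farm one 64-bit result back out to every RS."""
    return num_rs * width


if __name__ == "__main__":
    print("per RS     :", wires_per_rs())
    print("into mux   :", mux_input_wires())
    print("results out:", result_fanout_wires())
```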


on the right, a 10-way Finite State Machine, which, obviously,
does not have that completely-mad three THOUSAND wire
Mux.

however to illustrate the point about shared pipelines even in
FSMs, only the first 7 cycles of the FSM produce partial-results,
and the last 3 cycles go into 3-stage "partial-results-processing"
pipelines.

the usual rules apply: if there are 3 stages then there must
be a maximum of 3 FSMs that share that 3-stage pipeline.
if there are more than that, then the FSMs are guaranteed
to stall.

if there are fewer than that (2 FSMs share a 3-stage pipeline)
then the pipeline will only ever be at most 2/3 occupied (at
least one stage is 100% guaranteed to run empty).
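
the rule of thumb above can be written down as a tiny python sketch.
this is only an encoding of the rule as stated (no more FSMs than
stages, otherwise stalls; fewer FSMs than stages, otherwise empty
stages), not a cycle-accurate model, and the function names are
hypothetical:

```python
from fractions import Fraction


def fsms_will_stall(num_fsms, num_stages):
    """more FSMs than pipeline stages: contention, so stalls occur."""
    return num_fsms > num_stages


def peak_occupancy(num_fsms, num_stages):
    """best-case fraction of stages holding useful work at any one time.

    each FSM can have at most one result in flight in the shared
    pipeline, so no more than num_fsms stages are ever busy.
    """
    return Fraction(min(num_fsms, num_stages), num_stages)


if __name__ == "__main__":
    for n in (2, 3, 4):
        print(n, "FSMs on 3 stages:",
              "occupancy", peak_occupancy(n, 3),
              "stall" if fsms_will_stall(n, 3) else "no stall")
```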

it is a delicate balance, a choice between the two.

they are in groups: 2 FSMs in the diagram share a 3-stage
pipeline, meaning there must be QTY 5 of such 3-stage
pipelines.

the fan-in and fan-out is absolutely identical to a ReservationStation2
to the extent that the exact same class - unmodified - may be utilised,
here.

the balancing on resources is much better (read: far less completely
insane), in that only having QTY 5 of 2-in 2-out MUXes @ 64-bit
(or more like 192-bit) is infinitely preferable to an absolutely-mental
10-way MUX on 300+ wires.
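
to put rough numbers on that comparison, a minimal python sketch
follows. it assumes the 192-bit figure for the FSM-side muxes and a
round 300 wires per RS on the pipeline side; the function names are
hypothetical and the wire counts are ballpark only:

```python
def shared_pipelines(num_fsms, fsms_per_group):
    """number of shared pipelines when FSMs are grouped evenly."""
    if num_fsms % fsms_per_group != 0:
        raise ValueError("FSMs do not divide evenly into groups")
    return num_fsms // fsms_per_group


def mux_fan_in(ways, width):
    """input wires on a single mux: ways x bits per way."""
    return ways * width


if __name__ == "__main__":
    groups = shared_pipelines(num_fsms=10, fsms_per_group=2)
    fsm_side = mux_fan_in(ways=2, width=192)        # per 2-way mux
    pipeline_side = mux_fan_in(ways=10, width=300)  # the one 10-way mux
    print("shared 3-stage pipelines   :", groups)
    print("wires into each 2-way mux  :", fsm_side)
    print("wires into the 10-way mux  :", pipeline_side)
```

the point is less the totals than the shape: five small 2-way muxes
versus one very wide 10-way mux.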

i believe this illustrates the point that pipelines are not under 100%
of all circumstances "better" than FSMs?  Jacob, your thoughts?

l.


