[libre-riscv-dev] GPU design
Luke Kenneth Casson Leighton
lkcl at lkcl.net
Sun Dec 9 17:16:20 GMT 2018
ok so i believe i have a handle on a scoreboard-style
reorder-buffer-less register renaming design:
* a "register alias table" (aka RAT) has a DIRECT correspondence:
physical register file **EQUALS** the architectural register file
* the RAT entry points to a reservation station *row* (corresponding
to a functional unit "line", see below)
* each functional unit has a one-buffer-deep reservation station and
there are at least 2 of them OR
* each functional unit has a two-deep reservation station and they
have *separate* src1 and src2 "ready" lines
these two being effectively the same thing.
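
to make the RAT bullet points concrete, here is a minimal sketch in python. it is an illustration only, not the actual design: the class name, the 32-register count, and the rule for clearing an entry on writeback (borrowed from conventional Tomasulo-style renaming) are all assumptions.

```python
from typing import List, Optional

NUM_REGS = 32  # assumption: 32 architectural registers


class RegisterAliasTable:
    """One entry per architectural register (direct correspondence).

    An entry is None when the register file holds the current value,
    otherwise it is the reservation station row that will produce it.
    """

    def __init__(self, num_regs: int = NUM_REGS) -> None:
        self.entries: List[Optional[int]] = [None] * num_regs

    def issue(self, dest: int, rs_row: int) -> None:
        # the newest producer of `dest` is this reservation station row
        self.entries[dest] = rs_row

    def lookup(self, src: int) -> Optional[int]:
        # None: read the register file; otherwise wait on that row
        return self.entries[src]

    def writeback(self, dest: int, rs_row: int) -> None:
        # assumption: only clear the alias if no later instruction
        # has re-targeted `dest` in the meantime
        if self.entries[dest] == rs_row:
            self.entries[dest] = None
```

because the table is indexed by architectural register number, no separate physical register pool is needed in this sketch: an entry only says "wait for this row" or "read the register file".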
youtube video p4SdrUhZrBM is important to watch. at time 8:14
the author points out that the two FUs, adder and mul unit, *both*
produce an "r1 dest". he says "and there is some logic which chooses
which of these shall be first".
in a Reorder Buffer, that logic is, "the instruction which is at the
head of the queue". so, that's what we need: an instruction sequence
number, where the lowest sequence number (indicating the oldest
instruction) wins. the "winner" will get priority to write its result (CDC 6600
"Go_Write" signal) to the register file. this is directly equivalent
to a Reorder Buffer "head of queue commit".
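
the "lowest sequence number wins" rule can be sketched roughly as below. this is a hedged software model, not hardware: `PendingResult` and `pick_go_write` are hypothetical names, and sequence-number wrap-around is deliberately ignored.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PendingResult:
    seq: int        # instruction sequence number (lower = older)
    fu_name: str    # which functional unit produced the result
    dest_reg: int   # architectural destination register
    value: int


def pick_go_write(pending: List[PendingResult],
                  dest_reg: int) -> Optional[PendingResult]:
    """Return the result allowed to write `dest_reg` this cycle.

    Assumption: sequence numbers never wrap, so a plain minimum
    identifies the oldest instruction.
    """
    candidates = [p for p in pending if p.dest_reg == dest_reg]
    if not candidates:
        return None
    return min(candidates, key=lambda p: p.seq)


# the adder and the mul unit both have an "r1 dest" pending
pending = [
    PendingResult(seq=7, fu_name="mul", dest_reg=1, value=42),
    PendingResult(seq=5, fu_name="add", dest_reg=1, value=3),
    PendingResult(seq=6, fu_name="add2", dest_reg=2, value=9),
]
winner = pick_go_write(pending, dest_reg=1)
```

here `winner` is the adder result (sequence number 5), which plays the role of the "head of queue" entry; the mul result has to wait for a later Go_Write.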
now, what i think will also need to be transmitted into (and through)
the Register File is an encoding of the Functional Unit src1/2 /
Reservation Station row. i.e. the single-bit of a Scoreboard needs to
be packed down to a unique index, dropped through the Register File,
passed *out* the other side, and "unpacked" again.
the reason for this i believe is down to the fact that whilst the 6600
Register File is capable of passing through write values out the other
side onto "reads", what we *don't* know is: which *Reservation*
Station is supposed to pick that up (on both src1 and src2).
so, rather than pass through 50 bits of src1/src2 "ready" bits, create
a much smaller (7 bit or so) offset/index, pass that through, and
de-multiplex it on the other side.
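
as a rough illustration of the pack / de-multiplex idea, the sketch below converts a one-hot "ready" vector into a small index and back again. the function names and the use of a python int as the bit-vector are assumptions made for the example.

```python
def pack_one_hot(ready_bits: int) -> int:
    """Encode a one-hot vector as the index of its single set bit."""
    if ready_bits == 0 or ready_bits & (ready_bits - 1):
        raise ValueError("expected exactly one bit set")
    return ready_bits.bit_length() - 1


def unpack_index(index: int, width: int) -> int:
    """De-multiplex an index back into a one-hot vector of `width` lines."""
    if not 0 <= index < width:
        raise ValueError("index out of range")
    return 1 << index


def index_bits(width: int) -> int:
    """Number of bits needed to index `width` distinct lines."""
    return max(1, (width - 1).bit_length())


WIDTH = 50                    # the 50 src1/src2 "ready" lines
one_hot = 1 << 37             # line 37 is the one being signalled
packed = pack_one_hot(one_hot)
restored = unpack_index(packed, WIDTH)
```

`index_bits(50)` is 6, so six bits are enough to index 50 lines; the "7 bit or so" figure leaves some headroom on top of that.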
in effect, this "packed" representation is directly equivalent to the
Tomasulo "Reservation Station Number". the "packed" representation
would normally be sent on a Common Data Bus, however the CDB in the
6600 design has been replaced with the pass-through register file.
the 4x multiplexing may still be applied on top of this, to give 4
simultaneous instruction issue, striped register files and so on. the
only thing that we would have to watch out for is: with 4 simultaneous
instructions and far more results being generated, the instruction
sequence order now needs to be preserved *at the register file
multiplexers* as well.
so, the instruction sequence number needs to be the means of
prioritising the regfile 4-way multiplexers. lowest (oldest)
sequential number always wins, and in this way i believe we may ensure
that even with up to 4 instructions completing at once, the commit
order is always preserved.
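
a minimal sketch of the 4-way case follows, under some loud assumptions: the register file is striped into 4 banks by `dest_reg % 4`, each bank accepts one write per cycle, and sequence numbers do not wrap. none of these details are fixed by the design above; `arbitrate_banks` is a hypothetical name.

```python
from typing import Dict, List, Tuple

NUM_BANKS = 4  # assumption: 4-way striped register file

# (sequence number, destination register, value)
Result = Tuple[int, int, int]


def arbitrate_banks(completing: List[Result]) -> Dict[int, Result]:
    """For each bank, pick the oldest completing result.

    Results that lose the arbitration are not returned; in this
    model they would simply retry on a later cycle.
    """
    winners: Dict[int, Result] = {}
    for res in completing:
        seq, dest_reg, _value = res
        bank = dest_reg % NUM_BANKS  # assumed striping rule
        if bank not in winners or seq < winners[bank][0]:
            winners[bank] = res
    return winners


# four results complete in the same cycle, two per bank
completing = [(12, 5, 100), (10, 1, 200), (11, 2, 300), (13, 6, 400)]
winners = arbitrate_banks(completing)
```

in this example registers 5 and 1 share bank 1, and registers 2 and 6 share bank 2, so only sequence numbers 10 and 11 get to write this cycle.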
it may be more complex than that, now that i think about it.