[libre-riscv-dev] GPU design
lkcl at libre-riscv.org
Wed Dec 5 08:08:39 GMT 2018
this is a combined proposal with:
* lane-multiplexing on the register file, odd/even and msw/lsw
* a 32-entry register cache
* a regcache#-to-ROB# lookup table
* a standard Reorder Buffer
the bytemask concept has been left out, for clarity.
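as a rough illustration of the lane-multiplexing item above, here is a minimal sketch. it assumes (this is not stated in the post) 64-bit registers split into two 32-bit words, with four lanes picked by register-number parity and word half. the names lane_of and split_words are hypothetical.

```python
from typing import Tuple

WORD_BITS = 32  # assumption: one lane carries a 32-bit word


def lane_of(regnum: int, half: str) -> int:
    """Return a lane index 0..3 from register parity and word half."""
    parity = regnum & 1                  # 0 = even register, 1 = odd register
    word = 1 if half == "msw" else 0     # 0 = lsw, 1 = msw
    return (parity << 1) | word


def split_words(value: int) -> Tuple[int, int]:
    """Split a 64-bit value into its (msw, lsw) 32-bit words."""
    mask = (1 << WORD_BITS) - 1
    return (value >> WORD_BITS) & mask, value & mask
```

under this assumed layout, an even and an odd register (or the two halves of one register) land in different lanes, so they can be accessed in the same cycle without extra ports.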
the regcache-to-ROB lookup table i *believe* fixes the issue of
needing a CAM. the regcache has a secondary purpose, aside from
reducing the number of ports on the register file: it also reduces the
size of a register lookup on each instruction cycle from 7-bits down
when adding a new instruction to the ROB, instead of asking "is the
src register used in any entry of the ROB?", the question becomes,
"is the memory entry in the ROB-Dest table empty?".
what it *doesn't* do is remove the need for a CAM altogether, as in
effect the CAM has moved from the ROB to the register cache.
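a minimal software sketch of that split, under stated assumptions: the register cache is tagged by architectural register number (the associative, CAM-like part), and the regcache#-to-ROB# table is a plain indexed array alongside it. class and method names are hypothetical, and the behaviour on a full cache is left open because the post does not specify it.

```python
from typing import List, Optional

CACHE_SIZE = 32  # 32-entry register cache, as in the proposal


class RegCache:
    """Hypothetical model: tags hold architectural register numbers."""

    def __init__(self) -> None:
        self.tags: List[Optional[int]] = [None] * CACHE_SIZE
        # regcache#-to-ROB# table: None means "no pending writer in the ROB"
        self.rob_dest: List[Optional[int]] = [None] * CACHE_SIZE

    def lookup(self, regnum: int) -> Optional[int]:
        """The CAM part: hardware would compare every tag in parallel."""
        for idx, tag in enumerate(self.tags):
            if tag == regnum:
                return idx
        return None

    def allocate(self, regnum: int) -> Optional[int]:
        """Return the entry for regnum, claiming a free one on a miss."""
        hit = self.lookup(regnum)
        if hit is not None:
            return hit
        for idx, tag in enumerate(self.tags):
            if tag is None:
                self.tags[idx] = regnum
                return idx
        return None  # cache full; the post does not say what happens here

    def pending_writer(self, cache_idx: int) -> Optional[int]:
        """The non-CAM part: one indexed read replaces a search of the ROB."""
        return self.rob_dest[cache_idx]

    def set_writer(self, cache_idx: int, rob_idx: int) -> None:
        """Record rob_idx as the latest ROB entry writing this register."""
        self.rob_dest[cache_idx] = rob_idx

    def clear_writer(self, cache_idx: int, rob_idx: int) -> None:
        """On commit, clear only if no newer ROB entry took over the slot."""
        if self.rob_dest[cache_idx] == rob_idx:
            self.rob_dest[cache_idx] = None
```

the only search left is lookup(), which runs over the register cache tags; once a cache index is known, checking for an in-flight writer is a single array read.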
the size of the ROB being equal to the reg cache is deliberate but not
the *type* of the register can also be included in the reg cache.
meaning, only 1 32-entry cache to cover *both* integer *and* FP
my only big concern is: can the number of operands needed exceed the
capacity of the cache? i *believe* that if the ROB were larger than
the Reg Cache, the answer could be "yes".
given that each operation may have 2x src registers (or 3 in the case
of FMADD, and 4 if predicated), do we need the Reg Cache to be *twice*
the size of the ROB? or 3x?
i have no idea here: i've not thought it through.
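to put rough numbers on the question, here is a small worked bound. it is a pessimistic sketch, not an answer: it assumes every source operand of every in-flight instruction is a distinct register, and that 7-bit register numbers mean at most 128 architectural registers. the function name and the arch_regs parameter are hypothetical.

```python
def worst_case_distinct_srcs(rob_entries: int, srcs_per_op: int,
                             arch_regs: int = 128) -> int:
    """Upper bound on distinct source registers referenced by a full ROB.

    Assumes no two operands share a register; the count is capped by the
    number of architectural registers (128 assumed, from 7-bit numbering).
    """
    return min(rob_entries * srcs_per_op, arch_regs)


# 32-entry ROB: 2 srcs, 3 srcs (FMADD), 4 srcs (predicated FMADD)
bounds = [worst_case_distinct_srcs(32, n) for n in (2, 3, 4)]
```

with a 32-entry ROB this gives 64, 96 and 128 for 2, 3 and 4 sources per operation. real code reuses registers heavily, so the typical requirement is likely to be well below these figures, but the bound shows why a cache the same size as the ROB is not guaranteed to be enough.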