[libre-riscv-dev] [Bug 296] idea: cyclic buffer between FUs and register file

bugzilla-daemon at libre-soc.org
Sat May 2 16:02:55 BST 2020


https://bugs.libre-soc.org/show_bug.cgi?id=296

--- Comment #11 from Luke Kenneth Casson Leighton <lkcl at lkcl.net> ---
hmm hmm, the other big concern: just as with the Tomasulo Algorithm, the
Register "ID" (or... what was it... the CAM Row Number or something)
has to be:

(a) broadcast onto each Common Data Bus along with the value
(b) stored in each and every Function Unit (which means getting it
    *to* that FU as well as having a comparator)
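For comparison with what follows, here is a minimal Python sketch of the Tomasulo-style scheme just described: every FU operand slot keeps the tag (Register ID) it is waiting for, and every CDB broadcast of (tag, value) is compared against every slot. `OperandSlot` and `broadcast` are hypothetical names for illustration only, not libre-soc code.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class OperandSlot:
    """One Function Unit operand latch (hypothetical model)."""
    waiting_tag: Optional[int] = None  # (b) register ID stored in the FU
    value: Optional[int] = None        # filled in once the tag matches


def broadcast(slots: List[OperandSlot], tag: int, value: int) -> int:
    """(a) Put (tag, value) on one CDB; every slot compares tags.

    Returns the number of tag comparisons made. In hardware each of these
    is a separate comparator fed by its own copy of the tag wires.
    """
    comparisons = 0
    for slot in slots:
        comparisons += 1  # the comparator exists (and switches) regardless
        if slot.waiting_tag is not None and slot.waiting_tag == tag:
            slot.value = value       # operand captured
            slot.waiting_tag = None  # no longer waiting
    return comparisons


slots = [OperandSlot(waiting_tag=3), OperandSlot(waiting_tag=7), OperandSlot()]
print(broadcast(slots, tag=7, value=42))  # 3 (comparisons; only slot 1 captures)
print(slots[1].value)                     # 42
```

Every slot on every CDB needs its own copy of the tag wires and its own comparator, which is what drives the wire and gate counts that follow.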

now, the trade-off between making that ID a unary (one-hot) number or a
binary number is:

* we will have around 20 Function Units
* there will be 5 ports (and therefore 5 CDBs: 3-read, 2-write)

therefore in binary:

* there will be 500 wires (5 bits x 20 FUs x 5 ports) for a 5-bit
  RegisterID coming *in* to Function Units
* there will be another 500 wires going *out* of Function Units onto CDBs
* 500 XOR gates will be needed to perform comparisons, and that's a power
  hit that we get on EVERY clock cycle (!)

in unary:

* there will be a whopping THREE THOUSAND TWO HUNDRED wires (32 bits x 20
  FUs x 5 ports) coming in for a 32-bit unary RegisterID
* there will be another 3,200 going out onto the CDBs (!!)
* there would be 3,200 AND gates needed; however, the power hit will *only*
  come from a maximum of 5x20=100 of those going active in any one clock
  cycle, because the IDs are unary-encoded: only 1/32 of the 3,200 bits is
  ever active, rather than potentially *all* 5 bits in the binary case.
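The figures in the two lists above can be reproduced with a small cost model. This is a sketch of the back-of-envelope reasoning only: it assumes one ID-width bundle per FU per port in each direction and one comparator gate per ID bit, and `max_active` is a crude stand-in for switching power rather than a real power estimate. `broadcast_cost` is a hypothetical helper.

```python
from math import ceil, log2


def broadcast_cost(n_fus: int, n_ports: int, n_regs: int, encoding: str) -> dict:
    """Back-of-envelope cost of broadcasting a register ID on every CDB.

    Assumes one ID-width bundle per FU per port in each direction and one
    comparator gate per ID bit (XOR for binary, AND for unary/one-hot).
    "max_active" is only a crude proxy for switching power.
    """
    if encoding == "binary":
        id_bits = ceil(log2(n_regs))            # 5 bits for 32 registers
        max_active = n_fus * n_ports * id_bits  # every bit may toggle
    elif encoding == "unary":
        id_bits = n_regs                        # one wire per register
        max_active = n_fus * n_ports            # one hot bit per bundle
    else:
        raise ValueError(f"unknown encoding: {encoding!r}")
    bundle = n_fus * n_ports * id_bits
    return {"wires_in": bundle, "wires_out": bundle,
            "gates": bundle, "max_active": max_active}


# binary: 500 of each; unary: 3200 of each, with max_active of only 100
for enc in ("binary", "unary"):
    print(enc, broadcast_cost(n_fus=20, n_ports=5, n_regs=32, encoding=enc))
```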

to be honest, neither of these is particularly attractive! :)

compare and contrast this with the way that the 6600 works:

* The Register Port Buses, although global in nature, are direct-connected
  from Function Unit Operands to *specific* Regfile ports.
* A-Units (address Units) have 2-in, 1-out and those are wired to Regfile
  RD ports 1,2 and Regfile WR port 1
* B-Units (algorithmic units) i think likewise have 2-in, 1-out, and
  go to RD ports 3,4 and Regfile WR port 2
* X-Units have 1-in, 1-out and go to RD port 5 and WR port 3.

therefore:

* the FU-Regs Dependency Matrix captures the information about which regs
  each FU shall read from (or write to), *and on which port*
* this is done in an IMPLICIT fashion, such that there is NO possibility of
  a broadcast value being picked up by a second Function Unit

i.e. the Register ID itself is *NOT* actually transmitted over the Bus,
at all.  it's just down to "information capture", back at the FU-Regs
Dependency Matrix.
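A toy model of the 6600-style approach, assuming each FU operand is hard-wired to exactly one regfile read port (using the port numbers listed above) and that a port services its pending operands first-come-first-served, which is an assumption for illustration. The register number stays inside the matrix and the port bus carries only the value. `FURegsDM`, `issue` and `read_port_cycle` are hypothetical names.

```python
from typing import Dict, List, Optional, Tuple

Operand = Tuple[str, int]  # (FU name, operand index): hypothetical naming


class FURegsDM:
    """Toy FU-Regs Dependency Matrix with fixed operand-to-read-port wiring."""

    def __init__(self, port_of_operand: Dict[Operand, int]):
        self.port_of_operand = port_of_operand  # hard wiring, as in the 6600
        self.pending: Dict[Operand, int] = {}   # operand -> register it needs
        self.latched: Dict[Operand, int] = {}   # operand -> value received

    def issue(self, fu: str, operand: int, reg: int) -> None:
        """Capture, in this FU's row, which register the operand will read."""
        self.pending[(fu, operand)] = reg

    def read_port_cycle(self, port: int, regfile: List[int]) -> Optional[Operand]:
        """Service one pending operand wired to `port` (first-come, assumed).

        The register number is used only locally to address the regfile;
        the port bus carries the bare value, and only the selected operand
        gets its "go read" strobe, so no second FU can latch it.
        """
        candidates = [k for k in self.pending if self.port_of_operand[k] == port]
        if not candidates:
            return None
        key = candidates[0]
        reg = self.pending.pop(key)
        self.latched[key] = regfile[reg]
        return key


# A-unit operands on RD ports 1 and 2, an X-unit operand on RD port 5
dm = FURegsDM({("A0", 0): 1, ("A0", 1): 2, ("X0", 0): 5})
dm.issue("A0", 0, reg=3)
dm.issue("X0", 0, reg=3)
regfile = [100 + r for r in range(8)]
print(dm.read_port_cycle(1, regfile), dm.latched)  # ('A0', 0) {('A0', 0): 103}
```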


i wonder... i wonder if there's a way to compute the amount of "shifting"
that each Function Unit would require, and transmit *that* to the FU
instead?  this would only be a 2-bit value (assuming a maximum of 4
read-ports).

it goes like this:

* each row of the FU-Regs Dependency Matrix has captured (already, this
  is part of the job of the FU-Regs DM) which registers the FU requires.
  this *already* encodes which FU Operand Ports it needs
* when a CDB is available, we know its index ID.
* at the time that the CDB is available, we also know, back in the DM,
  which FU Operand Port index requires that value
* the DIFFERENCE between these two indices as binary values (wrapping
  modulo 4, since it's a 2-bit subtraction) becomes EXACTLY the amount of
  shifting required, should the value be transmitted over that available
  CDB.
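The shift-diff steps above might look like this in outline. It assumes 4 read CDBs and up to 4 operand ports per FU, both indexed 0-3, a difference computed modulo 4 (a 2-bit subtract with the borrow discarded), and an FU that applies the shift as a cyclic rotate of its incoming lanes; the rotate direction and the names `shift_diff`, `dm_shifts` and `rotate` are assumptions.

```python
from typing import Dict, List

N_PORTS = 4  # assumed maximum number of read ports, so a shift fits in 2 bits


def shift_diff(cdb_index: int, operand_port: int) -> int:
    """DM side: 2-bit difference, borrow discarded, i.e. modulo 4."""
    return (operand_port - cdb_index) % N_PORTS


def dm_shifts(cdb_index: int, waiting: Dict[str, int]) -> Dict[str, int]:
    """Per-FU shift values for one CDB; `waiting` maps FU -> operand port."""
    return {fu: shift_diff(cdb_index, port) for fu, port in waiting.items()}


def rotate(lanes: List[int], shift: int) -> List[int]:
    """FU side: cyclic rotate of the incoming lanes by `shift` positions."""
    n = len(lanes)
    return [lanes[(j - shift) % n] for j in range(n)]


lanes = [10, 11, 12, 13]                     # values on CDBs 0..3
shifts = dm_shifts(1, {"FU0": 0, "FU1": 3})  # both need CDB 1's value
print(shifts)                                # {'FU0': 3, 'FU1': 2}
print(rotate(lanes, shifts["FU0"])[0])       # 11 lands on FU0 operand 0
print(rotate(lanes, shifts["FU1"])[3])       # 11 lands on FU1 operand 3
```

Note that FU0 and FU1 need different shift values for the same broadcast on CDB 1, which is the per-FU cost pointed out in the next paragraph.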

it's not a vast amount of gates (a 2-bit subtractor per FU per port) and
it's only 2 bits of information to be sent to each Function Unit.  note
however that each FU needs a *different* shift-diff value to be transmitted
to it, for each broadcast value on that CDB!

so if a 2-bit subtractor is... ermm... 10 gates(?) then that's:

* 10 (appx) gates for the subtractor (it doesn't need carry)
* times 20 for the number of Function Units
* times 5 for the 3 RD operands and 2 WR operands

1000 gates, as an adjunct to the FU-Regs Dependency Matrix.
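To sanity-check the "10 gates(?)" guess, here is one possible gate-level 2-bit subtractor with the final borrow dropped, so it wraps modulo 4, matching "it doesn't need carry". Counting XOR as a single gate it needs five gates (3 XOR, 1 NOT, 1 AND); if each XOR counts as four gates, as the final paragraph assumes, it comes to roughly 15 NAND-equivalents, so ~10 is a ballpark rather than an exact figure. This decomposition and the name `sub2` are illustrative assumptions, not the libre-soc netlist.

```python
from itertools import product
from typing import Tuple


def sub2(a1: int, a0: int, b1: int, b0: int) -> Tuple[int, int]:
    """2-bit a - b built from single-bit gates, final borrow dropped.

    d0     = a0 XOR b0
    borrow = (NOT a0) AND b0
    d1     = a1 XOR b1 XOR borrow
    i.e. 3 XOR + 1 NOT + 1 AND, and the result wraps modulo 4.
    """
    d0 = a0 ^ b0
    borrow = (1 - a0) & b0  # NOT a0 for a single bit, then AND
    d1 = a1 ^ b1 ^ borrow
    return d1, d0


# exhaustive check against modulo-4 integer subtraction
for a, b in product(range(4), repeat=2):
    d1, d0 = sub2(a >> 1, a & 1, b >> 1, b & 1)
    assert 2 * d1 + d0 == (a - b) % 4
print("all 16 cases match (a - b) mod 4")
```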

however the number of wires is:

* 2 for the shift-diff value
* times 20 for FUs
* times 5 for the operands

a total of 200 wires and *that's* tolerable.

compare this to the binary broadcast: there we'd have 5x20x5 = 500 wires,
but with each XOR costing about four gates (e.g. four NANDs), the 500 XORs
come to two THOUSAND gates.
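The two totals being compared can be reproduced with a couple of lines of arithmetic, using the comment's own rough figures (about 10 gates per 2-bit subtractor, four gates per XOR, 20 FUs, 5 ports). These per-gate figures are estimates, not synthesis results, and the function names are hypothetical.

```python
def shift_diff_cost(n_fus: int = 20, n_ports: int = 5,
                    shift_bits: int = 2, gates_per_sub: int = 10) -> dict:
    """One ~10-gate 2-bit subtractor and 2 wires per FU per port."""
    return {"wires": shift_bits * n_fus * n_ports,
            "gates": gates_per_sub * n_fus * n_ports}


def binary_broadcast_cost(n_fus: int = 20, n_ports: int = 5,
                          id_bits: int = 5, gates_per_xor: int = 4) -> dict:
    """One XOR per ID bit per FU per port, each XOR costing ~4 gates."""
    return {"wires": id_bits * n_fus * n_ports,
            "gates": gates_per_xor * id_bits * n_fus * n_ports}


print(shift_diff_cost())        # {'wires': 200, 'gates': 1000}
print(binary_broadcast_cost())  # {'wires': 500, 'gates': 2000}
```

Under these rough figures the shift-diff scheme needs 40% of the wires and half the gates of the binary broadcast.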
