[Libre-soc-dev] WIP demo of deficiency of 6600-derived architecture compared to register renaming

Luke Kenneth Casson Leighton lkcl at lkcl.net
Tue Oct 27 15:19:55 GMT 2020

On 10/27/20, Jacob Lifshay <programmerjake at gmail.com> wrote:
> On Tue, Oct 27, 2020, 05:57 Luke Kenneth Casson Leighton <lkcl at lkcl.net>
> wrote:
>> ---
>> crowd-funded eco-conscious hardware: https://www.crowdsupply.com/eoma68
>> On Tue, Oct 27, 2020 at 5:10 AM Jacob Lifshay <programmerjake at gmail.com>
>> wrote:
>> >
>> > I think I found a performance deficiency where the 6600-derived
>> > architecture has a bottleneck in the speed it can write to the
>> > register file when one register is repeatedly written -- it's limited
>> > to 1 write per clock, yet register renaming (since the writes are to
>> > different registers due to renaming) can support more than 1 write per
>> > clock cycle.
>> ok: it is a common misconception that 6600-style architectures do not
>> have register renaming: they have *nameless* registers because of the
>> 1:1 correspondence between the CompUnit and its incoming (and outgoing
>> in our case) register latches.
> For the diagrams, I am assuming that there are an infinite number of ALUs
> and FUs and register file R/W ports and timing is totally dependent on
> dependencies between instructions, execution latencies, and fetch/decode
> bandwidth (limited to 4 successive instructions per clock cycle). The issue
> I'm trying to illustrate (but ran out of time to actually do so) is that
> before instructions can complete (after execution and after any shadows are
> released and sometimes after results are forwarded using the unspecified
> method listed in my assumptions), they need to write their result latches
> to the register file which is limited to 1 write per register per cycle

yes, this is why "high end" processors (AMD, etc.) typically have
12R4W register files (12 read ports, 4 write ports).

enough operand forwarding ports will substitute for that, although
Mitch analysed this scenario and found that just the one "forward" is
enough to increase the IPC by 30%.

> (otherwise you need additional write port priority logic, which quickly
> becomes messy). Register renaming supports more than that,

i don't see how it can.  the register renaming of Tomasulo corresponds
exactly and precisely, one-for-one, with nameless registers, and
there is (i assume) no difference in the number of read and write
regfile ports or the number of operand forwarding ports.

unless this is a type of register renaming that has nothing to do with
OoO at all.

which makes no sense.

> the illustrated
> loop has 2 writes to r9 each loop and the steady state execution speed is 1
> loop per cycle.

which in 6600 engines go into completely different RSes, giving the
"renaming effect by using nameless latches"

let me look at it again

ldu r9, 8(r3)

* creates Write Hazard on r9 in FUREGs
* reserves RS#1 of LDST CompUnit

addi r9, r9, 100

* creates Read-After-Write hazard on 1st r9
* reserves RS#1 of ADD CompUnit
* connects LDST#1 to ADD#1 via FUFU
* creates Write Hazard on SECOND r9 in FUREGs

std r9, 0(r3)

* creates Read-after-Write (RAW) on 2nd r9
* reserves RS#2 of LDST CompUnit
* connects ADD#1 to LDST#2 via FUFU

and so on.

with operand forwarding there really is no problem here that i can
see.  there are no blockages, no stalls, it is exactly the same as
using "renaming" because that's exactly what the 6600 nameless
renaming does, with the in-flight Reservation Station Latches.
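
to make the walkthrough above concrete, here is a minimal Python sketch of
the bookkeeping.  it is an illustration only, not the Libre-SOC
implementation: the names issue, last_writer and fu_fu are hypothetical,
only r9 is tracked (the r3 address/update dependency is ignored, as in the
walkthrough), and an unlimited number of Reservation Stations per
CompUnit is assumed.

```python
# Hypothetical sketch: all names here are illustrative, not Libre-SOC code.
def issue(program):
    """Issue instructions in order, recording producer->consumer links."""
    last_writer = {}  # reg name -> FU whose result latch holds the newest value
    fu_fu = []        # (producer FU, consumer FU, reg) forwarding links
    counters = {}     # CompUnit kind -> number of RSes handed out so far
    for _op, kind, dests, srcs in program:
        counters[kind] = counters.get(kind, 0) + 1
        fu = f"{kind}#{counters[kind]}"
        # Read-after-Write: the consumer listens to the producer's latch,
        # not to the architectural register.
        for reg in srcs:
            if reg in last_writer:
                fu_fu.append((last_writer[reg], fu, reg))
        # a new write hazard: later readers of this reg now depend on
        # this FU's latch instead of the previous writer's.
        for reg in dests:
            last_writer[reg] = fu
    return fu_fu, last_writer

# the three instructions from the walkthrough, tracking r9 only (assumption)
program = [
    ("ldu",  "LDST", ["r9"], []),
    ("addi", "ADD",  ["r9"], ["r9"]),
    ("std",  "LDST", [],     ["r9"]),
]
links, writers = issue(program)
for producer, consumer, reg in links:
    print(f"{producer} -> {consumer} ({reg})")
```

each r9 write lands in a different latch (LDST#1, then ADD#1), which is
the "renaming effect" described above.  the sketch deliberately says
nothing about regfile write ports or timing.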


i think i might know what you're running into.

yes on the second loop there is what's called a "Write after Write"
hazard on r9.  the original 6600 *could not detect this* and would

i do have a solution for this (and still be able to do shadowing and
full precise exception cancellation).

it is... complex.  took me about 7 days to communicate it to Mitch on comp.arch.

it involves doubling up the FUREGs cells so that each cell, if one
entry is already reserved, can also reserve a SECOND entry (or a
3rd, etc.)

it is a bit like each cell having a mini "stack" of R/W Hazards.

it basically doubles (or triples) the number of gates in the FU-REGs
DMs so we had better be damn sure it's actually worth doing.
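
as a rough illustration of the "mini stack" idea, here is a Python sketch.
it is a simplification under stated assumptions, not the actual design:
the class RegHazardStack and its methods are hypothetical names, it keeps
one stack per register rather than one per FU-REGs cell, and it assumes
that a refused reservation is something the issue stage would have to
wait on.

```python
# Hypothetical sketch: one bounded stack of write reservations per register.
class RegHazardStack:
    def __init__(self, depth):
        self.depth = depth   # 1 = single entry, 2 = "doubled-up", etc.
        self.pending = []    # FUs with an outstanding write, oldest first

    def reserve_write(self, fu):
        """Try to record a new write hazard; False means no free entry."""
        if len(self.pending) >= self.depth:
            return False
        self.pending.append(fu)
        return True

    def newest_writer(self):
        """The FU a newly issued reader of this register would depend on."""
        return self.pending[-1] if self.pending else None

    def complete(self, fu):
        """Release the entry once that FU's write is done (or cancelled)."""
        self.pending.remove(fu)

# single entry: the second write to r9 cannot be recorded
single = RegHazardStack(depth=1)
print(single.reserve_write("LDST#1"), single.reserve_write("ADD#1"))

# doubled-up entry: both writes to r9 are recorded, newest on top
double = RegHazardStack(depth=2)
print(double.reserve_write("LDST#1"), double.reserve_write("ADD#1"))
print(double.newest_writer())
```

the cost mentioned above shows up here as the depth parameter: every
extra entry is another set of hazard state per cell.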

