[Libre-soc-dev] gigabit router design decisions
Jacob Lifshay
programmerjake at gmail.com
Thu Nov 4 21:30:27 GMT 2021
On Thu, Nov 4, 2021 at 3:31 AM Luke Kenneth Casson Leighton <lkcl at lkcl.net> wrote:
>
> On Thu, Nov 4, 2021 at 1:39 AM Jacob Lifshay <programmerjake at gmail.com> wrote:
>
> > branch prediction doesn't require speculative execution:
>
> of course it does! you have to fetch the instructions ahead, and
> you have to execute the instructions ahead... then cancel both.
>
> that implies that cancellation infrastructure has to be added right
> the way through the entire design.
It doesn't matter much anymore, but I'll explain again anyway:
Branch prediction doesn't require speculative execution, because you
can build a processor that fetches ahead but does *not* execute ahead.
For example, take the following loop:
https://rust.godbolt.org/z/nzT4P1vW1
(edited slightly, assigning addresses)
f:
0x1000: addi 3, 3, -1
loop:
0x1004: lbzu 4, 1(3)
0x1008: cmplwi 4, 0
0x100C: beq 0, end
0x1010: cmplwi 4, 97
0x1014: bne 0, loop
end:
0x1018: li 4, 0
0x101C: stb 4, 0(3)
0x1020: blr
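For reference, here is a rough Rust equivalent of what the listing above
does, reconstructed from the assembly rather than copied from the godbolt
link. The function name `cut_at_a`, the slice signature, and the returned
index are my assumptions for illustration only; the compiled code works on
a raw pointer in r3 and returns nothing.

```rust
/// Hypothetical reconstruction of the loop's behavior (not the original source).
/// Scans `buf` until it finds a NUL byte or b'a' (97), then overwrites that
/// byte with 0. Returns the index where it stopped.
/// Panics if `buf` contains neither byte, unlike the assembly, which would
/// simply keep reading memory.
fn cut_at_a(buf: &mut [u8]) -> usize {
    let mut i = 0;
    // lbzu / cmplwi 4, 0 / beq end / cmplwi 4, 97 / bne loop
    while buf[i] != 0 && buf[i] != b'a' {
        i += 1;
    }
    // li 4, 0 / stb 4, 0(3)
    buf[i] = 0;
    i
}
```

Calling it on the bytes `x`, `a`, NUL takes the back-edge exactly once and
then exits through the bne falling through, which is the "loops once" case
traced in the tables below.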
Fetching *with branch prediction* but *no speculative execution*; the
branch back to loop is predicted taken (loops once).
Pay close attention to how the lbzu after the taken bne is fetched ahead
of time but does *not* start executing until the bne has finished.
Sorry, I couldn't come up with a better example, since branches take
just 1 cycle.
+---------+---------+---------------------------+---------------------------+
| fetch   | issue   | execute #1                | execute #2 / comment      |
+---------+---------+---------------------------+---------------------------+
| 0x1000: |         |                           |                           |
| 0x1004: | 0x1000: |                           |                           |
| 0x1008: | 0x1004: | 0x1000: addi 3, 3, -1     |                           |
| 0x100C: | 0x1008: | 0x1004: lbzu 4, 1(3)      |                           |
| 0x100C: | 0x1008: | stall (load might trap)   | 0x1004: lbzu 4, 1(3)      |
| 0x1010: | 0x100C: | 0x1008: cmplwi 4, 0       |                           |
| 0x1014: | 0x1010: | 0x100C: beq 0, end        | (branch not taken)        |
| 0x1004: | 0x1014: | 0x1010: cmplwi 4, 97      |                           |
| 0x1008: | 0x1004: | 0x1014: bne 0, loop       | (branch taken)            |
| 0x100C: | 0x1008: | 0x1004: lbzu 4, 1(3)      | (waits for bne finishing) |
| 0x100C: | 0x1008: | stall (load might trap)   | 0x1004: lbzu 4, 1(3)      |
| 0x1010: | 0x100C: | 0x1008: cmplwi 4, 0       |                           |
| 0x1014: | 0x1010: | 0x100C: beq 0, end        | (branch not taken)        |
| 0x1004: | 0x1014: | 0x1010: cmplwi 4, 97      |                           |
| 0x1008: | 0x1004: | 0x1014: bne 0, loop       | (branch not taken)        |
| 0x1018: | flush   |                           | (mispredicted)            |
| 0x101C: | 0x1018: |                           |                           |
| 0x1020: | 0x101C: | 0x1018: li 4, 0           |                           |
| --      | 0x1020: | 0x101C: stb 4, 0(3)       |                           |
| --      | 0x1020: | stall (store might trap)  | 0x101C: stb 4, 0(3)       |
| --      | --      | 0x1020: blr               |                           |
+---------+---------+---------------------------+---------------------------+
for comparison:
executing with *no branch prediction* (loops once):
+---------+---------+---------------------------+---------------------------+
| fetch   | issue   | execute #1                | execute #2 / comment      |
+---------+---------+---------------------------+---------------------------+
| 0x1000: |         |                           |                           |
| 0x1004: | 0x1000: |                           |                           |
| 0x1008: | 0x1004: | 0x1000: addi 3, 3, -1     |                           |
| 0x100C: | 0x1008: | 0x1004: lbzu 4, 1(3)      |                           |
| 0x100C: | 0x1008: | stall (load might trap)   | 0x1004: lbzu 4, 1(3)      |
| 0x1010: | 0x100C: | 0x1008: cmplwi 4, 0       |                           |
| 0x1014: | 0x1010: | 0x100C: beq 0, end        | (branch not taken)        |
| 0x1018: | 0x1014: | 0x1010: cmplwi 4, 97      |                           |
| 0x101C: | 0x1018: | 0x1014: bne 0, loop       | (branch taken)            |
| 0x1004: | flush   |                           |                           |
| 0x1008: | 0x1004: |                           |                           |
| 0x100C: | 0x1008: | 0x1004: lbzu 4, 1(3)      |                           |
| 0x100C: | 0x1008: | stall (load might trap)   | 0x1004: lbzu 4, 1(3)      |
| 0x1010: | 0x100C: | 0x1008: cmplwi 4, 0       |                           |
| 0x1014: | 0x1010: | 0x100C: beq 0, end        | (branch not taken)        |
| 0x1018: | 0x1014: | 0x1010: cmplwi 4, 97      |                           |
| 0x101C: | 0x1018: | 0x1014: bne 0, loop       | (branch not taken)        |
| 0x1020: | 0x101C: | 0x1018: li 4, 0           |                           |
| --      | 0x1020: | 0x101C: stb 4, 0(3)       |                           |
| --      | 0x1020: | stall (store might trap)  | 0x101C: stb 4, 0(3)       |
| --      | --      | 0x1020: blr               |                           |
| --      | flush   |                           |                           |
+---------+---------+---------------------------+---------------------------+
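To show how the two tables scale beyond a single loop, here is a small
cycle-count sketch in Rust. It is only a model read off the rows above,
under these assumptions: the predictor always guesses "taken" for the bne
and "not taken" for the beq, the loop exits through the bne falling
through, every redirected fetch costs a flush plus one bubble, and counting
stops at the cycle in which blr executes (so the trailing flush row of the
second table is excluded). All constant and function names are made up for
illustration.

```rust
// Cycle counts read off the tables above; names are hypothetical.
const FILL: u64 = 2; // fetch + issue rows before the first execute
const ADDI: u64 = 1; // the single addi before the loop
const PER_ITER: u64 = 6; // lbzu (2 cycles), cmplwi, beq, cmplwi, bne
const REFILL: u64 = 2; // flush row + one bubble row after a redirect
const TAIL: u64 = 4; // li, stb (2 cycles), blr

/// Cycles with the back-edge predicted taken and no speculative execution.
/// `taken` is how many times the bne branches back to loop.
fn cycles_predict_taken(taken: u64) -> u64 {
    // Only the final, fall-through bne is mispredicted.
    FILL + ADDI + PER_ITER * (taken + 1) + REFILL + TAIL
}

/// Cycles with no branch prediction (fetch always continues sequentially).
fn cycles_no_prediction(taken: u64) -> u64 {
    // Every taken bne redirects fetch; the final fall-through costs nothing.
    FILL + ADDI + PER_ITER * (taken + 1) + REFILL * taken + TAIL
}
```

With one taken back-edge both functions give 21 cycles, matching the row
counts of the two tables. Under this model prediction only pulls ahead once
the loop runs more than once, e.g. 75 versus 93 cycles for ten taken
back-edges.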
Jacob