[Libre-soc-dev] gigabit router design decisions

Jacob Lifshay programmerjake at gmail.com
Thu Nov 4 21:30:27 GMT 2021


On Thu, Nov 4, 2021 at 3:31 AM Luke Kenneth Casson Leighton <lkcl at lkcl.net> wrote:
>
> On Thu, Nov 4, 2021 at 1:39 AM Jacob Lifshay <programmerjake at gmail.com> wrote:
>
> > branch prediction doesn't require speculative execution:
>
> of course it does!  you have to fetch the instructions ahead, and
> you have to execute the instructions ahead... then cancel both.
>
> that implies that cancellation infrastructure has to be added right
> the way through the entire design.

It doesn't matter much anymore, but I'll explain again anyway:

Branch prediction doesn't require speculative execution, because you can
build a processor that fetches ahead but does *not* execute ahead.

E.g., for the following loop, taken from
https://rust.godbolt.org/z/nzT4P1vW1
(edited slightly to assign addresses):
f:
0x1000:  addi 3, 3, -1
loop:
0x1004:  lbzu 4, 1(3)
0x1008:  cmplwi 4, 0
0x100C:  beq 0, end
0x1010:  cmplwi 4, 97
0x1014:  bne 0, loop
end:
0x1018:  li 4, 0
0x101C:  stb 4, 0(3)
0x1020:  blr
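
For readers who don't read Power assembly: the listing scans bytes until it
finds a 0 or an 'a' (97) and then overwrites that byte with 0. Below is a
minimal Rust sketch with the same behavior. It is reconstructed from the
assembly and is an assumption, not necessarily the source behind the godbolt
link. The lbzu pre-increment is modeled as a post-increment on the pointer,
which visits the same bytes.

```rust
/// Hypothetical source-level equivalent of the `f` listing above:
/// scan forward from `p` until a 0 byte or b'a' (97) is found,
/// then overwrite that byte with 0.
///
/// # Safety
/// `p` must point into a writable buffer that contains a 0 or b'a'
/// byte at or after `p`.
pub unsafe fn f(mut p: *mut u8) {
    // SAFETY: the caller guarantees a terminating byte is reachable.
    unsafe {
        loop {
            let c = *p; // lbzu 4, 1(3)
            if c == 0 {
                break; // cmplwi 4, 0 ; beq 0, end
            }
            if c == b'a' {
                break; // cmplwi 4, 97 ; bne 0, loop (falls through)
            }
            p = p.add(1); // bne taken: go around the loop again
        }
        *p = 0; // li 4, 0 ; stb 4, 0(3)
    }
}
```

With a buffer such as "xa", the bne is taken exactly once before the loop
exits, which is the "loops once" case used in the tables.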

Fetching *with branch prediction* but *no speculative execution*; the
branch back to loop is predicted taken (loops once).
(Pay close attention to how the second lbzu is fetched ahead of time but
*doesn't* start executing until the bne has finished. Sorry, I couldn't
come up with a better example, since branches take just 1 cycle.)
+---------+---------+---------------------------+---------------------------+
| fetch   | issue   | execute #1                | execute #2 / comment      |
+---------+---------+---------------------------+---------------------------+
| 0x1000: |         |                           |                           |
| 0x1004: | 0x1000: |                           |                           |
| 0x1008: | 0x1004: | 0x1000:  addi 3, 3, -1    |                           |
| 0x100C: | 0x1008: | 0x1004:  lbzu 4, 1(3)     |                           |
| 0x100C: | 0x1008: | stall (load might trap)   | 0x1004:  lbzu 4, 1(3)     |
| 0x1010: | 0x100C: | 0x1008:  cmplwi 4, 0      |                           |
| 0x1014: | 0x1010: | 0x100C:  beq 0, end       | (branch not taken)        |
| 0x1004: | 0x1014: | 0x1010:  cmplwi 4, 97     |                           |
| 0x1008: | 0x1004: | 0x1014:  bne 0, loop      | (branch taken)            |
| 0x100C: | 0x1008: | 0x1004:  lbzu 4, 1(3)     | (waits for bne finishing) |
| 0x100C: | 0x1008: | stall (load might trap)   | 0x1004:  lbzu 4, 1(3)     |
| 0x1010: | 0x100C: | 0x1008:  cmplwi 4, 0      |                           |
| 0x1014: | 0x1010: | 0x100C:  beq 0, end       | (branch not taken)        |
| 0x1004: | 0x1014: | 0x1010:  cmplwi 4, 97     |                           |
| 0x1008: | 0x1004: | 0x1014:  bne 0, loop      | (branch not taken)        |
| 0x1018: | flush   |                           | (mispredicted)            |
| 0x101C: | 0x1018: |                           |                           |
| 0x1020: | 0x101C: | 0x1018:  li 4, 0          |                           |
| --      | 0x1020: | 0x101C:  stb 4, 0(3)      |                           |
| --      | 0x1020: | stall (store might trap)  | 0x101C:  stb 4, 0(3)      |
| --      | --      | 0x1020:  blr              |                           |
+---------+---------+---------------------------+---------------------------+

for comparison:
executing with *no branch prediction* (loops once):
+---------+---------+---------------------------+---------------------------+
| fetch   | issue   | execute #1                | execute #2 / comment      |
+---------+---------+---------------------------+---------------------------+
| 0x1000: |         |                           |                           |
| 0x1004: | 0x1000: |                           |                           |
| 0x1008: | 0x1004: | 0x1000:  addi 3, 3, -1    |                           |
| 0x100C: | 0x1008: | 0x1004:  lbzu 4, 1(3)     |                           |
| 0x100C: | 0x1008: | stall (load might trap)   | 0x1004:  lbzu 4, 1(3)     |
| 0x1010: | 0x100C: | 0x1008:  cmplwi 4, 0      |                           |
| 0x1014: | 0x1010: | 0x100C:  beq 0, end       | (branch not taken)        |
| 0x1018: | 0x1014: | 0x1010:  cmplwi 4, 97     |                           |
| 0x101C: | 0x1018: | 0x1014:  bne 0, loop      | (branch taken)            |
| 0x1004: | flush   |                           |                           |
| 0x1008: | 0x1004: |                           |                           |
| 0x100C: | 0x1008: | 0x1004:  lbzu 4, 1(3)     |                           |
| 0x100C: | 0x1008: | stall (load might trap)   | 0x1004:  lbzu 4, 1(3)     |
| 0x1010: | 0x100C: | 0x1008:  cmplwi 4, 0      |                           |
| 0x1014: | 0x1010: | 0x100C:  beq 0, end       | (branch not taken)        |
| 0x1018: | 0x1014: | 0x1010:  cmplwi 4, 97     |                           |
| 0x101C: | 0x1018: | 0x1014:  bne 0, loop      | (branch not taken)        |
| 0x1020: | 0x101C: | 0x1018:  li 4, 0          |                           |
| --      | 0x1020: | 0x101C:  stb 4, 0(3)      |                           |
| --      | 0x1020: | stall (store might trap)  | 0x101C:  stb 4, 0(3)      |
| --      | --      | 0x1020:  blr              |                           |
| --      | flush   |                           |                           |
+---------+---------+---------------------------+---------------------------+
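
The row counts of the two tables can be tallied with a small trace-driven
model. This is only a rough sketch under assumptions read off the tables, not
a description of any real core: 2 cycles to fill fetch and issue, 1 execute
cycle per instruction (2 for a load or store because of the "might trap"
stall), and a 2-cycle bubble whenever fetch went the wrong way at a branch.
Nothing in it executes past an unresolved branch; prediction only changes
which address fetch follows. The names Kind, cycles and trace are made up for
this illustration.

```rust
/// Instruction classes in this toy model (illustrative names).
#[derive(Clone, Copy)]
pub enum Kind {
    /// Single-cycle ALU op (addi, cmplwi, li).
    Alu,
    /// Load/store: 1 execute cycle plus 1 "might trap" stall cycle.
    Mem,
    /// Branch with its actual outcome and the predictor's guess.
    Branch { taken: bool, predicted_taken: bool },
}

/// Cycles until the last instruction in `trace` has executed.
/// Without prediction, fetch always continues sequentially.
pub fn cycles(trace: &[Kind], use_prediction: bool) -> u32 {
    const FILL: u32 = 2; // fetch + issue before the first execute
    const REDIRECT_BUBBLE: u32 = 2; // flush + refill after a wrong fetch path
    let mut total = FILL;
    let mut pending_bubble = 0;
    for &k in trace {
        // A bubble only costs time if another instruction follows it.
        total += pending_bubble;
        pending_bubble = 0;
        total += match k {
            Kind::Mem => 2,
            _ => 1,
        };
        if let Kind::Branch { taken, predicted_taken } = k {
            let fetched_taken = use_prediction && predicted_taken;
            if fetched_taken != taken {
                pending_bubble = REDIRECT_BUBBLE;
            }
        }
    }
    total
}

/// Dynamic instruction trace of `f` when the bne is taken `iterations` times.
pub fn trace(iterations: usize) -> Vec<Kind> {
    // beq 0, end: assumed predicted not taken, and never taken in this trace.
    let beq = Kind::Branch { taken: false, predicted_taken: false };
    let mut t = vec![Kind::Alu]; // addi
    for i in 0..=iterations {
        let loops_again = i < iterations;
        // lbzu, cmplwi, beq, cmplwi, bne (predicted taken)
        let bne = Kind::Branch { taken: loops_again, predicted_taken: true };
        t.extend([Kind::Mem, Kind::Alu, beq, Kind::Alu, bne]);
    }
    // li, stb, blr (blr is last, so its redirect is never counted)
    let blr = Kind::Branch { taken: true, predicted_taken: true };
    t.extend([Kind::Alu, Kind::Mem, blr]);
    t
}
```

For the "loops once" case, cycles(&trace(1), true) and cycles(&trace(1), false)
both give 21. That matches the 21 rows of the first table and the 21 rows of
the second table up to the blr (its final flush row is not counted). The one
redirect saved on the taken bne is cancelled by the one misprediction at loop
exit. Under the same assumptions, the predicted version pulls ahead as soon as
the loop runs more than once, since it pays one bubble in total instead of one
per iteration.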

Jacob

