[Libre-soc-bugs] [Bug 1039] add hardware-cycle-accurate statistical modelling to ISACaller for an in-order core

Mon Aug 21 17:44:16 BST 2023

https://bugs.libre-soc.org/show_bug.cgi?id=1039

--- Comment #22 from Andrey Miroshnikov <andrey at technepisteme.xyz> ---
Made several changes which now allow the code to run the test case (not yet
correct, but at least the code actually runs).

https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=ed7797d8288cc9701d070a82884ebe74816b0232

The results are wrong, and stall isn't occurring, but that's because I
intentionally only pushed the necessary changes to get the code to run.

What is the issue stage really meant to be?
In online examples of a pipelined processor, there's usually only a decode
stage, which performs both decoding and issuing.
Luke, in your code comments you mentioned that the register reads/writes
should really be moved to the issue pipeline.

I added a change to the code to ensure the initial fetch stage[0] instruction
isn't wiped by the tick() before it is passed to the decode object (1 clock
later). Similar changes were also made to decode, issue, and execute.

The decode, issue, and execute pipelines also don't run if there's nothing in
their corresponding stage[0] entries (checks for either None or zero-length for
execute).
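
As a rough illustration of the emptiness check described above (not the
committed code; the name has_work is made up for this sketch, and it assumes
the execute entry is an ordinary Python container):

```python
def has_work(entry):
    """Return True if a stage[0] entry holds something to process.

    Hypothetical helper: None means "nothing arrived this cycle"; a
    zero-length container (as assumed here for execute) means "idle".
    """
    if entry is None:
        return False
    if isinstance(entry, (list, tuple, set, dict)):
        return len(entry) > 0
    # anything else (e.g. an instruction string) counts as work
    return True
```

Each stage's processing function would then return early when
has_work(stage[0]) is False.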

My understanding was that an instruction must remain at each stage for at
least one clock cycle (fetch -> decode -> issue -> execute), which means
there's a 4-cycle latency before any results come out (assuming for now that
all insns execute in one cycle).
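
To make the expected timing concrete, here is a minimal standalone sketch of
an ideal 4-stage pipeline with no stalls, where every instruction spends
exactly one clock in each stage. It is not the ISACaller model; STAGES and
trace_pipeline are names invented for this example.

```python
STAGES = ("fetch", "decode", "issue", "exec")

def trace_pipeline(program):
    """Return one dict per clock mapping stage name -> insn (or None).

    Idealised model: no stalls, single-cycle execute, one latch per stage.
    """
    latches = [None] * len(STAGES)
    pending = list(program)
    rows = []
    while True:
        # On each clock every latch takes what the previous stage held,
        # and fetch takes the next instruction (if any remain).
        new_insn = pending.pop(0) if pending else None
        latches = [new_insn] + latches[:-1]
        if all(entry is None for entry in latches):
            break  # pipeline has drained
        rows.append(dict(zip(STAGES, latches)))
    return rows
```

With this model the first instruction is fetched at clk 0 and reaches exec at
clk 3, i.e. four cycles counting the fetch cycle, and decode and issue never
hold the same instruction in the same clock.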

The code, however, allows the instruction to go straight from decode to
execute within the same cycle. I have noticed this but haven't fixed it yet,
because the execute object's 'add_stage' method seems to add the extra 2-cycle
delay.

You can run the code with the -t option to go through the 8 instructions in
the unit test. It obviously doesn't work properly yet, but I'm at least
committing what I've done so far. The output is a table (which will be
converted to markdown later), showing the path of each instruction over time
(in clock cycles).

Here's the table for the test case (so far incorrect, no stalls, etc.):

| clk # | fetch             | decode            | issue             | exec              |
| 0     | addi 1, 0, 0x0010 |                   |                   |                   |
| 1     | addi 2, 0, 0x1234 | addi 1, 0, 0x0010 | addi 1, 0, 0x0010 |                   |
| 2     | stw 2, 0(1)       | addi 2, 0, 0x1234 | addi 2, 0, 0x1234 |                   |
| 3     | lwz 3, 0(1)       | stw 2, 0(1)       | stw 2, 0(1)       | addi 1, 0, 0x0010 |
| 4     | add 1, 3, 2       | lwz 3, 0(1)       | lwz 3, 0(1)       | addi 2, 0, 0x1234 |
| 5     | addi 3, 0, 0x1234 | add 1, 3, 2       | add 1, 3, 2       | stw 2, 0(1)       |
| 6     | addi 2, 0, 0x4321 | addi 3, 0, 0x1234 | addi 3, 0, 0x1234 | lwz 3, 0(1)       |
| 7     | add 1, 3, 2       | addi 2, 0, 0x4321 | addi 2, 0, 0x4321 | add 1, 3, 2       |
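
For the later markdown conversion, something along these lines might be
enough. This is only a sketch: to_markdown is a hypothetical helper, not
something in the repository, and it assumes each row is a plain sequence of
cell values with None for an empty stage.

```python
def to_markdown(headers, rows):
    """Render headers and rows as a markdown table string."""
    lines = [
        "| " + " | ".join(headers) + " |",
        # separator row, which markdown needs after the header
        "|" + "|".join(" --- " for _ in headers) + "|",
    ]
    for row in rows:
        cells = ["" if cell is None else str(cell) for cell in row]
        lines.append("| " + " | ".join(cells) + " |")
    return "\n".join(lines)
```

The main difference from the current output is the separator row after the
header.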
