[Libre-soc-bugs] [Bug 726] New: Additional core_stop check after Execute breaks single-stepping

Tue Oct 12 22:34:51 BST 2021

https://bugs.libre-soc.org/show_bug.cgi?id=726

            Bug ID: 726
           Summary: Additional core_stop check after Execute breaks
                    single-stepping
           Product: Libre-SOC's first SoC
           Version: unspecified
          Hardware: PC
               URL: https://libre-soc.org/irclog/latest.log.html#t2021-10-12T18:33:53
                OS: Linux
            Status: CONFIRMED
          Severity: major
          Priority: High
         Component: Source Code
          Assignee: cestrauss at gmail.com
          Reporter: cestrauss at gmail.com
                CC: libre-soc-bugs at lists.libre-soc.org
   NLnet milestone: ---

Executing:

1) python ~/src/soc/src/soc/simple/issuer_verilog.py --disable-svp64 \
   --debug=dmi ~/src/soc/src/soc/litex/florent/libresoc/libresoc.v

2) python ~/src/soc/src/soc/litex/florent/sim.py --debug --variant=standard

... simulates the libre-soc core, with an embedded FSM single-stepping it, 
controlled by DMI.

Right now, one in every two DMI single-step commands does not actually
execute an instruction, deterministically.

Since we may want to stop the core in the middle of a VL loop, I have put
another core stop check after Execute. Together with the check before Fetch,
that's two core stop checks in a row.

What I didn't anticipate was that core_stop is only pulsed low for a
single-step. Because core_stop goes high again immediately, the second of the
two checks (the one before Fetch) catches it, and execution doesn't resume.
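
To illustrate the failure mode, here is a minimal toy model in Python. It is
not the actual issuer FSM: the state names, the one-cycle pulse and the merged
decode/execute state are all assumptions made for the sketch. It only shows
how two consecutive core_stop checks can swallow every other single-step
pulse.

```python
from enum import Enum, auto


class State(Enum):
    # Hypothetical state names, not taken from the real issuer source.
    CHECK_BEFORE_FETCH = auto()
    FETCH = auto()
    EXECUTE = auto()
    CHECK_AFTER_EXECUTE = auto()


class ToyIssuer:
    """Toy FSM with a core_stop check before Fetch and after Execute."""

    def __init__(self):
        self.state = State.CHECK_BEFORE_FETCH
        self.executed = 0  # number of instructions completed so far

    def clock(self, core_stop):
        """Advance the FSM by one cycle with the given core_stop level."""
        if self.state is State.CHECK_BEFORE_FETCH:
            if not core_stop:
                self.state = State.FETCH
        elif self.state is State.FETCH:
            self.state = State.EXECUTE
        elif self.state is State.EXECUTE:
            self.executed += 1
            self.state = State.CHECK_AFTER_EXECUTE
        elif self.state is State.CHECK_AFTER_EXECUTE:
            if not core_stop:
                self.state = State.CHECK_BEFORE_FETCH


def single_step(core, settle_cycles=8):
    """Pulse core_stop low for one cycle, then hold it high again.

    Returns how many instructions were executed by this step command.
    """
    before = core.executed
    core.clock(core_stop=False)      # the single-step pulse
    for _ in range(settle_cycles):   # core_stop is high again
        core.clock(core_stop=True)
    return core.executed - before


core = ToyIssuer()
results = [single_step(core) for _ in range(6)]
print(results)  # prints [1, 0, 1, 0, 1, 0]
```

In this model, every second pulse is used up moving from the check after
Execute to the check before Fetch, which then halts again because core_stop
is already high.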

Unfortunately it seems likely that this bug ended up on the chip. The
additional core_stop check after Execute was not conditional on --svp64.

These are the tasks as I see them:

1) Make a test-case that catches this regression
2) Fix the FSM to avoid the issue
3) Document the present behavior of the test chip
4) Develop and test mitigations for testing the chip

In principle, running two DMI single step commands in a row should work around
this problem on the chip.
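
As a rough sketch of that workaround, assuming the chip really does swallow
exactly every other step command, a test harness could wrap the step command
as shown below. FakeCore and step_one_instruction are hypothetical names
invented for this example, and FakeCore is only a stand-in for the real DMI
access, which is not modelled here.

```python
class FakeCore:
    """Hypothetical stand-in for the chip: ignores every other step command."""

    def __init__(self, swallow_first=False):
        self.executed = 0                 # instructions actually executed
        self._swallow_next = swallow_first

    def step(self):
        """Model one DMI single-step command."""
        if self._swallow_next:
            # This command only moves the FSM past one of the two checks.
            self._swallow_next = False
        else:
            self.executed += 1
            self._swallow_next = True


def step_one_instruction(send_step):
    """Issue two step commands so that exactly one instruction executes.

    send_step is any callable that issues a single DMI step command.
    """
    send_step()
    send_step()


# The pair advances by one instruction whichever check the core is halted at.
for swallow_first in (False, True):
    core = FakeCore(swallow_first)
    step_one_instruction(core.step)
    print(swallow_first, core.executed)  # prints "False 1" then "True 1"
```

Under this assumption the harness does not need to know which of the two
checks the core is currently halted at.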

A side effect is that, after randomly stopping the core, the PC read by DMI may
or may not point to the next instruction, depending on whether the last
executed instruction updated the PC, and on whether the core stopped at the
check after Execute.

Too bad about the chip. Let's hope the workaround actually works in practice,
and doesn't impact testing by much. Sorry about this.
