[Libre-soc-dev] [OpenPOWER-HDL-Cores] microwatt-libre-soc interoperable verilator snapshots / debugging

Sun Jan 9 15:28:41 GMT 2022

On Sun, Jan 9, 2022, 07:02 Luke Kenneth Casson Leighton <lkcl at lkcl.net>
wrote:

> On Sun, Jan 9, 2022 at 2:21 PM Jacob Lifshay <programmerjake at gmail.com>
> wrote:
>
> > maybe VM save/restore? though that would miss TLB activity.
>
> yes.  unless the TLB can be extracted [from qemu]
>
> > if you can get an instruction trace from qemu or similar, you could then
> > postprocess it (e.g. with a relatively simple python script)
>
> we're 10 *million* instructions in to execution, only the first few
> lines of the linux kernel have been displayed up to that point.  to
> get to the boot prompt could easily be 100 million instructions,
> possibly even a billion.  whatever solution is envisaged has to take
> that into account.
>

Any decent Python script should be able to simulate >100k instructions per
second; if it can't run that fast, you're using the wrong algorithms and/or
file formats. At that rate, 100 million instructions take about 17 minutes
and a billion take under 3 hours. We could also write it in Rust or C/C++ if
Python is still too slow.

It should basically boil down to:
for record in file.read_records():
    itlb.access(record.pc)
    if record.is_load or record.is_store:
        dtlb.access(record.data_address)
    if record.is_tlb_flush:
        itlb.clear()
        dtlb.clear()
    output.write(record, itlb, dtlb)
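
Fleshed out a bit, the loop above could look like the sketch below. It is
only an illustration: the Record fields, the LruTlb class, the 4 KiB page
size and the LRU replacement policy are all assumptions made for the example,
not the trace format qemu emits and not how microwatt's or Libre-SOC's TLB
actually behaves.

```python
from collections import OrderedDict
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional, Tuple

PAGE_SHIFT = 12  # assumption: 4 KiB pages


@dataclass
class Record:
    """One executed instruction from a (hypothetical) trace file."""
    pc: int
    data_address: Optional[int] = None  # set for loads and stores only
    is_tlb_flush: bool = False


class LruTlb:
    """Fully associative TLB model with LRU replacement (assumed policy)."""

    def __init__(self, entries: int) -> None:
        self.entries = entries
        self.pages: "OrderedDict[int, bool]" = OrderedDict()

    def access(self, address: int) -> bool:
        """Touch the page containing address; return True on a hit."""
        page = address >> PAGE_SHIFT
        hit = page in self.pages
        if hit:
            self.pages.move_to_end(page)  # mark as most recently used
        else:
            if len(self.pages) >= self.entries:
                self.pages.popitem(last=False)  # evict least recently used
            self.pages[page] = True
        return hit

    def clear(self) -> None:
        self.pages.clear()


def replay(records: Iterable[Record], itlb: LruTlb,
           dtlb: LruTlb) -> Iterator[Tuple[Record, tuple, tuple]]:
    """Yield each record with the ITLB/DTLB contents after it executed."""
    for record in records:
        itlb.access(record.pc)
        if record.data_address is not None:
            dtlb.access(record.data_address)
        if record.is_tlb_flush:
            itlb.clear()
            dtlb.clear()
        yield record, tuple(itlb.pages), tuple(dtlb.pages)
```

Each step is a couple of dictionary operations, so the cost per instruction
stays constant regardless of trace length; reading the trace file is likely
to dominate.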

>
> it did occur to me that if this is done properly in hardware, it would
> actually be possible to use the DMI interface to HALT microwatt when
> running on an FPGA, perform a full state-dump (including reading the
> full memory over DMI-wishbone), and then start it back up again *under
> verilator*.
>
> this would be extremely cool because even a few thousand or tens of
> thousands of instructions under verilator is perfectly reasonable
> (even when VCD traces are enabled), and it would allow full
> signal-level debugging of FPGA execution just before it goes wrong.
>
> > to add the TLB info, allowing generating the state you want to load.
> > alternatively you could just clear the TLB on state load, as long as our
> > TLB automatically does page table walking, and then just ignore TLB state
> > when comparing to microwatt or qemu.
>
> the issue is that if the TLB state is not captured, it is not possible
> to exactly have the exact same state.  TLB misses will occur which
> will cause lookups to occur that would otherwise not occur, and that
> is not *exactly* the behaviour, at that exact time, that the
> [snapshotted] system was about to do.
>

My point was that, if TLB misses are handled entirely by a hw page table
walker, then they are transparent to software, so it doesn't matter if one
cpu has to run the hw walker while the other one has a TLB hit. All we need
is to properly ignore TLB and hw page table walker activity; everything
sw-visible will be identical (except maybe time and performance counters).
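
As a rough sketch of what "ignore TLB activity" could mean in a comparison
script: the Step layout, the tlb_events field and the ignored_regs parameter
below are hypothetical names made up for the example, not an existing trace
format from either simulator.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, Optional, Tuple


@dataclass
class Step:
    """State recorded after one instruction (hypothetical layout)."""
    pc: int
    regs: Dict[str, int]
    mem_writes: Tuple[Tuple[int, int], ...] = ()  # (address, value) pairs
    tlb_events: Tuple[str, ...] = ()  # microarchitectural, never compared


def sw_visible(step: Step, ignored_regs: FrozenSet[str]) -> tuple:
    """Reduce a step to the state that software can observe."""
    regs = tuple(sorted((name, value) for name, value in step.regs.items()
                        if name not in ignored_regs))
    return (step.pc, regs, step.mem_writes)


def first_mismatch(a: Iterable[Step], b: Iterable[Step],
                   ignored_regs: FrozenSet[str] = frozenset()
                   ) -> Optional[int]:
    """Return the index of the first step whose sw-visible state differs.

    Only the common prefix is compared: zip() stops at the shorter trace.
    """
    for index, (x, y) in enumerate(zip(a, b)):
        if sw_visible(x, ignored_regs) != sw_visible(y, ignored_regs):
            return index
    return None
```

Time and performance counters would go into ignored_regs, since they are
expected to differ when one side takes a TLB miss and the other does not.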

>
> if there are bugs *in* the HDL of the TLB miss (or bugs in TLB hits),
> these will not be caught because the state was not exactly the same.
>

If those bugs don't affect sw-visible state, then you can still run a valid
comparison, because you're comparing sw-visible state, not
microarchitectural state. If the bugs do affect sw-visible state, then you
know where to look: the point where the sw-visible state first mismatches,
plus the TLB state changes leading up to it (mostly wherever each TLB entry
was filled). If the TLB is too messed up, you can add a TLB consistency
check to the simulation and break when it becomes inconsistent.
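
A consistency check of that kind might be as simple as the sketch below. It
assumes the simulation can expose both the TLB contents and the result of a
reference page table walk as plain dicts mapping virtual page number to
physical page number; that interface is hypothetical. Note that this strict
form also flags stale entries, which are only a real error at points where
software has already completed its TLB invalidation.

```python
from typing import Dict, List, Optional, Tuple


def tlb_inconsistencies(
    tlb: Dict[int, int],
    page_table: Dict[int, int],
) -> List[Tuple[int, int, Optional[int]]]:
    """List TLB entries that disagree with the reference page table.

    Each result is (virtual page, cached physical page, expected physical
    page), where expected is None if the page table has no mapping at all.
    """
    bad = []
    for vpn, cached in sorted(tlb.items()):
        expected = page_table.get(vpn)
        if cached != expected:
            bad.append((vpn, cached, expected))
    return bad
```

Running it after every instruction and stopping at the first non-empty
result gives the "break when it becomes inconsistent" behaviour.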

Jacob