[Libre-soc-dev] [OpenPOWER-HDL-Cores] microwatt-libre-soc interoperable verilator snapshots / debugging
programmerjake at gmail.com
Sun Jan 9 15:28:41 GMT 2022
On Sun, Jan 9, 2022, 07:02 Luke Kenneth Casson Leighton <lkcl at lkcl.net>
> On Sun, Jan 9, 2022 at 2:21 PM Jacob Lifshay <programmerjake at gmail.com>
> > maybe VM save/restore? though that would miss TLB activity.
> yes. unless the TLB can be extracted [from qemu]
> > if you can get an instruction trace from qemu or similar, you could then
> > postprocess it (e.g. with a relatively simple python script)
> we're 10 *million* instructions in to execution, only the first few
> lines of the linux kernel have been displayed up to that point. to
> get to the boot prompt could easily be 100 million instructions,
> possibly even a billion. whatever solution is envisaged has to take
> that into account.
Any decent Python script should be able to simulate >100k instructions per
second; if it can't run that fast, you're using the wrong algorithms and/or
file formats. We could also write it in Rust or C/C++ if Python is still
too slow.
It should basically boil down to:
    for record in file.read_records():
        if record.is_load or record.is_store:
            output.write(record, itlb, dtlb)
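
To make the loop above a bit more concrete, here is a minimal sketch of the kind of postprocessing I mean. Everything in it is an assumption for illustration: the `Record` fields, the 4 KiB page size, and the toy fully-associative LRU `TLB` are hypothetical and do not match microwatt's or qemu's actual TLB organisation or any real trace format. It also only tracks which virtual pages would be resident; recovering the real page numbers would need the page tables from the memory image.

```python
from collections import OrderedDict
from dataclasses import dataclass

PAGE_SHIFT = 12  # assumption: 4 KiB pages


@dataclass
class Record:
    """Hypothetical trace record: one executed instruction."""
    pc: int
    is_load: bool = False
    is_store: bool = False
    addr: int = 0  # effective address, only meaningful for loads/stores


class TLB:
    """Toy fully-associative LRU TLB keyed on virtual page number."""

    def __init__(self, entries=64):
        self.entries = entries
        self.pages = OrderedDict()  # vpn -> True, oldest entry first

    def touch(self, addr):
        """Record an access; return True on hit, False on miss (with fill)."""
        vpn = addr >> PAGE_SHIFT
        hit = vpn in self.pages
        if hit:
            self.pages.move_to_end(vpn)  # mark as most recently used
        else:
            if len(self.pages) >= self.entries:
                self.pages.popitem(last=False)  # evict least recently used
            self.pages[vpn] = True
        return hit


def annotate(records, itlb, dtlb):
    """Yield (record, itlb_hit, dtlb_hit); dtlb_hit is None for non-memory ops."""
    for record in records:
        ihit = itlb.touch(record.pc)
        dhit = None
        if record.is_load or record.is_store:
            dhit = dtlb.touch(record.addr)
        yield record, ihit, dhit
```

After the last record, `itlb.pages` and `dtlb.pages` hold the modelled TLB contents at the snapshot point, which is the extra state you would load alongside the registers and memory.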
> it did occur to me that if this is done properly in hardware, it would
> actually be possible to use the DMI interface to HALT microwatt when
> running on an FPGA, perform a full state-dump (including reading the
> full memory over DMI-wishbone), and then start it back up again *under
> verilator*.
> this would be extremely cool because even a few thousand or tens of
> thousands of instructions under verilator is perfectly reasonable
> (even when VCD traces are enabled), and it would allow full
> signal-level debugging of FPGA execution just before it goes wrong.
> > to add the TLB info, allowing generating the state you want to load.
> > alternatively you could just clear the TLB on state load, as long as our
> > TLB automatically does page table walking, and then just ignore TLB state
> > when comparing to microwatt or qemu.
> the issue is that if the TLB state is not captured, it is not possible
> to have exactly the same state. TLB misses will occur which
> will cause lookups to occur that would otherwise not occur, and that
> is not *exactly* the behaviour, at that exact time, that the
> [snapshotted] system was about to do.
My point was that, if TLB misses are handled entirely by a hw page table
walker, then they are transparent to software, therefore it doesn't matter
if one cpu has to run the hw walker and the other one has a TLB hit. All we
need is to properly ignore TLB and hw page table walker activity,
everything sw-visible will be identical (except maybe time and performance
> if there are bugs *in* the HDL of the TLB miss (or bugs in TLB hits),
> these will not be caught because the state was not exactly the same.
If those bugs don't affect sw-visible state, then you can still run a valid
comparison, because you're comparing sw-visible state, not
microarchitectural state. If the bugs do affect sw-visible state, then you
know where to look -- where the sw-visible state started having a mismatch,
and some of the previous TLB state changes (mostly wherever each TLB entry
was filled). If the TLB is too messed up, you can add a TLB consistency
check to the simulation and break when it becomes inconsistent.
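
As a rough sketch of the comparison I have in mind: compare only sw-visible state, step by step, and report the first step where the two runs diverge. The per-step state dicts and the `SW_VISIBLE` field list below are hypothetical, chosen for illustration; the real list depends on what each simulator can dump.

```python
# Assumption: each simulator emits one dict of state per retired instruction.
# Only these keys are compared; TLB and page-table-walker state are ignored.
SW_VISIBLE = ("pc", "gprs", "cr", "xer", "lr", "ctr", "msr")


def sw_visible(state):
    """Project a full state dump down to the software-visible fields."""
    return {key: state[key] for key in SW_VISIBLE if key in state}


def first_mismatch(trace_a, trace_b):
    """Return (step, differing_keys) for the first divergence, else None."""
    for step, (a, b) in enumerate(zip(trace_a, trace_b)):
        va, vb = sw_visible(a), sw_visible(b)
        if va != vb:
            keys = sorted(k for k in va.keys() | vb.keys()
                          if va.get(k) != vb.get(k))
            return step, keys
    return None
```

The step returned is where to start looking; the TLB fills leading up to it are the likely suspects if the TLB is at fault.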