[Libre-soc-dev] pia as cycle accurate simulator?

Fri Oct 16 01:15:16 BST 2020

On Thu, Oct 15, 2020 at 10:24 AM Luke Kenneth Casson Leighton
<lkcl at lkcl.net> wrote:
> On Thu, Oct 15, 2020 at 6:10 PM Jacob Lifshay <programmerjake at gmail.com> wrote:
>
> > it's not that hard, just tedious. I could have a working simulator for the
> > instructions currently implemented up in 2-3 days.
>
> i'd forgotten Jacob that we've got both Alain and Lauri involved,
> here.  expertise:
>
> * you - c, c++, rust, python, assembler

Well, I started learning Rust just a few months before I started
working on Libre-SOC, so it wasn't that long. Also, I had basically
never used Python before joining Libre-SOC either, and I'd estimate
that I'm pretty good at Python now, so learning enough Rust or Python
to be useful doesn't take all that long, especially considering that
most of the harder-to-learn features of Rust aren't really used in an
ISA simulator, since you aren't really dealing with complex recursive
data structures or complex generic code -- more on that later.

> * me - c, c++, python, assembler - 25+ years
> * alain - c, assembler - 40+ years
> * lauri - c, c++, python(?), assembler

Not to be disrespectful, but, once you've programmed in a language
enough to be an expert, the additional years of programming don't make
you all that much better at that language in particular (except for
just keeping up with the changes in the language). Most of the new
things to learn are applicable to most other programming languages.

> basically you would become the sole exclusive critical dependency for
> the development and ongoing maintenance of a rust-based simulator.
> neither Alain, Lauri nor myself could usefully contribute just when
> it's critical to get things moving.

I think it would actually be much easier than you think to contribute,
since the subset of Rust that is needed for the majority of the work
(implementing more instructions) doesn't use most of the confusing or
hard-to-learn parts of Rust. I've been designing the instruction
models in pia specifically to be written in a simple and
straightforward way, even if it means repeating code.

For example, just look at the commit for adding the mulli instruction:
https://salsa.debian.org/Kazan-team/power-instruction-analyzer/-/commit/4d295f6d97c3a1b1adfe3f50d221bd4f0f26e68a

It adds the instruction model:

pub fn mulli(inputs: InstructionInput) -> InstructionResult {
    let ra = inputs.try_get_ra()? as i64;
    let immediate = inputs.try_get_immediate_s16()? as i64;
    let result = ra.wrapping_mul(immediate) as u64;
    Ok(InstructionOutput {
        rt: Some(result),
        ..InstructionOutput::default()
    })
}

and the section in the `instructions!` macro that generates the native
inline assembly as well as the Rust and Python glue code:

#[enumerant = MulLI]
fn mulli(Ra, ImmediateS16) -> (Rt) {
    "mulli"
}

The instruction model basically reads the inputs (ra and immediate),
computes the result (using wrapping_mul from the Rust standard library
for this particular instruction), then returns the output with rt set
and the rest left at their defaults (unset). You can easily copy-paste
from other instructions and modify the function body as needed.
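
To make the read-inputs / compute / return-output pattern concrete, here
is a minimal self-contained sketch. The `mulli` body is the one from the
commit above; everything else (the struct fields, the `MissingInput`
error type, the `overflow` field, the getter bodies) is a simplified
stand-in that I am assuming for illustration, not pia's actual
definitions.

```rust
// Hypothetical, simplified stand-ins for pia's types (assumptions, not
// the real definitions).
#[derive(Debug, Default, Clone, Copy, PartialEq, Eq)]
pub struct InstructionInput {
    pub ra: Option<u64>,
    pub immediate: Option<u64>,
}

#[derive(Debug, Default, Clone, Copy, PartialEq, Eq)]
pub struct InstructionOutput {
    pub rt: Option<u64>,
    // Stand-in for the outputs that mulli leaves unset.
    pub overflow: Option<bool>,
}

// Hypothetical error type: names the input that was not supplied.
#[derive(Debug, PartialEq, Eq)]
pub struct MissingInput(pub &'static str);

pub type InstructionResult = Result<InstructionOutput, MissingInput>;

impl InstructionInput {
    pub fn try_get_ra(self) -> Result<u64, MissingInput> {
        self.ra.ok_or(MissingInput("ra"))
    }

    // Truncates the stored immediate to 16 bits and reinterprets it as signed.
    pub fn try_get_immediate_s16(self) -> Result<i16, MissingInput> {
        self.immediate
            .map(|v| v as i16)
            .ok_or(MissingInput("immediate"))
    }
}

// The instruction model itself, as in the commit.
pub fn mulli(inputs: InstructionInput) -> InstructionResult {
    let ra = inputs.try_get_ra()? as i64;
    let immediate = inputs.try_get_immediate_s16()? as i64;
    // wrapping_mul keeps the low 64 bits instead of panicking on overflow.
    let result = ra.wrapping_mul(immediate) as u64;
    Ok(InstructionOutput {
        rt: Some(result),
        ..InstructionOutput::default()
    })
}
```

With these stand-ins, ra = 7 and an immediate of 0xFFFD (-3 as a signed
16-bit value) give rt = -21 reinterpreted as a u64, and a missing ra
comes back as an error instead of a result.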

The section in `instructions!` is custom syntax that basically tells
the macro the enumerant name it should use (MulLI here; it creates an
enum of all the instructions), the name of the instruction model
function (the `fn mulli` part), the inputs and outputs for this
instruction (based on the Power spec.), and the instruction name it
should use for the generated inline assembly (the "mulli" string).
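
As a rough illustration of how a macro can turn that syntax into an
enum, here is a toy `macro_rules!` version. It is only a sketch under
my own assumptions: it does not generate inline assembly or Python
glue like the real macro does, and the `Instr` enum, `ALL`, `name`,
`input_names` and the second `add` entry are all hypothetical names
invented for this example.

```rust
// Toy stand-in for pia's `instructions!` macro (an assumption-laden
// sketch: it only builds an enum plus a couple of lookup functions).
macro_rules! instructions {
    (
        $(
            #[enumerant = $enumerant:ident]
            fn $model:ident($($input:ident),*) -> ($($output:ident),*) {
                $asm_name:literal
            }
        )+
    ) => {
        // One enumerant per declared instruction.
        #[derive(Copy, Clone, Debug, PartialEq, Eq)]
        pub enum Instr {
            $($enumerant,)+
        }

        impl Instr {
            // Every declared instruction, in declaration order.
            pub const ALL: &'static [Instr] = &[$(Instr::$enumerant,)+];

            // The assembly mnemonic given as the string in the body.
            pub fn name(self) -> &'static str {
                match self {
                    $(Instr::$enumerant => $asm_name,)+
                }
            }

            // The declared input names, as strings.
            pub fn input_names(self) -> &'static [&'static str] {
                match self {
                    $(Instr::$enumerant => &[$(stringify!($input)),*],)+
                }
            }
        }
    };
}

instructions! {
    #[enumerant = MulLI]
    fn mulli(Ra, ImmediateS16) -> (Rt) {
        "mulli"
    }

    // Second entry added only to show the repetition (illustrative).
    #[enumerant = Add]
    fn add(Ra, Rb) -> (Rt) {
        "add"
    }
}
```

Here `Instr::MulLI.name()` returns "mulli" and `Instr::ALL` lists both
entries, so adding an instruction is one more stanza in the macro
invocation.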

After adding the new instruction, all you have to do is run
./gen-output.sh on a POWER9 system (not qemu -- qemu gives different
results for some instructions), look in the output*.json file it
produces, find the first mention of `mismatch`, and fix the bugs in
your implementation for that particular test case as shown in the
JSON file. Then you're good to go!

How hard was that?

The memory read/write instructions won't be much harder since all that
happens is there's an array of bytes passed into the functions, and
you just index it as needed -- just like C.
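
A sketch of what that could look like, assuming the memory is passed
in as a byte slice. The function names, the big-endian byte order and
the choice to return `None` for out-of-range addresses are all my
assumptions for illustration; nothing like this is in pia yet.

```rust
use std::convert::TryInto;

// Hypothetical doubleword load: reads 8 big-endian bytes at `addr`,
// or returns None if the access runs past the end of memory.
pub fn load_u64_be(memory: &[u8], addr: usize) -> Option<u64> {
    let end = addr.checked_add(8)?;
    let bytes: [u8; 8] = memory.get(addr..end)?.try_into().ok()?;
    Some(u64::from_be_bytes(bytes))
}

// Hypothetical doubleword store: writes `value` as 8 big-endian bytes
// at `addr`, or returns None without writing if it does not fit.
pub fn store_u64_be(memory: &mut [u8], addr: usize, value: u64) -> Option<()> {
    let end = addr.checked_add(8)?;
    memory
        .get_mut(addr..end)?
        .copy_from_slice(&value.to_be_bytes());
    Some(())
}
```

Unlike plain C indexing, the slice `get`/`get_mut` calls check bounds,
so a bad address shows up as `None` rather than as memory corruption.
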
>
>
> > > * ISAcaller - part of LibreSOC - "does the job", is cycle-accurate, in
> > > python is relatively slow, however has the advantage of being
> > > co-developed with the HDL.
> >
> >
> > iirc it just executes everything in one cycle and doesn't keep track of
> > cycles, so, if that counts as cycle accurate, it would be trivial to add
> > that level of cycle accuracy to a pia-based simulator.
>
> cycle-accurate means that when you do one "tick" of the clock, the
> underlying simulator also does one "tick" such that when you do debug
> printouts of the registers... or that you *can* do debug printouts of
> the registers at all - it's clock-for-clock.  one "step" of the
> simulator IS one "step" of the underlying state.
>
> JIT simulators such as qemu utterly break and completely ignore that.

That's not actually accurate: IIRC, qemu totally supports
single-stepping from its GDB interface, so that qualifies in my
book. If qemu mushes a bunch of instructions together when you're
not telling it to single-step, I don't see why that would matter,
since all that happens is it gives the exact same results (ignoring
stuff like data races), just quicker (hopefully).

I also heard something about qemu being able to produce instruction
traces -- I'd think that's exactly what you're looking for, though I
haven't checked.

At one point, about 5 years ago, I was daydreaming about building a
JIT ISA simulator that correctly models all of the pipeline timings
and other stuff needed for a complete matching-the-HDL simulator with
cycle-for-cycle level of cycle accuracy (which is what I generally
think of when I hear cycle-accurate). That simulator would keep track
of the current cycle in a counter, and, if not single-stepping, could
batch updates to the cycle counter and perform other optimizations
too.
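
A very rough sketch of just the cycle-counter part of that idea. This
is not a pipeline model: the `Op` latency field and the `Simulator`
struct are hypothetical, and a real simulator would need far more
state to match the HDL. It only shows that single-stepping and a
batched run can end with the same cycle count.

```rust
// Hypothetical decoded op: all this sketch tracks is how many cycles
// it is assumed to take.
#[derive(Clone, Copy)]
pub struct Op {
    pub latency: u64,
}

pub struct Simulator {
    pub cycle: u64, // current cycle, visible to a debugger
    pub pc: usize,  // index of the next op to execute
}

impl Simulator {
    pub fn new() -> Self {
        Simulator { cycle: 0, pc: 0 }
    }

    // Single-step mode: the cycle counter is updated after every op,
    // so it is accurate at each step. Returns false at end of program.
    pub fn step(&mut self, program: &[Op]) -> bool {
        match program.get(self.pc) {
            Some(op) => {
                self.cycle += op.latency;
                self.pc += 1;
                true
            }
            None => false,
        }
    }

    // Batch mode: accumulate cycles in a local variable and commit
    // them to the visible counter once, at the end of the run.
    pub fn run(&mut self, program: &[Op]) {
        let mut pending = 0u64;
        while let Some(op) = program.get(self.pc) {
            pending += op.latency;
            self.pc += 1;
        }
        self.cycle += pending;
    }
}
```

Stepping through ops with latencies 1, 3 and 2 gives cycle counts of
1, 4 and 6; `run` on the same program lands on 6 in one update.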

Jacob