[Libre-soc-sim] NLnet grant request, draft

Mon Aug 9 18:16:33 BST 2021

> Peter ah that's a good point: is there any "estimates" of power consumption possible through cavatools? L1 cache access costs X uW, L2 costs Y uW, FP mul costs Z uW and so on?

Counting resource usage is a reasonable way to estimate power.  Caveat (which isn’t connected to uspike yet) scoreboards instruction issue based on a resource bitmap and register availability times.  I’m building a Python tool to convert instruction-group resource requirements into a space-time bitmap.  The same tool can generate counters of % utilization for power estimation (and for decisions about whether that hardware is worth having…)
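
As a rough illustration of scoreboarding against a space-time bitmap, here is a minimal Python sketch. It is not Caveat's code: the resource names, the `GROUPS` table, the latencies and the `Scoreboard` class are all hypothetical, chosen only to show how a per-cycle resource bitmap and register-ready times can together decide the issue cycle.

```python
# Hypothetical resource bit assignments (not taken from cavatools).
ALU, MUL, LSU = 1 << 0, 1 << 1, 1 << 2

# Space-time bitmap per instruction group: element i is the set of resources
# occupied i cycles after issue. "latency" is when the result becomes ready.
GROUPS = {
    "add": {"bitmap": [ALU], "latency": 1},
    "mul": {"bitmap": [MUL, MUL, MUL], "latency": 3},
    "load": {"bitmap": [LSU, LSU], "latency": 2},
}


class Scoreboard:
    def __init__(self):
        self.busy = {}       # cycle -> bitmask of occupied resources
        self.reg_ready = {}  # register name -> cycle its value is available

    def issue(self, op, dst, srcs, earliest=0):
        """Return the first cycle at which `op` can issue, and reserve it."""
        group = GROUPS[op]
        # Wait for all source registers to be available.
        cycle = max([earliest] + [self.reg_ready.get(r, 0) for r in srcs])
        # Slide forward until no cycle of the bitmap collides with a reservation.
        while any(self.busy.get(cycle + i, 0) & need
                  for i, need in enumerate(group["bitmap"])):
            cycle += 1
        for i, need in enumerate(group["bitmap"]):
            self.busy[cycle + i] = self.busy.get(cycle + i, 0) | need
        self.reg_ready[dst] = cycle + group["latency"]
        return cycle


sb = Scoreboard()
first = sb.issue("mul", "x3", ["x1", "x2"])   # multiplier is free
second = sb.issue("mul", "x4", ["x1", "x2"])  # must wait for the multiplier
third = sb.issue("add", "x5", ["x3"])         # must wait for x3
```

Counting the set bits in `busy` per resource over a run would give the utilization figures mentioned above.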

I’ve used resource counting for many years to estimate chip power in my designs.  I found that while not very accurate for short runs, it is quite good over trillions of instructions, compared with extrapolating from detailed power simulation of small benchmarks.
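
A minimal sketch of the arithmetic behind resource counting, in Python. The event names and the per-event energy figures in `ENERGY_PJ` are placeholders invented for illustration, not measurements and not anything cavatools provides; only the method (sum of count times energy per event, divided by run time) is the point.

```python
from collections import Counter

# Hypothetical energy per event, in picojoules. Placeholder values only.
ENERGY_PJ = {"l1_access": 10.0, "l2_access": 50.0, "fp_mul": 20.0, "int_alu": 1.0}


def estimate_power_uw(counts, cycles, clock_hz):
    """Average power in microwatts over a run, from per-resource event counts."""
    total_pj = sum(ENERGY_PJ[name] * n for name, n in counts.items())
    seconds = cycles / clock_hz
    return total_pj * 1e-12 / seconds * 1e6


def utilization(counts, cycles):
    """Fraction of cycles in which each resource was used."""
    return {name: n / cycles for name, n in counts.items()}


counts = Counter({"l1_access": 300, "l2_access": 20, "fp_mul": 50, "int_alu": 600})
power = estimate_power_uw(counts, cycles=1000, clock_hz=1e9)
util = utilization(counts, cycles=1000)
```

The longer the run, the more the per-event errors average out, which fits the observation that the estimate improves over very long instruction counts.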

-Peter

> On Aug 9, 2021, at 7:02 PM, lkcl <luke.leighton at gmail.com> wrote:
> 
> 
>> On August 9, 2021 4:28:12 PM UTC, Peter Hsu <peter.hsu at bsc.es> wrote:
>> Hello Luke, All,
>> 
>> I have rewritten uspike.  It is on the 'uspike' branch of 
>> github.com/phaa-eu/cavatools if you are interested.
> 
> fantastic, yes, definitely.
> 
>> It now understands all instructions Spike does: standard GC, vector RVV,
>> and various in-progress extensions such as "B" Bit Manipulation.  The 
>> encodings and execution semantics are script-derived from riscv-opcodes
> 
> great to hear.  the talk by Vrull.eu on SPEC mark tests of xbitmanip had me concerned, they only showed 0.5 to 1% performance increase (they added xbitmanip to gcc btw). cavatools will definitely help narrow down what is going on.
> 
> we have a budget from NLnet to add bitmanip and cryptographic primitives to Power ISA, so if adding the same to Power ISA has similarly no performance increase, power consumption reduction or reduction in executable size i *really* want to know why.
> 
> Peter ah that's a good point: is there any "estimates" of power consumption possible through cavatools? L1 cache access costs X uW, L2 costs Y uW, FP mul costs Z uW and so on?
> 
> 
>> and riscv-isa-sim which you need to download from github.  I use the 
>> execution state directly from Spike's processor_t and state_t
>> declarations.
>> 
>> I have also connected uspike to riscv-unknown-linux-gnu-gdb through the
>> gdb command 'target remote hostname:portnumber' and running 'uspike 
>> --gdb=hostname:portnumber <riscv elf binary>'.  However it is missing 
>> many things like setting breakpoints, etc. so is not operational.
> 
> it's a fantastic beginning, will save hugely on time for us on the Power ISA side.  to integrate with the unit tests we will need breakpoints, set and get of all regs, load and store to/from memory.
> 
>> The new uspike structure uses a Json file of instructions like this:
>> 
>> {
>>     "beq": {
>>         "type": "sb",
>>         "bits": "{-12|10:5} rs2 rs1 000 {4:1|11} 1100011",
>>         "ext": "",
>>         "flags": "pc",
>>         "exec": "if(RS1 == RS2)\n  set_pc(BRANCH_TARGET);\n"
>>     },
> 
> ok, in theory (python json module) that makes for easier parsing.
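
To make the parsing remark concrete, here is a small Python sketch that loads the "beq" entry quoted above with the standard json module and derives a decode mask and match value from its "bits" string. The reading of the "bits" syntax is an assumption inferred from this one example (runs of 0/1 are fixed bits, `{...}` groups are immediate bit lists, any other token is a 5-bit register field); the real uspike scripts may treat it differently.

```python
import json

# The "beq" entry from the message, closed off so it is valid JSON on its own.
SNIPPET = r'''
{
    "beq": {
        "type": "sb",
        "bits": "{-12|10:5} rs2 rs1 000 {4:1|11} 1100011",
        "ext": "",
        "flags": "pc",
        "exec": "if(RS1 == RS2)\n  set_pc(BRANCH_TARGET);\n"
    }
}
'''

REG_WIDTH = 5  # assumption: named register fields (rs1, rs2, ...) are 5 bits


def is_fixed(token):
    return set(token) <= {"0", "1"}


def field_width(token):
    if is_fixed(token):
        return len(token)
    if token.startswith("{"):
        width = 0
        for part in token.strip("{}").split("|"):
            part = part.lstrip("-")  # assumed to be a sign-extension marker
            if ":" in part:
                hi, lo = (int(x) for x in part.split(":"))
                width += hi - lo + 1
            else:
                width += 1
        return width
    return REG_WIDTH


def mask_and_match(bits):
    """Fold the fields, most significant first, into (mask, match)."""
    mask = match = total = 0
    for token in bits.split():
        width = field_width(token)
        fixed = is_fixed(token)
        mask = (mask << width) | ((1 << width) - 1 if fixed else 0)
        match = (match << width) | (int(token, 2) if fixed else 0)
        total += width
    return mask, match, total


insns = json.loads(SNIPPET)
mask, match, total = mask_and_match(insns["beq"]["bits"])
```

An instruction word `w` would then decode as beq when `w & mask == match`.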
> 
> i have made a preliminary investigation of nmigen, to see if it is practical to use the nmigen HDL AST to output, in c, a Power ISA decoder.
> 
> the answer is yes, from a very unexpected direction: nmigen's python-based simulator, which, it turns out, through HDL AST tree-walking, actually constructs actual python expressions as text strings then exec()s them to create an in-memory function which is equivalent to the gate-level HDL.
> 
> the subset of python syntax used is so ridiculously small, with c and python being quite similar, that a version which outputs a tiny c subset should be a matter of a few days work.  (a more generic version quite a few more, but we won't need a generic version)
> 
> the result would be that nmigen HDL could be converted to c code, exactly like how verilator works.  if the HDL is simple enough (combinatorial, which the Power ISA decoder is) then it should be directly useable. readable, not so much.  useable, yes.
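
The approach described above (walk an expression tree, emit source text, then either exec() it in Python or write it out as C) can be sketched with a toy tree. This does not use nmigen's real AST classes or simulator internals; the tuple format and the function names `emit`, `compile_python` and `emit_c` are hypothetical and only illustrate why a shared operator subset makes the C back end a small step.

```python
# Toy expression tree, NOT nmigen's AST:
#   ("sig", name) | ("const", value) | (op, lhs, rhs) with op in & | ^ >> <<
def emit(node):
    """Emit an expression string valid in both Python and C."""
    kind = node[0]
    if kind == "sig":
        return node[1]
    if kind == "const":
        return hex(node[1])
    op, lhs, rhs = node
    return f"({emit(lhs)} {op} {emit(rhs)})"


def compile_python(name, args, tree):
    """Build Python source text and exec() it into an in-memory function."""
    src = f"def {name}({', '.join(args)}):\n    return {emit(tree)}\n"
    namespace = {}
    exec(src, namespace)
    return namespace[name]


def emit_c(name, args, tree):
    """Wrap the same expression text in a C function definition."""
    params = ", ".join(f"uint32_t {a}" for a in args)
    return f"uint32_t {name}({params}) {{ return {emit(tree)}; }}"


# Primary opcode of a 32-bit Power ISA instruction: the top 6 bits.
tree = ("&", (">>", ("sig", "insn"), ("const", 26)), ("const", 0x3F))
primary_opcode = compile_python("primary_opcode", ["insn"], tree)
c_source = emit_c("primary_opcode", ["insn"], tree)
```

This only covers purely combinatorial expressions, which matches the case discussed above; anything with state would need more than an expression emitter.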
> 
>>     "bne": {
>>         "type": "sb",
>> 
>> The only code that needs to include Spike headers is the interpreter 
>> loop and gdblink; the rest of the system carries the CPU state as a 
>> void*.  I'm hoping this will make it easier to port a different ISA
>> from an existing simulator.
> 
> fantastic.
> 
>> There are still bugs in uspike--on a simple test (John Hennessy's 
>> Stanford Benchmark from a million years ago) it crashes after doing a 
>> bunch of system calls running 47 million instructions. 
> 
> "only" 47 million :)
> 
> btw i thought about the idea you asked about (dynamic library loading).  my original answer was, why not do full linux kernel support and it's not your problem: that was before i was aware you are doing a type of Virtual Machine.
> 
> i suspect that if sufficient system calls are implemented that dynamic library loading might actually occur without anything specific needing to be done.  this being supposition based on never actually having tried anything like that before.
> 
> l.