[Libre-soc-dev] microwatt booting linux-5.7 under verilator

Luke Kenneth Casson Leighton lkcl at lkcl.net
Mon Jan 3 00:45:42 GMT 2022


i am pleased to be able to announce the successful booting of microwatt-5.7
linux buildroot... under a verilator simulation of the microwatt VHDL.
from a hardware development and research perspective this is highly
significant because unlike the FPGA boot which was previously reported,
https://shenki.github.io/boot-linux-on-microwatt/
full memory read/write snooping and full Signal tracing (gtkwave) is possible.

https://ftp.libre-soc.org/microwatt-linux-5.7-verilator-boot-buildroot.txt

the branch of microwatt HDL which is being used is here
https://git.libre-soc.org/?p=microwatt.git;a=shortlog;h=refs/heads/verilator_trace

some minor strategic changes to microwatt HDL were required, including
adding a new SYSCON parameter to specify a BRAM chain-boot address,
and also it was necessary to turn sdram_init into a stand-alone "mini-BIOS"
which performed the role of early-initialising the 16550 uart followed by
chain-loading to the BRAM chain-boot memory location, at which the linux
5.7 dtbImage.microwatt had been loaded (0x600000).

microwatt-verilator.cpp itself needed some changes to add support for
emulation in c++ of 512 mbyte of "Block" RAM.  the interface for BRAM
(aka SRAM) was far simpler than attempting to emulate DRAM, and
also meant that much of the mini-BIOS could be entirely cut.

i also had to further modify microwatt-verilator.cpp to allow it to load
from files directly into memory, at run-time.  this means it is possible
to execute hello_world.bin, zephyr.bin, micropython.bin, dtbImage.microwatt
all without recompiling the verilator binary.
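
as a rough illustration of the idea (this is a python sketch, not the c++
that is in microwatt-verilator.cpp: the function names and the bounds check
are my own assumptions), loading images into an emulated flat memory at
run-time amounts to something like this:

```python
# illustrative sketch only: not the code in microwatt-verilator.cpp
BRAM_SIZE = 512 * 1024 * 1024   # 512 mbyte of emulated "Block" RAM


def load_image(mem, data, addr):
    """copy the bytes of one image into emulated memory at addr"""
    end = addr + len(data)
    if addr < 0 or end > len(mem):
        raise ValueError("image at 0x%x size 0x%x does not fit"
                         % (addr, len(data)))
    mem[addr:end] = data
    return end


def load_files(mem, images):
    """images: list of (filename, load address) pairs, read at run-time"""
    for fname, addr in images:
        with open(fname, "rb") as f:
            data = f.read()
        print("loading %s at 0x%x size 0x%x" % (fname, addr, len(data)))
        load_image(mem, data, addr)
```

with mem = bytearray(BRAM_SIZE), a call such as
load_files(mem, [("/tmp/sdram_init.bin", 0x0), ("dtbImage.microwatt", 0x600000)])
would mirror the two "loading ..." lines at the top of the console log.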

(not that you want to try compiling a 6 MB binary into VHDL like i did:
it resulted in the creation of a 512 MB verilog file, and when verilator
hit 60 GB of resident RAM attempting to compile that to c++, i decided
that mayyybe loading at runtime was a better approach.)

i also had to fix a couple of things in the linux kernel source
https://git.kernel.org/pub/scm/linux/kernel/git/joel/microwatt.git

first attempts to boot a compressed image were quite hilarious: a
quick back-of-the-envelope calculation by examining the rate at which
LD/STs were being generated showed that the GZIP decompression
would take maybe about 1 hour of real-world time to complete.
this led me to add support for CONFIG_KERNEL_UNCOMPRESSED
and cut that time entirely, hence why you can see this in the console log:

    0x5b0e10 bytes of uncompressed data copied
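
the fallback behaviour seen in the log can be sketched roughly as below.
this is a python illustration only, not the kernel boot wrapper itself:
the function name is made up, and checking the two-byte gzip magic is my
assumption about how "valid compressed data" would be recognised.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"   # first two bytes of any gzip stream


def unpack_kernel(payload):
    """decompress a gzip payload, otherwise treat it as a raw image"""
    if payload[:2] == GZIP_MAGIC:
        return gzip.decompress(payload)
    # same message as printed by the zImage wrapper in the console log
    print("No valid compressed data found, assume uncompressed data")
    return bytes(payload)
```

the point of the uncompressed path is that it is a plain copy, which under
simulation costs far fewer LD/STs than running the decompressor.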

secondly, the microwatt Makefile assumes that the verilator clock
runs at 50 mhz, whereas the microwatt.dts file says 100 mhz for both
the UART clock and the system clock.  it would be really nice
to have microwatt-linux read the SYSCON parameter for the
clock rate, and for that to be dynamically inserted into the dtb.
however in the interim, the attached patch suffices by manually
altering the clock in microwatt.dts to match that of the SYSCON
parameter.
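
to show why the mismatch matters: a 16550 derives its baud rate as
clock / (16 * divisor), so a driver that believes the clock is 100 mhz
programs a divisor that is twice too large for a 50 mhz clock.  a small
python sketch (the function names are mine, and rounding to the nearest
integer is an assumption about how the divisor gets chosen):

```python
# standard 16550 relationship: baud = clock / (16 * divisor)
def uart_divisor(clk_hz, baud):
    """divisor a driver would program for the requested baud rate"""
    return round(clk_hz / (16 * baud))


def actual_baud(clk_hz, divisor):
    """baud rate that really comes out for a given input clock"""
    return clk_hz / (16 * divisor)


DTS_CLK = 100_000_000   # what the unpatched microwatt.dts says
SIM_CLK = 50_000_000    # what the verilator build really runs at
BAUD = 115200

wrong = uart_divisor(DTS_CLK, BAUD)   # computed from the wrong clock
right = uart_divisor(SIM_CLK, BAUD)   # computed from the real clock
print("dts clock: divisor", wrong, "gives", actual_baud(SIM_CLK, wrong))
print("sim clock: divisor", right, "gives", actual_baud(SIM_CLK, right))
```

the base_baud = 3125000 reported in the console log is simply
50000000 / 16, which is consistent with the patched 50 mhz value.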

the initial boots without sdram_init.bin did not go well.  this is
probably because the udbg0 (early ns16550.c) is not correctly
initialised (critically relying on the use of the microwatt console_init()
library). what was great - and this really is the whole point - i was
able to track down the source of the problem...
by examining the VCD trace wires of the 16550 Wishbone Bus
and internal UART registers... from the HDL! :)
if there had been such a problem on the FPGA side, that would
have been outright impossible and impractical.

for anyone thinking of following this and using it, please be under
no illusion: it took *two hours* to get to that boot prompt on a 4.8ghz
Intel i9.  1000 ns of "simulated" 50 mhz clock rate takes a stunning
15-20 seconds of real time.  you can do the math on the number
of instructions per second, there, but the huge advantage is: direct
snoop access to the memory, and the entire signal tracing of the
HDL - all of it: every single signal, for every single cycle.

the other downside: running for even 30 seconds produces an
astounding *10 gigabytes* of VCD trace log output.  normally
you would switch on command-line options in verilator to
only enable the VCD tracing at certain ranges of clock cycles,
so that you have access to the Signals that you are interested
in.  i have seen people enable that over a debug interface
(from a separate program, communicating with the verilator
executable) but that is outside the scope of this message.
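
a minimal sketch of the windowing idea, in python rather than the c++
of a real verilator harness.  everything here is hypothetical: the
"1000-2000,50000-51000" option format and the function names are made up
for illustration and are not verilator's own options.  the last function
just scales the 10 gigabytes per 30 seconds figure, assuming the rate
stays constant.

```python
# hypothetical sketch of cycle-window gating; these names are not verilator's
def parse_windows(spec):
    """parse "1000-2000,50000-51000" into (first, last) cycle pairs"""
    windows = []
    for part in spec.split(","):
        first, last = part.split("-")
        windows.append((int(first), int(last)))
    return windows


def tracing_enabled(cycle, windows):
    """true if this clock cycle falls inside any requested window"""
    return any(first <= cycle <= last for first, last in windows)


def vcd_gigabytes(run_seconds, gb=10, per_seconds=30):
    """estimated VCD size for a run, assuming a constant output rate"""
    return run_seconds * gb / per_seconds
```

in the simulation main loop the VCD dump for a cycle would then only be
made when tracing_enabled(cycle, windows) is true.  vcd_gigabytes(2 * 60 * 60)
gives an idea of what tracing the whole two-hour boot would cost.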

the next task will be to swap out the microwatt core and drop in
the libresoc core.  with the successful passing of 17/19 of the
microwatt mmu.bin unit tests last week this is expected to be
relatively straightforward, especially given that we already have
microwatt-compatible XICS, microwatt-compatible DMI, exactly
the same sized I and D wishbone buses, and a direct port of
microwatt's MMU, L1 and D1 Caches.  missing is a SYSCON
device and the Wishbone Bus Arbiter.

however once that (relatively straightforward) work is done,
we will be able to boot the *exact* same linux buildroot image
(and i can debug it under verilator, which is why i've gone
to all the trouble, above....) and once that passes i will then
try an ECP5 FPGA boot.

hurrah.

l.
-------------- next part --------------
lkcl at fizzy:~/src/libresoc/microwatt$ ./microwatt-verilator /tmp/sdram_init.bin dtbImage.microwatt
loading /tmp/sdram_init.bin at 0x0 size 0x2680
loading dtbImage.microwatt at 0x600000 size 0x5d1018


Welcome to Microwatt !

 Soc signature: f00daa5500010001
  Soc features: UART BRAM 
          BRAM: 524288 KB
     BOOT ADDR: 0x600000
           CLK: 50 MHz

Booting from BRAM at 0x600000...

zImage starting: loaded at 0x0000000000600000 (sp: 0x0000000000bd3eb0)
No valid compressed data found, assume uncompressed data
Allocating 0x5fb320 bytes for kernel...
0x5b0e10 bytes of uncompressed data copied

Linux/PowerPC load: 
Finalizing device tree... flat tree at 0xbd4c80
[    0.000000] printk: bootconsole [udbg0] enabled
 -> early_setup(), dt_ptr: 0xbd4c80
[    0.000000] dt-cpu-ftrs: setup for ISA 3000
[    0.000000] dt-cpu-ftrs: final cpu/mmu features = 0x00000087800391e1 0x3c006041
[    0.000000] radix-mmu: Page sizes from device-tree:
[    0.000000] radix-mmu: Page size shift = 12 AP=0x0
[    0.000000] radix-mmu: Page size shift = 16 AP=0x5
[    0.000000] radix-mmu: Page size shift = 21 AP=0x1
[    0.000000] radix-mmu: Page size shift = 30 AP=0x2
[    0.000000] radix-mmu: Mapped 0x0000000000000000-0x0000000000600000 with 2.00 MiB pages (exec)
[    0.000000] radix-mmu: Mapped 0x0000000000600000-0x0000000010000000 with 2.00 MiB pages
 <- early_setup()
[    0.000000] Linux version 5.7.0-00030-gabe0e1dab0a2-dirty (lkcl at fizzy) (gcc version 9.3.0 (Debian 9.3.0-13), GNU ld (GNU Binutils for Debian) 2.35.1) #7 Sun Jan 2 19:32:23 GMT 2022
[    0.000000] Using microwatt machine description
[    0.000000] Found legacy serial port 0 for /soc at c0000000/serial at 2000
[    0.000000]   mem=c0002000, taddr=c0002000, irq=0, clk=50000000, speed=115200
[    0.000000] ioremap() called early from find_legacy_serial_ports+0x164/0x4bc. Use early_ioremap() instead
[    0.000000] -----------------------------------------------------
[    0.000000] phys_mem_size     = 0x10000000
[    0.000000] dcache_bsize      = 0x40
[    0.000000] icache_bsize      = 0x40
[    0.000000] cpu_features      = 0x00000087800391e1
[    0.000000]   possible        = 0x0003fbefcb5fb1a5
[    0.000000]   always          = 0x00000003800081a1
[    0.000000] cpu_user_features = 0xc4002102 0x88800000
[    0.000000] mmu_features      = 0x3c006041
[    0.000000] firmware_features = 0x0000000000000000
[    0.000000] vmalloc start     = 0xc008000000000000
[    0.000000] IO start          = 0xc00a000000000000
[    0.000000] vmemmap start     = 0xc00c000000000000
[    0.000000] -----------------------------------------------------
[    0.000000] barrier-nospec: using ORI speculation barrier
[    0.000000] barrier-nospec: patched 159 locations
[    0.000000] Top of RAM: 0x10000000, Total RAM: 0x10000000
[    0.000000] Memory hole size: 0MB
[    0.000000] Zone ranges:
[    0.000000]   Normal   [mem 0x0000000000000000-0x000000000fffffff]
[    0.000000] Movable zone start for each node
[    0.000000] Early memory node ranges
[    0.000000]   node   0: [mem 0x0000000000000000-0x000000000fffffff]
[    0.000000] Initmem setup node 0 [mem 0x0000000000000000-0x000000000fffffff]
[    0.000000] On node 0 totalpages: 65536
[    0.000000]   Normal zone: 896 pages used for memmap
[    0.000000]   Normal zone: 0 pages reserved
[    0.000000]   Normal zone: 65536 pages, LIFO batch:15
[    0.000000] pcpu-alloc: s0 r0 d32768 u32768 alloc=1*32768
[    0.000000] pcpu-alloc: [0] 0 
[    0.000000] Built 1 zonelists, mobility grouping on.  Total pages: 64640
[    0.000000] Kernel command line: 
[    0.000000] Dentry cache hash table entries: 32768 (order: 6, 262144 bytes, linear)
[    0.000000] Inode-cache hash table entries: 16384 (order: 5, 131072 bytes, linear)
[    0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
[    0.000000] Memory: 234844K/262144K available (3320K kernel code, 304K rwdata, 876K rodata, 1324K init, 296K bss, 27300K reserved, 0K cma-reserved)
[    0.000000] SLUB: HWalign=128, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
[    0.000000] NR_IRQS: 64, nr_irqs: 64, preallocated irqs: 16
[    0.000000] ICS native initialized for sources 16..31
[    0.000000] ICS native backend registered
[    0.000000] random: get_random_u64 called from start_kernel+0x3f8/0x5ec with crng_init=0
[    0.000000] time_init: decrementer frequency = 50.000000 MHz
[    0.000000] time_init: processor frequency   = 50.000000 MHz
[    0.000220] time_init: 64 bit decrementer (max: 7fffffffffffffff)
[    0.006470] clocksource: timebase: mask: 0xffffffffffffffff max_cycles: 0xb8812736b, max_idle_ns: 440795202655 ns
[    0.016821] clocksource: timebase mult[14000000] shift[24] registered
[    0.023454] clockevent: decrementer mult[cccccd] shift[28] cpu[0]
[    0.030138] pid_max: default: 4096 minimum: 301
[    0.037478] Mount-cache hash table entries: 512 (order: 0, 4096 bytes, linear)
[    0.044867] Mountpoint-cache hash table entries: 512 (order: 0, 4096 bytes, linear)
[    0.076589] devtmpfs: initialized
[    0.106206] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 19112604462750000 ns
[    0.116352] futex hash table entries: 16 (order: -4, 384 bytes, linear)
[    0.127005] NET: Registered protocol family 16
[    0.229049] clocksource: Switched to clocksource timebase
[    0.254082] NET: Registered protocol family 2
[    0.273221] tcp_listen_portaddr_hash hash table entries: 256 (order: 0, 4096 bytes, linear)
[    0.282903] TCP established hash table entries: 2048 (order: 2, 16384 bytes, linear)
[    0.291612] TCP bind hash table entries: 2048 (order: 2, 16384 bytes, linear)
[    0.299757] TCP: Hash tables configured (established 2048 bind 2048)
[    0.307120] UDP hash table entries: 128 (order: 0, 4096 bytes, linear)
[    0.314109] UDP-Lite hash table entries: 128 (order: 0, 4096 bytes, linear)
[    0.323472] NET: Registered protocol family 1
[    1.482343] workingset: timestamp_bits=62 max_order=16 bucket_order=0
[    1.703718] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 254)
[    1.711361] io scheduler mq-deadline registered
[    2.519652] Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled
[    2.546014] printk: console [ttyS0] disabled
START_BIT error 434 306
                       [    2.571655] serial8250.0: ttyS0 at MMIO 0xc0002000 (irq = 16, base_baud = 3125000) is a 16550A
[    0.000000] printk: bootconsole [udbg0] enabled
[    0.000000] dt-cpu-ftrs: setup for ISA 3000
[    0.000000] dt-cpu-ftrs: final cpu/mmu features = 0x00000087800391e1 0x3c006041
[    0.000000] radix-mmu: Page sizes from device-tree:
[    0.000000] radix-mmu: Page size shift = 12 AP=0x0
[    0.000000] radix-mmu: Page size shift = 16 AP=0x5
[    0.000000] radix-mmu: Page size shift = 21 AP=0x1
[    0.000000] radix-mmu: Page size shift = 30 AP=0x2
[    0.000000] radix-mmu: Mapped 0x0000000000000000-0x0000000000600000 with 2.00 MiB pages (exec)
[    0.000000] radix-mmu: Mapped 0x0000000000600000-0x0000000010000000 with 2.00 MiB pages
[    0.000000] Linux version 5.7.0-00030-gabe0e1dab0a2-dirty (lkcl at fizzy) (gcc version 9.3.0 (Debian 9.3.0-13), GNU ld (GNU Binutils for Debian) 2.35.1) #7 Sun Jan 2 19:32:23 GMT 2022
[    0.000000] Using microwatt machine description
[    0.000000] Found legacy serial port 0 for /soc at c0000000/serial at 2000
[    0.000000]   mem=c0002000, taddr=c0002000, irq=0, clk=50000000, speed=115200
[    0.000000] ioremap() called early from find_legacy_serial_ports+0x164/0x4bc. Use early_ioremap() instead
[    0.000000] -----------------------------------------------------
[    0.000000] phys_mem_size     = 0x10000000
[    0.000000] dcache_bsize      = 0x40
[    0.000000] icache_bsize      = 0x40
[    0.000000] cpu_features      = 0x00000087800391e1
[    0.000000]   possible        = 0x0003fbefcb5fb1a5
[    0.000000]   always          = 0x00000003800081a1
[    0.000000] cpu_user_features = 0xc4002102 0x88800000
[    0.000000] mmu_features      = 0x3c006041
[    0.000000] firmware_features = 0x0000000000000000
[    0.000000] vmalloc start     = 0xc008000000000000
[    0.000000] IO start          = 0xc00a000000000000
[    0.000000] vmemmap start     = 0xc00c000000000000
[    0.000000] -----------------------------------------------------
[    0.000000] barrier-nospec: using ORI speculation barrier
[    0.000000] barrier-nospec: patched 159 locations
[    0.000000] Top of RAM: 0x10000000, Total RAM: 0x10000000
[    0.000000] Memory hole size: 0MB
[    0.000000] Zone ranges:
[    0.000000]   Normal   [mem 0x0000000000000000-0x000000000fffffff]
[    0.000000] Movable zone start for each node
[    0.000000] Early memory node ranges
[    0.000000]   node   0: [mem 0x0000000000000000-0x000000000fffffff]
[    0.000000] Initmem setup node 0 [mem 0x0000000000000000-0x000000000fffffff]
[    0.000000] On node 0 totalpages: 65536
[    0.000000]   Normal zone: 896 pages used for memmap
[    0.000000]   Normal zone: 0 pages reserved
[    0.000000]   Normal zone: 65536 pages, LIFO batch:15
[    0.000000] pcpu-alloc: s0 r0 d32768 u32768 alloc=1*32768
[    0.000000] pcpu-alloc: [0] 0 
[    0.000000] Built 1 zonelists, mobility grouping on.  Total pages: 64640
[    0.000000] Kernel command line: 
[    0.000000] Dentry cache hash table entries: 32768 (order: 6, 262144 bytes, linear)
[    0.000000] Inode-cache hash table entries: 16384 (order: 5, 131072 bytes, linear)
[    0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
[    0.000000] Memory: 234844K/262144K available (3320K kernel code, 304K rwdata, 876K rodata, 1324K init, 296K bss, 27300K reserved, 0K cma-reserved)
[    0.000000] SLUB: HWalign=128, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
[    0.000000] NR_IRQS: 64, nr_irqs: 64, preallocated irqs: 16
[    0.000000] ICS native initialized for sources 16..31
[    0.000000] ICS native backend registered
[    0.000000] random: get_random_u64 called from start_kernel+0x3f8/0x5ec with crng_init=0
[    0.000000] time_init: decrementer frequency = 50.000000 MHz
[    0.000000] time_init: processor frequency   = 50.000000 MHz
[    0.000220] time_init: 64 bit decrementer (max: 7fffffffffffffff)
[    0.006470] clocksource: timebase: mask: 0xffffffffffffffff max_cycles: 0xb8812736b, max_idle_ns: 440795202655 ns
[    0.016821] clocksource: timebase mult[14000000] shift[24] registered
[    0.023454] clockevent: decrementer mult[cccccd] shift[28] cpu[0]
[    0.030138] pid_max: default: 4096 minimum: 301
[    0.037478] Mount-cache hash table entries: 512 (order: 0, 4096 bytes, linear)
[    0.044867] Mountpoint-cache hash table entries: 512 (order: 0, 4096 bytes, linear)
[    0.076589] devtmpfs: initialized
[    0.106206] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 19112604462750000 ns
[    0.116352] futex hash table entries: 16 (order: -4, 384 bytes, linear)
[    0.127005] NET: Registered protocol family 16
[    0.229049] clocksource: Switched to clocksource timebase
[    0.254082] NET: Registered protocol family 2
[    0.273221] tcp_listen_portaddr_hash hash table entries: 256 (order: 0, 4096 bytes, linear)
[    0.282903] TCP established hash table entries: 2048 (order: 2, 16384 bytes, linear)
[    0.291612] TCP bind hash table entries: 2048 (order: 2, 16384 bytes, linear)
[    0.299757] TCP: Hash tables configured (established 2048 bind 2048)
[    0.307120] UDP hash table entries: 128 (order: 0, 4096 bytes, linear)
[    0.314109] UDP-Lite hash table entries: 128 (order: 0, 4096 bytes, linear)
[    0.323472] NET: Registered protocol family 1
[    1.482343] workingset: timestamp_bits=62 max_order=16 bucket_order=0
[    1.703718] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 254)
[    1.711361] io scheduler mq-deadline registered
[    2.519652] Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled
[    2.546014] printk: console [ttyS0] disabled
[    2.571655] serial8250.0: ttyS0 at MMIO 0xc0002000 (irq = 16, base_baud = 3125000) is a 16550A
[    3.088015] printk: console [ttyS0] enabled
[    3.088015] printk: console [ttyS0] enabled
[    3.110477] printk: console [ttyS0] disabled
[    3.110477] printk: console [ttyS0] disabled
[    3.120049] c0002000.serial: ttyS0 at MMIO 0xc0002000 (irq = 16, base_baud = 3125000) is a 16550
[    3.129326] printk: console [ttyS0] enabled
[    3.129326] printk: console [ttyS0] enabled
[    3.137748] printk: bootconsole [udbg0] disabled
[    3.137748] printk: bootconsole [udbg0] disabled
[    3.313685] brd: module loaded
[    3.421362] loop: module loaded
[    3.443329] libphy: Fixed MDIO Bus: probed
[    3.460423] c8021000.ethernet eth0: irq 17, mapped at c00a000080009000
[    3.491923] NET: Registered protocol family 10
[    3.524644] Segment Routing with IPv6
[    3.531020] sit: IPv6, IPv4 and MPLS over IPv4 tunneling driver
[    3.552795] NET: Registered protocol family 17
[    3.557627] drmem: No dynamic reconfiguration memory found
[    3.583843] Freeing unused kernel memory: 1324K
[    3.588523] This architecture does not have kernel memory protection.
[    3.595279] Run /init as init process
[    3.599240]   with arguments:
[    3.602337]     /init
[    3.604725]   with environment:
[    3.607986]     HOME=/
[    3.610652]     TERM=linux
Starting syslogd: OK
Starting klogd: OK
Running sysctl: OK
Saving random seed: [    5.377122] random: dd: uninitialized urandom read (512 bytes read)
OK
Starting network: OK

Welcome to Buildroot
buildroot login: root
# ls /
bin      init     linuxrc  opt      run      tmp
dev      lib      media    proc     sbin     usr
etc      lib64    mnt      root     sys      var
# ls /bin
arch           dnsdomainname  ln             ping           stty
ash            dumpkmap       login          pipe_progress  su
base64         echo           ls             printenv       sync
busybox        egrep          lsattr         ps             tar
cat            false          mkdir          pwd            touch
chattr         fdflush        mknod          resume         true
chgrp          fgrep          mktemp         rm             umount
chmod          getopt         more           rmdir          uname
chown          grep           mount          run-parts      usleep
coremark       gunzip         mountpoint     sed            vi
cp             gzip           mt             setarch        watch
cpio           hostname       mv             setpriv        zcat
date           kill           netstat        setserial
dd             link           nice           sh
df             linux32        nuke           simple_random
dmesg          linux64        pidof          sleep
# whoam[    7.501594] random: fast init done
i
root
# 


