[Libre-soc-dev] progress on Arty A7-100t using symbiflow to compile microwatt and libre-soc

Luke Kenneth Casson Leighton lkcl at lkcl.net
Tue Feb 8 16:27:20 GMT 2022


i wanted to update everyone on progress with using symbiflow for the
Digilent Arty A7-100t.  this will help the IBM/OpenPOWER Educational
Course, where the University of Oregon is currently experimenting with
Crowdsupply-backed LiteFury A7-100t M.2 form-factor FPGA boards.
i consider getting a Libre-Licensed toolchain into proper shape a
priority, for many reasons.

https://github.com/RHSResearchLLC/NiteFury-and-LiteFury
https://www.crowdsupply.com/rhs-research/nitefury

the version of symbiflow as shipped has two major show-stopper flaws:

1) vtr segfaults on large designs. (it works perfectly on small experiments)

2) CARRY4 chains longer than around 23-25 cells will simply not route.
    as each CARRY4 covers 4 bits, in practice this eliminates any design
    that has an add, subtract, or compare operation wider than around
    96-100 bits.

    [for example a 64-bit DIV unit will fail to route because it requires
     a 128-bit quotient/remainder]
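
as a rough sanity check of those numbers, here is a minimal python
sketch.  it assumes one CARRY4 per 4 bits of operand width in a single
unbroken chain, and takes 24 as the routable limit (simply the middle of
the observed 23-25, not a documented figure).  the function and constant
names are made up for illustration.

```python
import math

BITS_PER_CARRY4 = 4      # a xilinx 7-series CARRY4 covers 4 bits
ASSUMED_MAX_CHAIN = 24   # assumption: middle of the observed 23-25 limit


def carry4_chain_length(width_bits):
    """Number of chained CARRY4 cells for an adder of this width."""
    return math.ceil(width_bits / BITS_PER_CARRY4)


def likely_routes(width_bits, max_chain=ASSUMED_MAX_CHAIN):
    """True if the chain is no longer than the assumed routable limit."""
    return carry4_chain_length(width_bits) <= max_chain


for width in (64, 96, 128):
    print(width, carry4_chain_length(width), likely_routes(width))
```

under those assumptions a 128-bit operation needs a chain of 32 CARRY4s,
well past the point where routing was seen to fail.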

i have workarounds for both of these issues (thanks to acomodi on Libera
IRC #symbiflow for prompting some investigation).  they are documented,
in rough note form, here:

https://libre-soc.org/irclog-microwatt/%23microwatt.2022-02-08.log.html

* the workaround for (1): simply update to latest master of
  vtr-verilog-to-routing: "commit d15ed677472" is confirmed functional

* the workaround for (2): in synth.tcl add the option "-nocarry" to all 4
   occurrences of "synth_xilinx".  only the first two are likely crucial but
   better safe than sorry. the result is that many more LUT4/5/6s are used
   but at least they route.

  you can hand-edit this file after installation:
  /usr/local/symbiflow/share/symbiflow/scripts/xc7/synth.tcl
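
if hand-editing is not appealing, the edit for (2) could be scripted
along these lines.  this is only a sketch: the helper names are
hypothetical, it assumes each "synth_xilinx" call and its options sit on
a single line (no backslash continuations), and it skips comment lines.
check the result by eye before using it.

```python
import re

SYNTH_CMD = re.compile(r"\bsynth_xilinx\b")


def add_nocarry(tcl_text):
    """Return (new_text, changed) with -nocarry added to synth_xilinx calls.

    Lines that are Tcl comments or already mention -nocarry are left alone,
    so running this twice does not add the option twice.
    """
    out = []
    changed = 0
    for line in tcl_text.splitlines(keepends=True):
        is_comment = line.lstrip().startswith("#")
        if not is_comment and SYNTH_CMD.search(line) and "-nocarry" not in line:
            line = SYNTH_CMD.sub("synth_xilinx -nocarry", line, count=1)
            changed += 1
        out.append(line)
    return "".join(out), changed


def patch_file(path):
    """Rewrite the given synth.tcl in place; returns number of lines changed."""
    with open(path) as f:
        new_text, changed = add_nocarry(f.read())
    with open(path, "w") as f:
        f.write(new_text)
    return changed
```

calling patch_file() on the synth.tcl path given above should report 4
changed lines if the file matches the description here; any other count
means the file differs and is worth inspecting manually.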


we currently have automated build scripts that do *not* use conda,
thanks to Veera for writing them.
https://git.libre-soc.org/?p=dev-env-setup.git;a=blob;f=symbiflow-install;hb=HEAD

as you can see, that script has an option to select a parallel or a
non-parallel build: parallel builds of any kind on my laptop with
64 GB of RAM eat so many resources it's dangerous, hence the non-parallel
option.  on a server a parallel build would be fine, hence _that_ option.

as explained in the IRC chat log linked above, i have found that it's
perfectly fine to run the Libre-SOC schroot build script and then
copy the resultant binaries and database files *out* of the
chroot and into a main (non-chroot) system.

also as explained in the IRC chat, it appears that symbiflow uses
a vanilla upstream commit of yosys (commit f44110c62) so there are
no patches preventing a later version (or a globally-installed version
that already has the ghdl plugin) from being installed / used.  it is a minor
pain to have to build / copy / install so many yosys plugins but it
can be done somewhat in a trance on autopilot.

actual results
----------------

after getting through the compilation and successfully creating
arty.bit files, the results are... mixed.

* microwatt successfully shows output up to the CRC and then hangs.
  this is at 50 MHz.  i am currently running a rebuild at 25 MHz
  to see if that helps.

* libre-soc does not display anything at all [note: both microwatt
  and libre-soc are confirmed functional on the VERSA_ECP5
  FPGA using the exact same verilog source for both]

inspecting the timing reports shows a massive setup skew
of "-55" against a "required" timing of 0.08

given that the VERSA_ECP5 works perfectly with exactly the same
source, we might reasonably conclude that some considerable
investigation and improvement is needed in symbiflow. as i said earlier,
smaller designs using a fraction of the resources of the A7-100t
(Blinky) are perfectly fine.  however both the Libre-SOC and Microwatt
designs are pushing 60-75% utilisation, and that's where it looks
like things start to fall over.

the other path worth investigating is nextpnr-xilinx.  however, as
set up by the developer, it requires installation of PrjXray, which in
turn requires the proprietary Xilinx tools and requires reverse-engineering
to be performed (automatically).  one possibility there is to use
the symbiflow prjxray-db pre-discovered resources, but nextpnr-xilinx
is not set up to use that out-of-the-box.

another warning about vtr: compilation resources needed are
massive.  vpr is currently using 35 gigabytes of resident RAM,
and xcfasm yesterday required 40 GB. OOM killer kicks in regularly
even on a laptop with 64 GB of RAM.  if you have 64 GB RAM
i recommend at least 1.5x that in swap.
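
a quick way to check a linux machine against that 1.5x rule of thumb,
as a minimal sketch: it assumes the usual /proc/meminfo layout with
"MemTotal" and "SwapTotal" fields in kB, and the function names are
hypothetical.

```python
def parse_meminfo(text):
    """Return {field: kilobytes} parsed from /proc/meminfo-style text."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            values[name.strip()] = int(fields[0])
    return values


def swap_shortfall_kb(meminfo_text, factor=1.5):
    """kB of extra swap needed to reach factor x RAM (0 if already enough)."""
    info = parse_meminfo(meminfo_text)
    wanted = int(info["MemTotal"] * factor)
    return max(0, wanted - info["SwapTotal"])


# typical use on linux:
#   with open("/proc/meminfo") as f:
#       print(swap_shortfall_kb(f.read()))
```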

nextpnr-ecp5 on the other hand is quite reasonable: i have only
ever once seen yosys try to eat 20 GB of RAM and that was down
to a known bug, since fixed.

bottom line here is that the 85k LUT4 ECP5s are a much more
stable bet, but annoyingly they are hard to get hold of at the moment
(VERSA_ECP5 is only 45k LUT4s).

l.

---
crowd-funded eco-conscious hardware: https://www.crowdsupply.com/eoma68


