[Libre-soc-dev] progress on Arty A7-100t using symbiflow to compile microwatt and libre-soc
Luke Kenneth Casson Leighton
lkcl at lkcl.net
Tue Feb 8 16:27:20 GMT 2022
i wanted to update everyone on the progress of using symbiflow for the
Digilent Arty A7-100t. this will help the IBM/OpenPOWER Educational
Course where the University of Oregon is currently experimenting with
Crowdsupply-backed LiteFury A7-100t M.2 form-factor FPGA boards,
and i would consider getting a Libre-Licensed toolchain into proper
shape a priority for many reasons.
https://github.com/RHSResearchLLC/NiteFury-and-LiteFury
https://www.crowdsupply.com/rhs-research/nitefury
the version of symbiflow as shipped has two major show-stopper flaws:
1) vtr segfaults on large designs. (it works perfectly on small experiments)
2) CARRY4 chains longer than around 23-25 primitives will simply not route.
in practice this eliminates any design that has an add, subtract, or compare
operation wider than around 96-100 bits.
[for example a 64-bit DIV unit will fail to route because it requires a
128-bit quotient/remainder]
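as a rough illustration of the arithmetic: each 7-series CARRY4 primitive covers 4 bits of carry chain, so an N-bit adder needs roughly N/4 of them. the sketch below is only an estimate. the function names are made up for this example, the default limit of 24 is just the middle of the approximate 23-25 range observed above, and real synthesis may add or drop a primitive depending on how carry-in/carry-out is handled.

```python
import math

BITS_PER_CARRY4 = 4  # one CARRY4 primitive handles 4 bits of carry chain


def carry4_chain_length(width_bits):
    """Estimate how many chained CARRY4 primitives an N-bit add/sub/compare needs."""
    return math.ceil(width_bits / BITS_PER_CARRY4)


def likely_routes(width_bits, max_chain=24):
    """Guess whether the chain is short enough to route.

    max_chain is an assumption: the observed limit was only "around 23-25".
    """
    return carry4_chain_length(width_bits) <= max_chain


if __name__ == "__main__":
    for width in (64, 128):
        print(width, carry4_chain_length(width), likely_routes(width))
```

by this estimate a 64-bit operation needs about 16 CARRY4s, well inside the limit, whereas the 128-bit path in a 64-bit DIV unit needs about 32, which is past it.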
i have workarounds for both of these issues (thanks to acomodi on
Libera.Chat IRC #symbiflow for prompting some investigation) and they are
documented, in rough note form, here:
https://libre-soc.org/irclog-microwatt/%23microwatt.2022-02-08.log.html
* the workaround for (1): simply update to latest master of
vtr-verilog-to-routing: "commit d15ed677472" is confirmed functional
* the workaround for (2): in synth.tcl add the option "-nocarry" to all 4
occurrences of "synth_xilinx". only the first two are likely crucial but
better safe than sorry. the result is that many more LUT4/5/6s are used
but at least they route.
you can hand-edit this file after installation:
/usr/local/symbiflow/share/symbiflow/scripts/xc7/synth.tcl
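if hand-editing is tedious, the same change can be scripted. this is a minimal sketch, not part of symbiflow or of the install scripts: the helper names are hypothetical, and it assumes every occurrence of the word "synth_xilinx" in the file is a real command invocation (it does not try to skip comments, and it only detects an existing "-nocarry" if it directly follows the command name).

```python
import re

# matches the command word unless "-nocarry" already follows it directly
_SYNTH_RE = re.compile(r"\bsynth_xilinx\b(?!\s+-nocarry\b)")


def add_nocarry(tcl_text):
    """Return (patched_text, number_of_commands_changed)."""
    return _SYNTH_RE.subn("synth_xilinx -nocarry", tcl_text)


def patch_file(path):
    """Patch a synth.tcl in place, keeping a .orig backup of the original."""
    with open(path) as f:
        original = f.read()
    patched, count = add_nocarry(original)
    if count:
        with open(path + ".orig", "w") as f:
            f.write(original)
        with open(path, "w") as f:
            f.write(patched)
    return count
```

calling patch_file() on the path above (with write permission) should report 4 changes if the file matches the description here. running it a second time changes nothing.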
we currently have automated build scripts that do *not* use conda,
thanks to Veera for writing them.
https://git.libre-soc.org/?p=dev-env-setup.git;a=blob;f=symbiflow-install;hb=HEAD
you can see there is the option within that script to build a parallel and
non-parallel build variant: parallel builds of any kind on my laptop with
64 GB of RAM eat so many resources it's dangerous, hence the option.
on a server it would be fine, hence _that_ option.
as explained in the IRC chat log linked above, i have found that it's
perfectly fine to use the Libre-SOC schroot build script and then copy
the resultant binaries and database files *out* of the chroot and into
a main (non-chroot) system.
also as explained in the IRC chat, it appears that symbiflow uses
a vanilla upstream commit of yosys (commit f44110c62) so there are
no patches preventing a later version (or a globally-installed version
that already has the ghdl plugin) from being installed / used. it is a minor
pain to have to build / copy / install so many yosys plugins but it
can be done somewhat in a trance on autopilot.
actual results
----------------
after getting through the compilation and successfully creating
arty.bit files, the results are... mixed.
* microwatt successfully shows up to the CRC and then hangs.
this is at 50 MHz. i am currently running a rebuild at 25 MHz
to see if that helps.
* libre-soc does not display anything at all [note: both microwatt
and libre-soc are confirmed functional on the VERSA_ECP5
FPGA using the exact same verilog source for both]
inspecting the timing reports shows a massive setup skew
of "-55" against a "required" timing of 0.08
given that the VERSA_ECP5 works perfectly with exactly the same
source, we might reasonably conclude that some considerable
investigation and improvement is needed in symbiflow. as i said earlier:
smaller designs using a fraction of the resources of the A7-100t are
perfectly fine (Blinky) however both the Libre-SOC and Microwatt
designs are pushing 60-75% utilisation and that's where it looks
like things start to fall over.
the other path worth investigating is nextpnr-xilinx however as
set up by the developer it requires installation of PrjXray which in
turn requires the proprietary Xilinx tools and requires reverse-engineering
to be performed (automatically). one possibility there is to use
the symbiflow prjxray-db pre-discovered resources but nextpnr-xilinx
is not set up to use that, out-of-the-box.
another warning about vtr: compilation resources needed are
massive. vpr is currently using 35 gigabytes of resident RAM,
and xcfasm yesterday required 40 GB. OOM killer kicks in regularly
even on a laptop with 64 GB of RAM. if you have 64 GB RAM
i recommend at least 1.5x that in swap.
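for a 64 GB machine that rule of thumb works out to about 96 GB of swap. a quick way to check a Linux box against it is sketched below. the helper names are made up for this example, and it assumes the usual /proc/meminfo layout of "Field: value kB" lines.

```python
def parse_meminfo(text):
    """Return {field: value} from /proc/meminfo-style text (values in kB)."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[key.strip()] = int(parts[0])
    return info


def swap_shortfall_kb(info, factor=1.5):
    """How much more swap (in kB) is needed to reach factor x physical RAM."""
    wanted = int(info["MemTotal"] * factor)
    return max(0, wanted - info.get("SwapTotal", 0))


if __name__ == "__main__":
    with open("/proc/meminfo") as f:
        missing = swap_shortfall_kb(parse_meminfo(f.read()))
    print("additional swap needed (kB):", missing)
```

a result of 0 means the machine already has at least 1.5x its RAM in swap.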
nextpnr-ecp5 on the other hand is quite reasonable: i have only
ever once seen yosys try to eat 20 GB of RAM and that was down
to a known bug, since fixed.
bottom line here is that the 85k LUT4 ECP5s are a much more
stable bet, but annoyingly they are hard to get hold of at the moment
(VERSA_ECP5 is only 45k LUT4s).
l.
---
crowd-funded eco-conscious hardware: https://www.crowdsupply.com/eoma68