[Libre-soc-bugs] [Bug 934] Evaluation of optimized placement for Yosys generated SRAM

bugzilla-daemon at libre-soc.org bugzilla-daemon at libre-soc.org
Thu Sep 22 10:13:41 BST 2022


https://bugs.libre-soc.org/show_bug.cgi?id=934

--- Comment #1 from Jean-Paul Chaput <Jean-Paul.Chaput at lip6.fr> ---

The evaluation results can be rebuilt with:

* coriolis commit #7d31d6c4
* alliance-check-toolkit commit #d389964d

This is a copy of the results given in the Cumulus plugin sramplacer2.py.


Automatic placement of a Yosys generated SRAM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

* We were expecting the output decoder to be the same for each bit
  line, which would have allowed us to rebuild a matrix-like placement.
  This is not the case: each output mux equation is synthesized
  differently. Knowing that, we created a row-based placement with
  reordering capabilities so that we could optimize the mux placement.

* Alas, that effort was doomed from the start. If the same
  multiplexing function is used for all the bits, the command signals
  from the decoder are the same. For example, to mux 256 words using
  only mux2, we need 8 bits (control lines), since log2(256) = 8.
  We also have to take into account "ce", "we", "rst" and "oe", so
  there are more of them, but not many. Let's say 20.

  When running placeSRAM and looking at the last level (5) of the
  decoder DAG, we see that it contains 832 gates, which means as many
  command signals. That is 26 control signals *per* bit. This is the
  direct consequence of *each* multiplexer having its own structure.
  26 signals take up more than half the horizontal routing capacity
  of a slice (40 tracks), which results in an unroutable design when
  each bit is kept in its own row.

  832 gates is the figure for TSMC 180nm; for SkyWater 130nm we got
  976 gates on the third level.
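The arithmetic in the two bullets above can be checked with a short Python sketch. It assumes a balanced tree of mux2 over a power-of-two word count; `select_lines`, `mux2_tree` and `per_bit_routing` are hypothetical helpers written for illustration, not part of Coriolis or Yosys. The 32-bit word width is only inferred from 832 / 26; it is not stated in the comment.

```python
import math


def select_lines(n_words: int) -> int:
    """Select (address) lines needed by a balanced mux2 tree over n_words."""
    return math.ceil(math.log2(n_words)) if n_words > 1 else 0


def mux2(sel: int, a, b):
    """2-to-1 multiplexer: a when sel == 0, b when sel == 1."""
    return b if sel else a


def mux2_tree(words, address: int):
    """Return words[address] using a balanced tree of mux2.

    Level k is driven by address bit k alone. If every bit line uses this
    same tree, those select lines are shared by all the bits of the SRAM.
    """
    if len(words) & (len(words) - 1):
        raise ValueError("this sketch assumes a power-of-two word count")
    level = list(words)
    k = 0
    while len(level) > 1:
        sel = (address >> k) & 1
        level = [mux2(sel, level[i], level[i + 1]) for i in range(0, len(level), 2)]
        k += 1
    return level[0]


def per_bit_routing(decoder_gates: int, signals_per_bit: int, slice_tracks: int):
    """Return (implied word width, fraction of a slice's horizontal tracks used).

    The word width is inferred by dividing the gate count by the per-bit
    signal count (an assumption, not a figure taken from the netlist).
    """
    bits, rest = divmod(decoder_gates, signals_per_bit)
    if rest:
        raise ValueError("gate count is not a multiple of the per-bit signal count")
    return bits, signals_per_bit / slice_tracks


# Shared tree: 256 words need 8 select lines and 255 mux2 per bit.
print(select_lines(256), 256 - 1)  # -> 8 255
# Figures quoted for TSMC 180nm: 832 gates, 26 signals per bit, 40 tracks per slice.
print(per_bit_routing(832, 26, 40))  # -> (32, 0.65)
```

With a shared tree, the 8 select lines are routed once for the whole array. With 26 distinct signals per bit, each bit row uses 65% of a 40-track slice, above the one-half mark mentioned above.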

Conclusions
~~~~~~~~~~~

1. A Yosys-generated SRAM cannot be placed regularly, either in a
   2-D matrix fashion or in a simple bit-line organization.

2. Worse, a thorough analysis of the generated netlist shows that it
   is highly sub-optimal. Yosys generates *way* too many signals to
   implement the multiplexing, resulting in a bloated design.

3. Creating a small SRAM generator, even one based on standard cells,
   would be a great improvement over the Yosys-generated one
   (the simpler OpenRAM approach).
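To illustrate conclusion 3, here is a minimal Python sketch of the kind of regular placement a dedicated generator could emit. It is hypothetical: the `Placement` record, the instance names, the abstract (column, row) grid and the reserved control column are assumptions made for illustration, not the actual layout used by OpenRAM or Coriolis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Placement:
    instance: str  # hypothetical instance name
    column: int    # x index on an abstract grid (one column per word)
    row: int       # y index on an abstract grid (one row per bit line)


def place_regular_sram(n_words: int, n_bits: int) -> list[Placement]:
    """Matrix placement: one row per bit line, one column per word.

    Column 0 is reserved for the per-row control cells (hypothetical);
    the storage cell of word w, bit b goes to (column w + 1, row b).
    """
    placements = [Placement(f"ctrl_{b}", 0, b) for b in range(n_bits)]
    for b in range(n_bits):
        for w in range(n_words):
            placements.append(Placement(f"mem_w{w}_b{b}", w + 1, b))
    return placements


# A 4-word x 2-bit toy array: 2 control cells plus 8 storage cells.
for p in place_regular_sram(4, 2):
    print(p.instance, p.column, p.row)
```

In this sketch every storage cell in a given column belongs to the same word, so that word's select signal can run vertically once per column instead of being duplicated for each bit row, which is the regularity the Yosys netlist does not offer.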

In hindsight, since we were using Yosys-generated SRAM in LibreSOC,
this explains a lot of the congestion we observed.
