[Libre-soc-bugs] [Bug 1143] New: Optimization of verilator for scalability
bugzilla-daemon at libre-soc.org
Wed Aug 23 11:58:01 BST 2023
https://bugs.libre-soc.org/show_bug.cgi?id=1143
Bug ID: 1143
Summary: Optimization of verilator for scalability
Product: Libre-SOC's first SoC
Version: unspecified
Hardware: PC
OS: Linux
Status: CONFIRMED
Severity: enhancement
Priority: ---
Component: Source Code
Assignee: lkcl at lkcl.net
Reporter: konstantinos at vectorcamp.gr
CC: libre-soc-bugs at lists.libre-soc.org
NLnet milestone: ---
Currently, Verilator's performance does not scale across multiple threads
because of its internal queue model and its heavy use of mutexes to lock the
queue. As a result, simulation does not take advantage of CPUs with many
cores. After some initial profiling, I found that most of the CPU time is
spent in the internal queue:
    41.71%  microwatt-verilator  [.] VlMTaskVertex::waitUntilUpstreamDone
    32.97%  microwatt-verilator  [.] VlWorkerThread::dequeWork
     8.72%  microwatt-verilator  [.] VlMTaskVertex::signalUpstreamDone
So about 83% of CPU time (41.71% + 32.97% + 8.72% = 83.40%) is spent on
synchronization between threads. This is a huge waste of CPU time and
definitely something that can be fixed.
I believe that replacing the internal queue with a lock-free thread-safe
queue will increase performance by at least an order of magnitude. I have done
this before in very demanding real-time applications, and each time
performance improved greatly.
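To illustrate the kind of queue meant here, below is a minimal sketch of a
bounded lock-free ring buffer for one producer thread and one consumer thread.
It is not Verilator code: the names SpscQueue, tryPush, tryPop and runSpscDemo
are hypothetical, and it assumes C++17, a default-constructible element type
and exactly one thread on each side. A replacement for the actual worker queue
may well need a multi-producer variant.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <thread>
#include <utility>

// Hypothetical illustration only; not part of Verilator.
// Bounded single-producer/single-consumer queue with no mutex.
template <typename T, std::size_t Capacity>
class SpscQueue {
    static_assert(Capacity >= 2 && (Capacity & (Capacity - 1)) == 0,
                  "Capacity must be a power of two");

public:
    // Producer thread only. Returns false when the queue is full.
    bool tryPush(T value) {
        const std::size_t tail = m_tail.load(std::memory_order_relaxed);
        const std::size_t head = m_head.load(std::memory_order_acquire);
        if (tail - head == Capacity) return false;
        m_slots[tail & (Capacity - 1)] = std::move(value);
        // Release: the slot write becomes visible before the new tail.
        m_tail.store(tail + 1, std::memory_order_release);
        return true;
    }

    // Consumer thread only. Returns std::nullopt when the queue is empty.
    std::optional<T> tryPop() {
        const std::size_t head = m_head.load(std::memory_order_relaxed);
        const std::size_t tail = m_tail.load(std::memory_order_acquire);
        if (head == tail) return std::nullopt;
        T value = std::move(m_slots[head & (Capacity - 1)]);
        // Release: the slot is free for reuse only after it has been read.
        m_head.store(head + 1, std::memory_order_release);
        return value;
    }

private:
    T m_slots[Capacity];
    // Separate cache lines (64 bytes assumed) to limit false sharing.
    alignas(64) std::atomic<std::size_t> m_head{0};
    alignas(64) std::atomic<std::size_t> m_tail{0};
};

// Pushes 0..count-1 from a producer thread and sums them on the caller.
std::uint64_t runSpscDemo(std::uint64_t count) {
    SpscQueue<std::uint64_t, 1024> queue;
    std::thread producer([&queue, count] {
        for (std::uint64_t i = 0; i < count; ++i) {
            while (!queue.tryPush(i)) std::this_thread::yield();
        }
    });
    std::uint64_t sum = 0;
    for (std::uint64_t received = 0; received < count;) {
        if (auto item = queue.tryPop()) {
            sum += *item;
            ++received;
        } else {
            std::this_thread::yield();
        }
    }
    producer.join();
    return sum;
}
```

Neither side ever blocks on a lock: each thread writes only its own index and
reads the other with acquire ordering, so the queue itself needs no mutex.
Whether this removes the waiting seen in the profile depends on how the real
scheduler hands work to its threads, which this sketch does not model.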
The plan is to also submit this work upstream to benefit the Verilator project
overall.
I believe that a budget of 7-10k EUR would suffice for this kind of work.
It goes without saying that it will be heavily tested before submission.
--
You are receiving this mail because:
You are on the CC list for the bug.