[Libre-soc-bugs] [Bug 413] DIV "trial" blocks are too large

bugzilla-daemon at libre-soc.org
Fri Jul 3 06:39:04 BST 2020


https://bugs.libre-soc.org/show_bug.cgi?id=413

--- Comment #13 from Jacob Lifshay <programmerjake at gmail.com> ---
(In reply to Luke Kenneth Casson Leighton from comment #10)
> (In reply to Jacob Lifshay from comment #8)
> > (In reply to Luke Kenneth Casson Leighton from comment #7)
> > > >>> 434*434
> > > 188356
> > > 
> > > down from 500,000 it is going to be several hours on placement alone.
> > > 
> > > each core section also looks too large, containing rr multiply that
> > > is not needed.  will try cutting that.
> > 
> > all the multiplies should be multiplying by small constants, which should
> > convert to a few adds.
> 
> adds that are 192 bits long.  this results in absolutely massive adders
> by the time it is converted to gates.  likewise for the trial_comparison
> (the greater-than-equal)
> 
> this results in a 450k VST file because it is literally around 2,000
> cells to do the compare @ 192 bit long
> 
> if one of those compares can be cut out (because the PriorityEncoder
> will always select at least the lowest flag) then that literally
> halves the number of gates when radix=1.
> 
> 
> can you help investigate by using yosys and installing coriolis2 and
> compiling
> the code so that you can see what is going on.
> 
> you need to understand exactly what is going on otherwise guessing what
> *might* work is going to be a waste of time and we do not have time to
> waste.
> 
> you need the feedback loop which you are entirely missing at the moment
> by not running the unit tests

That's simply because I haven't yet gotten around to working on the unit tests:
I've been distracted by improving power-instruction-analyzer, adding Python
bindings so that the tested-to-be-correct Rust instruction models can be used
directly from our Python unit tests. I haven't pushed yet because I'm in the
middle of a refactor and the code doesn't compile yet.

> and not running the tools.

I've been assuming that the gate count from yosys synth is pretty
representative, since it converts to all the gates that are used in the final
circuit. If wiring is taking up much more space than would be expected from the
gate count, I can figure out how to install coriolis2.

> 
> > if the div pipe is flattened, there is probably a lot more that can be
> > shared between all the different parts, such as every stage multiplying the
> > divisor by the same constants.
> 
> constants are simply converted to pulling locally to VSS or VDD at the
> point they are needed: they take up no space at all.

True, except that each stage has its own instance of `divisor * N`, for example,
which gets converted to some adders rather than a multiplier and a constant
(assuming yosys isn't stupid). If that's replaced with propagating the
pre-multiplied values through each stage, it would increase the D flip-flop
count but reduce the adder count.
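
To make that trade-off concrete, here is a rough Python sketch. It is not taken
from the DivPipeCore source: it assumes a naive shift-and-add decomposition in
which multiplying by a constant N costs popcount(N) - 1 adders, and the names
`adders_for_const_mul` and `compare_costs`, the stage count and the radix are
all made up for illustration. Only the 192-bit width comes from the quoted
comment above. yosys may well do better (or worse) than this estimate.

```python
def adders_for_const_mul(n):
    """Adders needed for x * n, assuming naive shift-and-add (an assumption)."""
    if n <= 0:
        return 0
    # one shifted copy of x per set bit in n, summed pairwise
    return bin(n).count("1") - 1


def compare_costs(stages, radix_bits, width=192):
    """Rough adder vs. flip-flop cost of the two approaches.

    stages and radix_bits are hypothetical parameters; width=192 is the
    adder width mentioned earlier in this thread.
    """
    # assume each stage needs divisor * N for N in 1 .. 2**radix_bits - 1
    constants = range(1, 2 ** radix_bits)
    adders_once = sum(adders_for_const_mul(n) for n in constants)
    # multiples that are pure shifts are wiring only and need no extra storage
    nontrivial = sum(1 for n in constants if adders_for_const_mul(n) > 0)
    return {
        # every stage rebuilds its own multiples
        "recompute_adders": adders_once * stages,
        "recompute_extra_dff_bits": 0,
        # multiples are built once and carried along in pipeline registers
        "propagate_adders": adders_once,
        "propagate_extra_dff_bits": nontrivial * width * (stages - 1),
    }


if __name__ == "__main__":
    for name, value in compare_costs(stages=8, radix_bits=3).items():
        print(name, value)
```

Under these assumptions the adder count drops by a factor equal to the number
of stages, at the price of `width` extra register bits per non-trivial multiple
per stage boundary. Whether that is a net win in area depends on the relative
cost of a D flip-flop and a full adder in the target cell library, which this
sketch does not model.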

Additionally, if the wreduce yosys pass is run, it reduces the width of
arithmetic operations when it can prove that a smaller bit-width will suffice.
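
As a loose illustration of that kind of reasoning (this is not how wreduce is
implemented, just the arithmetic behind it), the Python sketch below computes
how many result bits an unsigned add can actually need when the number of
significant bits in each operand is known. The function name and the example
widths are made up for illustration.

```python
def needed_add_width(a_bits, b_bits):
    """Bits needed to hold a + b for unsigned a, b with the given widths."""
    a_max = (1 << a_bits) - 1
    b_max = (1 << b_bits) - 1
    return (a_max + b_max).bit_length()


if __name__ == "__main__":
    # a nominally 192-bit add whose operands only ever use 64 and 66 bits
    declared_width = 192
    print(needed_add_width(64, 66), "bits needed out of", declared_width)
```

If the upper bits of both operands are known to be zero, every adder bit above
the computed width is dead logic and can be dropped.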

DivPipeCore is really designed assuming it will be flattened and yosys will
then have a chance to const-propagate past pipeline registers and convert
multiple identical ops to use a single op.
