[Libre-soc-dev] Paper about Dadda vs Wallace multiplier efficiency

Wed Dec 30 00:54:33 GMT 2020

On Tuesday, December 29, 2020, Cole Poirier <colepoirier at gmail.com> wrote:
> Found an insightful paper about multiplier latency and gate count because
> Richard had some questions about it after Luke had to go today. Will post
> on the wiki resources page once I’m at my keyboard.

star.

i remember from the discussions jacob and i had: wallace is easier to do
partitioning wise because the adds are just standard adds, one after the
other.

with dadda you have these "towers" of bits that need adding, and you
select some of those and the carry gets added to the next tower.  repeat
until you have 2 flat numbers left.
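
a rough pure-python model of the "towers" idea, using lists.  the function
names and the 8-bit default width are made up for illustration.  note the
reduction here is greedy (every group of 3 bits in a tower gets a full
adder), which is closer to wallace; a real dadda would use fewer adders per
layer.

```python
def partial_product_towers(x, y, width):
    """towers[i] = list of 1-bit values that all have weight 2**i."""
    towers = [[] for _ in range(2 * width)]
    for i in range(width):
        for j in range(width):
            towers[i + j].append((x >> i) & (y >> j) & 1)
    return towers


def reduce_layer(towers):
    """One layer: each group of 3 bits in a tower becomes one sum bit
    (same tower) plus one carry bit (next tower up)."""
    out = [[] for _ in range(len(towers) + 1)]
    for i, tower in enumerate(towers):
        k = 0
        while len(tower) - k >= 3:
            a, b, c = tower[k:k + 3]
            out[i].append(a ^ b ^ c)
            out[i + 1].append((a & b) | (b & c) | (c & a))
            k += 3
        out[i].extend(tower[k:])  # leftover bits pass through unchanged
    return out


def multiply(x, y, width=8):
    towers = partial_product_towers(x, y, width)
    # repeat until no tower is taller than 2, i.e. 2 flat numbers remain
    while max(len(t) for t in towers) > 2:
        towers = reduce_layer(towers)
    row0 = sum(t[0] << i for i, t in enumerate(towers) if len(t) > 0)
    row1 = sum(t[1] << i for i, t in enumerate(towers) if len(t) > 1)
    return row0 + row1  # the final, ordinary add


print(multiply(13, 11))  # 143
```

each full adder turns 3 bits into 2, so the total bit count drops every
layer and the loop always terminates.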

dadda has a trick where the "towers" are only reduced to about 2/3 of the
previous max height in each layer, rather than as far as possible
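
for reference, a small sketch of the textbook dadda height sequence
(d_1 = 2, d_{j+1} = floor(1.5 * d_j)), which is where the "2/3" comes from.
the function name is hypothetical, and this is the standard sequence rather
than anything taken from the paper mentioned above.

```python
def dadda_targets(max_height):
    """Per-layer height limits for towers of up to max_height bits
    (max_height > 2), largest first, ending at 2."""
    seq = [2]
    while seq[-1] * 3 // 2 < max_height:
        seq.append(seq[-1] * 3 // 2)
    return seq[::-1]


print(dadda_targets(8))   # [6, 4, 3, 2]
print(dadda_targets(64))  # [63, 42, 28, 19, 13, 9, 6, 4, 3, 2]
```

so a 64-high tower needs 10 layers, and each layer only has to bring the
tallest tower down to the next number in the list.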

but

what you do is: each new batch of adds you put them at the *back* of the
queue so that there is time for the gate ripple to complete.
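
a minimal sketch of that ordering, assuming the rows are whole numbers held
in a python deque (csa, partial_rows and reduce_rows are illustrative names).
plain python has no gate delay, so this only models the *order*: results go
to the back, and the oldest rows are always consumed first.

```python
from collections import deque


def csa(a, b, c):
    """3:2 compression of three words into (sum word, shifted carry word)."""
    return a ^ b ^ c, ((a & b) | (b & c) | (c & a)) << 1


def partial_rows(x, y, width):
    """One shifted copy of x per bit of y (zero where the bit is clear)."""
    return [(x << i) if (y >> i) & 1 else 0 for i in range(width)]


def reduce_rows(rows):
    queue = deque(rows)
    while len(queue) > 2:
        a, b, c = queue.popleft(), queue.popleft(), queue.popleft()
        s, carry = csa(a, b, c)
        queue.append(s)      # new results go to the *back* of the queue,
        queue.append(carry)  # so they are not used until older rows are done
    return list(queue)


x, y = 0xDEADBEEF, 0x12345678
two_rows = reduce_rows(partial_rows(x, y, 32))
assert sum(two_rows) == x * y
```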

where it gets complicated for dadda is that to make sure we are not going
mental with FPGA resources we want to be able to use FPGA DSP adders.

so rather than create a batch of 3-bit adders, my feeling is we need a
solution that puts bits into standard 64-bit adders, similar to wallace.
or, does them as a suite of parallel ORs and parallel ANDs and XORs.

i.e:

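    from nmigen import Signal

    a = Signal(64)
    b = Signal(64)
    c = Signal(64)

    # bitwise 3:2 compression of three 64-bit values into two
    out = a ^ b ^ c                      # per-bit sum
    carry = (a & b) ^ (b & c) ^ (c & a)  # per-bit carry: weight 2, so
                                         # shift left by 1 before adding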

this would do the trick, use DSP resources, yet still be effective in ASIC.
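
a quick sanity check of those two expressions on plain python integers
instead of nmigen Signals (compress_3_to_2 and check are hypothetical
names).  it confirms that the three inputs sum to out plus carry shifted
left by one, i.e. the carry word has weight 2.

```python
import random


def compress_3_to_2(a, b, c):
    out = a ^ b ^ c
    carry = (a & b) ^ (b & c) ^ (c & a)
    return out, carry


def check(trials, seed=0):
    """Return True if a + b + c == out + (carry << 1) for random 64-bit inputs."""
    rng = random.Random(seed)
    for _ in range(trials):
        a, b, c = (rng.getrandbits(64) for _ in range(3))
        out, carry = compress_3_to_2(a, b, c)
        if a + b + c != out + (carry << 1):
            return False
    return True


print(check(1000))  # True
```

the XOR of the three pairwise ANDs is the same function as their OR (the
majority of a, b, c), so either form works for the carry.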

it would be best first implemented in python using lists as data
structures, because actually, converting those to nmigen is real simple:
pass the list to Cat(). duh.
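
a sketch of what the list-based version could look like, assuming towers
are stored as lists of bits per weight.  pack() is a pure-python stand-in
for Cat() (first element is bit 0) and, like towers_to_rows, is an
illustrative name only; in nmigen each row list would go to Cat() instead.

```python
def towers_to_rows(towers):
    """Regroup towers into rows: row k holds the k-th bit of every tower,
    padded with 0 where a tower is shorter, LSB first."""
    height = max((len(t) for t in towers), default=0)
    return [[t[k] if k < len(t) else 0 for t in towers]
            for k in range(height)]


def pack(bits):
    """Stand-in for Cat(): turn an LSB-first list of bits into an integer."""
    return sum(bit << i for i, bit in enumerate(bits))


towers = [[1], [1, 0], [1, 1, 1], [0, 1]]  # weights 1, 2, 4, 8
rows = towers_to_rows(towers)
print([pack(r) for r in rows])  # [7, 12, 4]
```

the rows come out as full-width words, which is the shape needed for the
64-bit XOR/AND expressions or for standard wide adders.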
