[Libre-soc-dev] ASIC/FPGA discrepancies (was Re: daily kan-ban update 12aug2020)
whygee at f-cpu.org
Wed Aug 12 13:51:03 BST 2020
Hi !
Here I'm only providing a perspective, not a statement
on the quality of your work (which I respect a lot).
On 2020-08-12 13:53, Luke Kenneth Casson Leighton wrote:
> because of the massive regfile broadcast buses (20 to one) the routing is
> so enormous that it is only possible to achieve 16 mhz.
>
> today:
>
> very annoyingly, investigate replacing the unary int and fast regfiles with
> binary-addressed ones. this may allow FPGA SRAMs to be used.
>
> i really do not like the idea of designing for FPGA targets when we are
> doing an ASIC.
This is more or less a repeat of what happened with F-CPU
(yes, yet another parallel !) so it feels weird to me :-D
I designed FC0 on a blank sheet of paper and it looked/sounded great.
Then I tried to input it in a Mentor/Actel EDA package and it became
a huge mess.
You have to remember that your architecture comes from a 60s era system,
when people were allowed to think in 3D and make funky stuff like NOR
latches.
Inside a monolithic chip, things are very different, and typical FPGAs
FORCE you to think not only in 2D but also with a more limited
"design vocabulary": LUT4, latch, rinse, repeat. Then pray that the
router does not make it even worse.
That is why I'm still sticking to the Actel ProASIC3 family, which
was designed about 20 years ago: it's the closest FPGA family to a real
ASIC, and the granularity as well as the "vocabulary" are close enough
that I can prototype circuits without too many discrepancies.
Boolean functions are limited to 3 inputs but that's roughly
what maps efficiently onto standard ASIC cells anyway.
This also allows me to rethink how I design things, what is best
suited for FPGA, what works best for ASIC, and provide alternate
implementations that map a function to a particular target.
These days, each of my units is written in at least 2 versions,
including behavioural, ASIC and FPGA.
Your fanout problem is exactly why I had to research "Binary Trees
with Balanced Control": without a dedicated "hard block",
even with only 8 or 16 registers, the fanout problem arises and
kills the performance. You have to outsmart yourself.
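
To put rough numbers on the fanout problem, here is a minimal Python
sketch. It is NOT the "Binary Trees with Balanced Control" scheme, only a
generic count of how many loads each address bit drives when a register
file read port is built as a plain balanced tree of 2:1 muxes. The
function names, the register width and the fanout limit of 4 are all
assumptions made for illustration.

```python
def mux_tree_select_fanout(num_regs: int, width: int) -> list[int]:
    """Loads driven by each address bit in a balanced tree of 2:1 muxes.

    Index 0 is the address bit at the leaf level (next to the registers),
    the last index is the bit controlling the root mux.
    """
    if num_regs < 2 or num_regs & (num_regs - 1):
        raise ValueError("num_regs must be a power of two, at least 2")
    fanouts = []
    muxes_at_level = num_regs // 2
    while muxes_at_level >= 1:
        # One 2:1 mux per data bit, and every one of them needs the select.
        fanouts.append(muxes_at_level * width)
        muxes_at_level //= 2
    return fanouts


def buffer_stages(loads: int, max_fanout: int) -> int:
    """Extra buffer levels needed so that no driver sees more than max_fanout loads."""
    if max_fanout < 2:
        raise ValueError("max_fanout must be at least 2")
    stages = 0
    reach = max_fanout  # loads the original driver can feed directly
    while reach < loads:
        reach *= max_fanout
        stages += 1
    return stages


if __name__ == "__main__":
    # Hypothetical example: 16 registers of 64 bits, at most 4 loads per driver.
    for level, loads in enumerate(mux_tree_select_fanout(16, 64)):
        print(level, loads, buffer_stages(loads, 4))
```

With these made-up parameters the leaf-level address bit drives 512 mux
inputs and the root-level bit only 64, so the control signals need
different amounts of buffering and arrive at different times. That
imbalance exists even with a small register count.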
More generally: if it works badly on FPGA, it *might* also
perform like a drunken snail in ASIC. As a rule of thumb,
the smaller the register set, the better it performs,
if only because wires are shorter (the delay of an unbuffered
wire grows roughly with the square of its length).
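
The remark about wire length can be made concrete with the usual
first-order model of an unbuffered on-chip wire as a distributed RC line
(Elmore delay). The symbols below are assumptions introduced here, and
the model ignores driver resistance and load capacitance:

```latex
% r: wire resistance per unit length
% c: wire capacitance per unit length
% L: wire length
t_{\text{wire}} \approx \tfrac{1}{2}\, r\, c\, L^{2}
\qquad\Longrightarrow\qquad
\frac{t_{\text{wire}}(L/2)}{t_{\text{wire}}(L)} = \frac{1}{4}
```

Under this model, halving a wire cuts its delay to about a quarter.
Inserting repeaters brings the growth closer to linear, at the cost of
area and power.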
You hit a fanout wall, and I suppose there is a way to rethink this.
I hope you'll find alternate methods, maybe partitioning or
duplication; I don't know, you're the one who knows best.
Just my half € cent,
> l.
yg