[Libre-soc-bugs] [Bug 770] Discussion and Finalisation of Which Cryptographic Primitives to Implement

bugzilla-daemon at libre-soc.org bugzilla-daemon at libre-soc.org
Sun Oct 16 20:00:17 BST 2022


--- Comment #11 from Jacob Lifshay <programmerjake at gmail.com> ---
(In reply to Luke Kenneth Casson Leighton from comment #10)
> (In reply to Jacob Lifshay from comment #9)
> > there *is* a point,
> it's always a good idea to read ahead, all messages, in full.
> i already worked that out :)

i had read all the messages; my point was different from the one you arrived at.

> > run much faster 
> please understand and accept that the purpose of the exercise,
> under this Grant, is *not* processor speed, it is instruction
> count reduction and thus efficiency and power reduction.

i assumed you would infer that i meant instruction count, together with the
probable implementation strategy, as a proxy for speed.

> speed is an arbitrary factor based on a direct near-linear
> relationship

it's linear only at the low end; it becomes very non-linear very quickly. e.g.
going from 4-wide to 8-wide issue doesn't give a 2x speedup, it's closer to
about 1.2x.
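a rough way to see why widening issue runs into diminishing returns is an
idealized work/span lower bound. this is a toy model (unit-latency ops, perfect
scheduling, no cache or branch effects), not a model of any real core, and it
doesn't reproduce the ~1.2x figure above; it only shows the shape of the curve.
the 64-op kernel and its 12-op chain are made-up numbers.

```python
import math


def min_cycles(work: int, span: int, width: int) -> int:
    """Idealized lower bound on cycles: `work` unit-latency ops whose longest
    dependency chain is `span` ops, on a core issuing `width` ops per cycle."""
    return max(math.ceil(work / width), span)


def speedup(work: int, span: int, narrow: int, wide: int) -> float:
    """Best-case speedup from widening issue from `narrow` to `wide`."""
    return min_cycles(work, span, narrow) / min_cycles(work, span, wide)


# made-up kernel: 64 ops containing a 12-op serial chain
print(speedup(64, 12, 4, 8))  # 16 cycles -> 12 cycles, about 1.33x rather than 2x
# fully serial carry chain: every op depends on the previous one
print(speedup(64, 64, 4, 8))  # 1.0, extra width buys nothing
```

once the chain length (span) dominates, extra issue width stops helping at all.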

also, serial dependencies usually prevent wider silicon from giving much
speedup, if any. that's a very good reason to work on reducing instruction
count and/or to switch to instructions known to be easy to implement with much
higher performance, such as sv.maddedu, rather than a complex loop of
adde/maddld/maddhdu instructions.
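to make the contrast concrete, here is a python sketch of multiplying a
multi-limb integer by a single 64-bit limb using one fused 64x64+64->128-bit
multiply-add per limb, with the high half of each step feeding the next step's
addend. `maddedu_model` is a hypothetical name modelling my reading of
maddedu's behaviour (low half to the destination, high half chained on as the
carry), not the exact register-level spec. on scalar Power ISA the equivalent
loop would use separate maddld/maddhdu instructions for the two halves (and
adde where partial products are accumulated). the carry makes every step
depend on the previous one, which is exactly the serial chain mentioned above.

```python
MASK64 = (1 << 64) - 1


def maddedu_model(ra: int, rb: int, rc: int) -> tuple[int, int]:
    """Hypothetical model of a 64x64+64 -> 128-bit multiply-add.
    Returns (low 64 bits, high 64 bits) of ra*rb + rc."""
    s = ra * rb + rc  # max is (2^64-1)^2 + (2^64-1) = 2^128 - 2^64, fits in 128 bits
    return s & MASK64, s >> 64


def mul_bigint_by_limb(a_limbs: list[int], b: int) -> list[int]:
    """Multiply a little-endian list of 64-bit limbs by one 64-bit limb.
    The high half of each step is the carry-in of the next (serial chain)."""
    out = []
    carry = 0
    for limb in a_limbs:
        lo, carry = maddedu_model(limb, b, carry)
        out.append(lo)
    out.append(carry)  # final high half becomes the top limb
    return out


def limbs_to_int(limbs: list[int]) -> int:
    """Reassemble little-endian 64-bit limbs into a Python int."""
    return sum(x << (64 * i) for i, x in enumerate(limbs))


a = [MASK64, 1, MASK64]
print(limbs_to_int(mul_bigint_by_limb(a, MASK64)) == limbs_to_int(a) * MASK64)
```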

> with how much back-end silicon is thrown at the
> problem, whereas power consumption is not.
> > is very highly significant
> anyone may throw more silicon down and claim "it's significant".
> every datasheet for hard macros also contains power consumption
> figures and these are what is much more critical.
> SV being completely abstract and an architecturally independent
> ISA the only thing we can possibly claim right across the board

then don't claim it right across the board. maybe claim that, with the
appropriate microarchitecture (a 128x64->192-bit wide multiplier for sv.maddedu
rather than a u64x2 SIMD multiplier, or even wider if working with 256-bit
SIMD), it gets a huge speedup, much more than spending the same silicon on
additional scalar performance would (you'd definitely hit diminishing returns
by that point: you'd only have enough additional silicon for maybe one more
ALU, where you likely already have several). maybe also claim that it doesn't
have terrible performance issues on smaller microarchitectures, provided svp64
is implemented in hardware (i.e. not via trap & emulate).
