[Libre-soc-dev] gcc binutils sv cryptoprimitives etc
programmerjake at gmail.com
Wed Jan 20 07:49:53 GMT 2021
On Tue, Jan 19, 2021, 22:09 Luke Kenneth Casson Leighton <lkcl at lkcl.net>
> On Wednesday, January 20, 2021, Hendrik Boom <hendrik at topoi.pooq.com>
> > Cryptography would also benefit from constant-power execution.
> > This may also be difficult.
> it's insanely difficult and obliterates performance. L1 and L2 caches are
> out. changes in a single bit from 1 to 0 must be masked with a
> corresponding change from 0 to 1 in order to balance the books and that
> must occur even when no change is required (!) which is a whopping 4x
> increase in power.
> mental and an entire area of research all on its own.
yup, which is why I never advocated for constant-power. My model for an
attacker is basically someone at the other end of an Ethernet cable (just
data/timing, not voltages/EMI).
> Jacob you are doing the rabbithole "narrow focus" thing again, not
> listening to what the scope is, getting stuck on an irrelevant detail
> without thinking through the larger picture.
> the scope here is a *network* processor.
I'd assume some users would like to run a web server on our processor
without leaking the AES session keys/encrypted data for their TLS
connections. Another option is having our processor run a VPN server
without leaking encrypted data.
> end-user applications are prohibited.
realistically users should be able to run home web servers (unmodified
Apache, Nginx, etc.) on Libre-SOC routers without needing to do anything
other than have the SV AES assembly added to OpenSSL's crypto library.
This rules out achieving constant time by using timers: every web/vpn
server I've ever heard of sends packets as soon as it can, and none of them
has the massive(?) internal changes required to support delaying responses
to achieve constant time.
> end-user logins are prohibited.
> the only interface (attack surface) is the NETWORK.
> that operates at millisecond accuracy and response time, doesn't it?
yes, but that doesn't mean nanosecond-level timing sidechannels can't be
Paper describing attacking cpu-cache timing variations in software AES over
a remote network:
> so what possible relevance would *nanosecond* level variance in completion
> time have on *millisecond* overall NETWORK packet level response time, when
> that millisecond response time had been made uniform by way of a
> constant-response uniform timer?
that timer would need massive software ecosystem changes to implement, and
the maintainers of such software are not likely to accept patches that
intentionally slow down their software just because we want it -- basically
not going to happen.
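For reference, the uniform-timer idea being discussed amounts to something
like the following minimal sketch. It is only an illustration of the kind of
change each server would need, not code from any real web/vpn server; the
name respond_at_fixed_deadline and the budget_s, clock and sleep parameters
are hypothetical.

```python
import time


def respond_at_fixed_deadline(handler, request, budget_s,
                              clock=time.monotonic, sleep=time.sleep):
    """Run handler(request), then wait until budget_s seconds have elapsed.

    Hypothetical helper: clock and sleep are injectable only so the
    behaviour can be checked without real delays.
    """
    start = clock()
    response = handler(request)
    remaining = budget_s - (clock() - start)
    if remaining > 0:
        # Pad the response time up to the fixed budget.
        sleep(remaining)
    # If the handler overran the budget, nothing is padded and the
    # overrun is still visible to the peer.
    return response
```

Every response is slowed to the worst-case budget, which is exactly the
intentional slowdown upstream maintainers would have to accept.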
> data dependent constant time is a bitch. predication is out.
I consider predication (because it is a kind of control-flow) to be one of
those operations, like divide or load from data-dependent address, that we
won't provide data-independent execution time for (programmers wouldn't
expect cpus to provide that anyway for instructions like branch, load from
data-dependent address, integer divide, etc.).
Basically, we need the actual data being encrypted/decrypted to only go
through the data paths that do respect data-independent execution time:
load/store data (not address), general-purpose registers, register-register
move, bitwise ops, add/sub/clmul/mul, aes/sha* step, and whatever else I
outside that short list, programmers generally don't expect cpus to have
data-independent execution time and won't use the insecure data paths for
crypto primitive implementations.
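As a rough illustration of keeping secret data on the bitwise/add paths
only, here is a minimal sketch of a branch-free select and a compare with no
early exit. The names ct_select and ct_equal are made up for this example,
values are modelled as 32-bit words, and Python itself gives no timing
guarantees, so this only shows the shape of the logic that would be written
in assembly or C.

```python
MASK32 = 0xFFFFFFFF  # model a 32-bit register


def ct_select(cond_bit, a, b):
    """Return a if cond_bit == 1 else b, without branching on cond_bit."""
    mask = (-cond_bit) & MASK32          # 1 -> 0xFFFFFFFF, 0 -> 0x00000000
    return (a & mask) | (b & ~mask & MASK32)


def ct_equal(xs, ys):
    """Compare two byte sequences without stopping at the first mismatch."""
    if len(xs) != len(ys):               # lengths are treated as public
        raise ValueError("length mismatch")
    diff = 0
    for x, y in zip(xs, ys):
        diff |= x ^ y                    # accumulate differences with bitwise ops
    return diff == 0
```

In real Python code the standard library's hmac.compare_digest is the usual
choice for the comparison; the loop above is only there to show the idea.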
postponed rfc for adding side-channel resistant (secret) types to Rust:
> optimisations i would like to do for zeroing would be destroyed because a
> zero predicate bit would allow us to skip issuing to the Reservation
> can't do that because it would alter completion time.
you can because predicates are like integer divide, they're not expected to
keep the same execution time.
> using FSMs in FUs with early-out?
> can't do that.
you can, but only for ops programmers expect to be variable-time, such as
> using analysis of Condition Registers to check early-out of loops?
> can't do that.
> Karatsuba Multiply algorithm?
> can't use that because it detects and skips zero elements which is
> data-dependent nonuniform completion time.
all you need is to not skip elements -- which is what cryptographic
multiply algorithms do.
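A minimal sketch of the difference, using plain schoolbook multiplication
over 32-bit limbs rather than Karatsuba: the skip_zero variant does a
data-dependent amount of work, the other always performs every limb product.
The helper names (to_limbs, mul_limbs) and the operation counter are
assumptions made for illustration; counting limb products stands in for
execution time here.

```python
LIMB_BITS = 32
LIMB_MASK = (1 << LIMB_BITS) - 1


def to_limbs(value, count):
    """Split a non-negative integer into `count` little-endian 32-bit limbs."""
    return [(value >> (LIMB_BITS * i)) & LIMB_MASK for i in range(count)]


def mul_limbs(a, b, skip_zero):
    """Schoolbook multiply; returns (product, number of limb multiplies)."""
    total = 0
    ops = 0
    for i, x in enumerate(a):
        if skip_zero and x == 0:
            continue                     # data-dependent shortcut: leaks zero limbs
        for j, y in enumerate(b):
            total += (x * y) << (LIMB_BITS * (i + j))
            ops += 1
    return total, ops


a = to_limbs((1 << 64) + 5, 4)           # limbs: [5, 0, 1, 0]
b = to_limbs(12345, 4)
fast_product, fast_ops = mul_limbs(a, b, skip_zero=True)
fixed_product, fixed_ops = mul_limbs(a, b, skip_zero=False)
```

Both calls give the same product, but only the non-skipping one does the
same number of limb multiplies for every input of a given size.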
> every optimisation opportunity whether at hardware or software level is
no, some are, quite a lot aren't. In particular, the instruction scheduler
is free to do what it pleases since it deals entirely with control
> do you see how utterly destructive and disruptive it would be to try to
> design the entire processor around data dependent constant time?
actually, it's not that bad, since all we need is for a few
known/expected ALUs to be constant-time, for the general-purpose
registers/data buses to be constant-time, and for the load/store unit not
to look at load/store data (address is fine/expected). The rest of the CPU
can do whatever it pleases, randomly delaying/not however it likes.
Basically, as long as the multiply, shift/rotate, and AES/SHA ALUs are
constant time, and we don't go out of our way to do stupid things, then the
goal is met. Pretty easy.
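Since load timing is allowed to depend on the address, software that needs
to index a table with a secret has to keep the secret out of the address.
One common way, sketched below under assumptions, is to read every entry and
pick the wanted one with a mask. The function names and the trace list are
hypothetical; the trace just records which addresses were touched, and the
sketch assumes a 256-entry table of byte values.

```python
def leaky_lookup(table, secret_index, trace):
    """Direct lookup: the address touched depends on the secret."""
    trace.append(secret_index)
    return table[secret_index]


def scanning_lookup(table, secret_index, trace):
    """Touch every entry; select the wanted one with bitwise ops only.

    Assumes len(table) == 256 and entries in 0..255.
    """
    result = 0
    for i, entry in enumerate(table):
        trace.append(i)                  # same addresses for every secret
        # 0xFF when i == secret_index, else 0x00, computed without a branch
        mask = (((i ^ secret_index) - 1) >> 8) & 0xFF
        result |= entry & mask
    return result


table = [(7 * i + 3) % 256 for i in range(256)]
trace_a, trace_b = [], []
value_a = scanning_lookup(table, 3, trace_a)
value_b = scanning_lookup(table, 200, trace_b)
```

The two traces are identical even though the secrets differ, at the cost of
reading the whole table for each lookup.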
> power analysis is likewise completely out of scope by assuming that
> attackers have zero physical access.
Yeah, power is *super* hard -- hence why my attack model is limited to
network. If attackers have physical access, they could just copy the hard
drive, freeze/read out the ram, sniff the ram bus, sniff the cpu's
internals (requires a sensitive probe and lots of ingenuity),
or --- just threaten the sysadmin with a wrench until they tell the
attackers the password:
> in this way we can get in an application for funding and go beyond Nov
sounds like a good goal.