[Libre-soc-dev] ASCON in SVP64

Luke Kenneth Casson Leighton lkcl at lkcl.net
Thu Mar 16 14:34:52 GMT 2023

```
https://github.com/meichlseder/pyascon/blob/master/ascon.py#L283

with svindex and Horizontal-First Mode i believe this can be
done in around...  maybe 8 instructions? (excluding setup of
indices which is a one-off outside the loop).

first observation: S and T may be placed in sequential registers
with one "gap" between them.
let us say S[0] is at r32: this would place T[0] at r32+6,
i.e. r38.

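second observation: let us place the constant 0x0000 in
r42, 0xffff...ff in r43, and use r44 as the "round constant".
(caveat: with T0 at r38, T4 also lands on r42, so in practice
these constants need to move up out of the way.)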

third observation: note that up to the ANDing there is simply
"a bunch of XORs":

# --- add round constants ---
S[2] ^= (0xf0 - r*0x10 + r*0x1)
if debugpermutation: printwords(S, "round constant addition:")
# --- substitution layer ---
S[0] ^= S[4]
S[4] ^= S[3]
S[2] ^= S[1]
T = [(S[i] ^ 0xFFFFFFFFFFFFFFFF) & S[(i+1)%5] for i in range(5)]

up to here (excluding the ANDs) we may therefore set VL=10 and
have SVSHAPE0 set up in this order (for RT):

S2 S0 S4 S2 S0-copy T0 T1 T2 T3 T4

SVSHAPE1 may be set to:

S2 S0 S4 S2 S0 S0 S1 S2 S3 S4

SVSHAPE2 may be set to:

(roundconst r44) S4 S3 S1 (zero r42)(r43)(r43)(r43)(r43)(r43)

note that the fifth element (S0-copy, index 4 counting from zero)
is XORing S0 with zero, thus taking a copy of S0.

then, the ANDing may be performed by either setting VL=5
or using a predicate mask of 0b11111 (held in r3 here):

sv.and/m=r3 r38, r38, r33

this one is "clever". the target is T0-T4 (r38-42) but the source
starts at S1 (r33) and ends at the *COPY* of S0 (r37).

you get the idea.  this next bit could do the same trick:
a copy of T0 placed just after T4, produced in the exact
same way, but needing 6 ANDs not 5.  a duplicate copy of T0
means a duplicate AND is needed, but that's fine.

for i in range(5):
    S[i] ^= T[(i+1)%5]
S[1] ^= S[0]
S[0] ^= S[4]
S[3] ^= S[2]
S[2] ^= 0XFFFFFFFFFFFFFFFF

using SVSHAPE3 this would be RT (and RA) but hm if additional
0xffff constants are inserted then it *might* be possible to
arrange these sequentially (not needing five SVSHAPEs).
or simply renumber S0-4 *such that* only one more SVSHAPE
is needed.

if that really is not possible then mfspr/mtspr can be used to
save one of the SVSHAPEs in a temp GPR and put it back again.
it would up the instruction count to 11 or so but it is doable.
or, sigh, just issue the svindex instruction inside the loop.
which is annoying but also doable.

if debugpermutation: printwords(S, "substitution layer:")
# --- linear diffusion layer ---
S[0] ^= rotr(S[0], 19) ^ rotr(S[0], 28)
S[1] ^= rotr(S[1], 61) ^ rotr(S[1], 39)
S[2] ^= rotr(S[2],  1) ^ rotr(S[2],  6)
S[3] ^= rotr(S[3], 10) ^ rotr(S[3], 17)
S[4] ^= rotr(S[4],  7) ^ rotr(S[4], 41)

these should be easy, svindex2 i think or svindex in Matrix
Mode, setting a "repeating" pattern (0 0 1 1 2 2 3 3 4 4).
rotation amounts are a pain: no discernible pattern yet,
so they would be done as GPR constants.

overall though, this is:

svremap
sv.xor
svremap
sv.xor
svremap/svindex
sv.rotr

plus very little housekeeping.

l.

```
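
The remapped XOR and the "gap" trick for the AND described in the post can be checked with a small Python model. This is only a sketch of the idea, not of real SVP64 semantics: the index lists `RT_IDX`, `RA_IDX` and `RB_IDX` are hypothetical stand-ins for SVSHAPE0-2 (which list feeds RA and which feeds RB is an assumption), the function names are made up for illustration, and the three constants are assumed to live at r44-r46 rather than r42-r44, because T4 occupies r42 when T0 is at r38. The result is compared against the first half of the pyascon round as quoted in the post.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF

# Hypothetical GPR layout, following the post where possible.
S = [32, 33, 34, 35, 36]      # S0..S4
S0_COPY = 37                  # the "gap": a copy of S0 directly after S4
T = [38, 39, 40, 41, 42]      # T0..T4
ZERO, ONES, RC = 44, 45, 46   # assumption: moved up so they do not clash with T4 (r42)

# Index lists standing in for SVSHAPE0-2, one entry per element (VL=10).
RT_IDX = [S[2], S[0], S[4], S[2], S0_COPY] + T
RA_IDX = [S[2], S[0], S[4], S[2], S[0]] + S
RB_IDX = [RC, S[4], S[3], S[1], ZERO] + [ONES] * 5


def remapped_xor(regs):
    """Model one XOR over 10 elements, executed strictly in element order."""
    for rt, ra, rb in zip(RT_IDX, RA_IDX, RB_IDX):
        regs[rt] = regs[ra] ^ regs[rb]


def sequential_and(regs, vl=5):
    """Model 'sv.and r38, r38, r33': plain sequential registers, no remap.

    The second source runs S1, S2, S3, S4, S0-copy, so no modulo is needed.
    """
    for i in range(vl):
        regs[T[0] + i] &= regs[S[1] + i]


def model(state, r):
    """Run the register-file model on a 5-word state for round index r."""
    regs = [0] * 64
    for reg, word in zip(S, state):
        regs[reg] = word
    regs[ZERO], regs[ONES] = 0, MASK64
    regs[RC] = 0xf0 - r * 0x10 + r * 0x1
    remapped_xor(regs)
    sequential_and(regs)
    return [regs[x] for x in S], [regs[x] for x in T]


def reference(state, r):
    """First half of the round, as quoted from pyascon in the post."""
    s = list(state)
    s[2] ^= (0xf0 - r * 0x10 + r * 0x1)
    s[0] ^= s[4]
    s[4] ^= s[3]
    s[2] ^= s[1]
    t = [(s[i] ^ MASK64) & s[(i + 1) % 5] for i in range(5)]
    return s, t


if __name__ == "__main__":
    state = [(0x0123456789ABCDEF * (i + 1)) & MASK64 for i in range(5)]
    for r in range(12):
        assert model(state, r) == reference(state, r)
```

Element order matters in this model: S0 must be XORed with S4 before S4 is itself updated, and the copy of S0 must be taken only after S0 has been updated, which is why the three index lists are walked in lockstep from element 0 to element 9.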