# [Libre-soc-bugs] [Bug 1157] Implement poly1305

bugzilla-daemon at libre-soc.org
Fri Oct 13 18:41:15 BST 2023

```
https://bugs.libre-soc.org/show_bug.cgi?id=1157

--- Comment #32 from Luke Kenneth Casson Leighton <lkcl at lkcl.net> ---
> My current brainstorming for the initial h[x]+= block:
>
> 1- store 0xfffff... (twice) and 0x3ffff... in three registers (p0,p1,p2)

yep

> 2- store h0, h1, and h2 in 3 registers
> 3- split t0 and t1 into 3 registers:
>     t0_s = t0
>     t1_s = t0 >> 44 | t1 << 20
>     t2_s = t1 >> 24 | hibit (this might be wrong, need to check)

hibit is "0 if final else 1<<40" (as in poly1305-donna: the
extra bit is only dropped on the final, padded block)

you can use sv.dsrd here with some "morphing".  notice
how 44+20 == 64 and also 24+40 (from the hibit shift)
is also == 64?

therefore if you do:

* t2 = 1
* set an array of [44,24] in RB
* point RA at t0
* point RT at t1_s

you have created t1_s and t2_s with one vector instruction.

now, of course, it looks pointless because you have one
instruction to setvl=2 then sv.dsrd is another, you might
as well have done two dsrd instructions!

so try that first ok?

theeen.... ha ha you'll like this: try with *three* dsrd instructions,
but use an array [0,44,24] in RB!

> 4- setvl 3? 9?

2, starting at t1_s and going to t2_s.  but if you do the
trick above of RB=[0,44,24] and start at t0_s then you get
the "t0_s = t0 >> 0 | something" , arrange something to contain
zero and voila, no need to do a manual copy (no addi t0_s,t0,0)

> 5- the final run should be something like this:
>
>    h_s[x] += t[x]_s & p[x]

t_s[x] but yes

> Which is perfectly doable in two or a few more SVP64 lines if I'm now
> understanding things correctly.

yes it is. if you stick to Horizontal-First then annoyingly it
has to be done as:

    # first HF instruction, sv.and
    for x in range(VL):
        tmpregs[x] = t_s[x] & p[x]
    # 2nd HF instruction, sv.add
    for x in range(VL):
        h_s[x] += tmpregs[x]

which given the wasted extra vector of tmpregs is precisely why Vertical
First was invented, as that becomes:

    for x in range(VL):
        tmpreg = t_s[x] & p[x]
        h_s[x] += tmpreg

note tmpREG scalar rather than tmpREGS vector, but you need a loop
and an svstep. instruction which is more instructions, so you only
use VF if the regfile is under pressure or you just want to show off :)

```
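The split and accumulate steps discussed in the comment can be sketched in plain Python. This is only a software model under stated assumptions, not SVP64 assembler: `shift_right_pair` is a hypothetical helper that stands in for the effect the comment wants from each dsrd element (the low 64 bits of a 128-bit right shift) and is not the exact instruction definition. The names `split_block`, `reference_limbs`, `accumulate_hf` and `accumulate_vf` are likewise invented for illustration. The shift array [0, 44, 24] and the masks p0, p1, p2 come from the comment; `reference_limbs` follows the limb expressions quoted there.

```python
MASK64 = (1 << 64) - 1
VL = 3
# p0, p1, p2 from the comment: two 44-bit masks and one 42-bit mask
P = [0xfffffffffff, 0xfffffffffff, 0x3ffffffffff]


def shift_right_pair(lo, hi, n):
    """Hypothetical model: low 64 bits of the 128-bit value hi:lo >> n."""
    return (((hi << 64) | lo) >> n) & MASK64


def split_block(t0, t1, t2):
    """Model of the three-shift trick with shift amounts [0, 44, 24].

    t2 is the extra high word: 1 places a bit at position 40 of t2_s
    (because 24 + 40 == 64), 0 leaves it clear.
    """
    elements = ((t0, 0, 0),    # t0_s = t0 >> 0, no manual copy needed
                (t0, t1, 44),  # t1_s = t0 >> 44 | t1 << 20
                (t1, t2, 24))  # t2_s = t1 >> 24 | t2 << 40
    return [shift_right_pair(lo, hi, n) for lo, hi, n in elements]


def reference_limbs(t0, t1, hibit):
    """Masked limbs written out longhand, for comparison."""
    return [t0 & P[0],
            ((t0 >> 44) | (t1 << 20)) & P[1],
            ((t1 >> 24) & P[2]) | hibit]


def accumulate_hf(h, t_s):
    """Horizontal-First shape: two whole-vector passes and a temp vector."""
    tmpregs = [t_s[x] & P[x] for x in range(VL)]           # models sv.and
    return [(h[x] + tmpregs[x]) & MASK64 for x in range(VL)]  # models sv.add


def accumulate_vf(h, t_s):
    """Vertical-First shape: one loop, one scalar temporary."""
    h = list(h)
    for x in range(VL):
        tmpreg = t_s[x] & P[x]
        h[x] = (h[x] + tmpreg) & MASK64
    return h


if __name__ == "__main__":
    t_s = split_block(0x0123456789abcdef, 0xfedcba9876543210, 1)
    print([hex(v) for v in accumulate_vf([0, 0, 0], t_s)])
```

Both accumulate functions return the same limbs; the only difference the model shows is the temporary vector `tmpregs` versus the single `tmpreg`, which is the register-pressure trade-off described in the comment.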