[Libre-soc-bugs] [Bug 1157] Implement poly1305

bugzilla-daemon at libre-soc.org bugzilla-daemon at libre-soc.org
Fri Oct 13 18:41:15 BST 2023


https://bugs.libre-soc.org/show_bug.cgi?id=1157

--- Comment #32 from Luke Kenneth Casson Leighton <lkcl at lkcl.net> ---
(In reply to Sadoon Albader from comment #31)
> My current brainstorming for the initial h[x]+= block:
> 
> 1- store 0xfffff... (twice) and 0x3ffff... in three registers (p0,p1,p2)

yep

> 2- store h0, h1, and h2 in 3 registers
> 3- split t0 and t1 into 3 registers:
>     t0_s = t0
>     t1_s = t0 >> 44 | t1 << 20
>     t2_s = t1 >> 24 | hibit (this might be wrong, need to check)

hibit is "0 if final else 1<<40"

you can use sv.dsrd here with some "morphing".  notice
how 44+20 == 64 and also 24+40 (from the hibit shift)
is also == 64?

therefore if you do:

* t2 = 1
* set an array of [44,24] in RB
* point RA at t0
* point RT at t1_s

you have created t1_s and t2_s with one vector instruction.

now, of course, it looks pointless because you have one
instruction to setvl=2 then sv.dsrd is another, you might
as well have done two dsrd instructions!

so try that first ok?
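
A rough way to sanity-check the arithmetic before writing any assembler is
a small Python model. This is only a sketch: double_shift_right is a
hypothetical helper that models "shift a 128-bit pair right, keep the low
64 bits", not the real dsrd operand encoding or semantics, and the values
of t0 and t1 are made up for the example.

```python
MASK64 = (1 << 64) - 1

def double_shift_right(lo, hi, sh):
    """Arithmetic model only (hypothetical helper, not real dsrd semantics):
    shift the 128-bit value hi:lo right by sh and keep the low 64 bits."""
    return (((hi << 64) | lo) >> sh) & MASK64

# arbitrary sample 64-bit words standing in for one 16-byte block
t0 = 0x0123456789ABCDEF
t1 = 0xFEDCBA9876543210
t2 = 1                      # the extra "1" that ends up as the 1 << 40 term

src = [t0, t1, t2]
shifts = [44, 24]           # the array that would sit in RB

# one "vector" pass with VL=2: element i pairs src[i] with src[i+1]
t1_s, t2_s = [double_shift_right(src[i], src[i + 1], shifts[i])
              for i in range(2)]

# compare against the hand-written formulas from the quoted comment
assert t1_s == ((t0 >> 44) | (t1 << 20)) & MASK64
assert t2_s == (t1 >> 24) | (1 << 40)
```

Both results match the scalar formulas, which is the point of the
44+20 == 64 and 24+40 == 64 observation.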

theeen.... ha ha you'll like this: try with *three* dsrd instructions,
but use an array [0,44,24] in RB! 

> 4- setvl 3? 9?

2, starting at t1_s and going to t2_s.  but if you do the
trick above of RB=[0,44,24] and start at t0_s then you get
the "t0_s = t0 >> 0 | something" , arrange something to contain
zero and voila, no need to do a manual copy (no addi t0_s,t0,0)
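
The three-element variant can be modelled the same way. Again this is
only an arithmetic sketch with a hypothetical helper and made-up sample
values. In particular the per-lane pairing of low word, high word and
shift below is an assumption made for illustration, since the register
layout is not spelled out here.

```python
MASK64 = (1 << 64) - 1

def double_shift_right(lo, hi, sh):
    """Arithmetic model only (hypothetical helper, not real dsrd semantics)."""
    return (((hi << 64) | lo) >> sh) & MASK64

t0 = 0x0123456789ABCDEF     # arbitrary sample words
t1 = 0xFEDCBA9876543210
t2 = 1                      # ends up as the 1 << 40 term in the top limb

# Assumed per-lane inputs: (low word, high word, shift amount).
# Lane 0 has a zero high word, standing in for the "something" that
# has to contain zero.
lanes = [(t0, 0, 0), (t0, t1, 44), (t1, t2, 24)]

t_s = [double_shift_right(lo, hi, sh) for lo, hi, sh in lanes]

assert t_s[0] == t0                                  # plain copy, no addi
assert t_s[1] == ((t0 >> 44) | (t1 << 20)) & MASK64
assert t_s[2] == (t1 >> 24) | (1 << 40)
```

The shift-by-zero lane just passes t0 through, which is what makes the
separate copy unnecessary.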


> 5- the final run should be something like this:
> 
>    h_s[x] += t[x]_s & p[x]

t_s[x] but yes

> Which is perfectly doable in two or a few more SVP64 lines if I'm now
> understanding things correctly.

yes it is. if you stick to Horizontal-First then annoyingly it
has to be done as:

  # first HF instruction, sv.and
  for x in range(VL):
    tmpregs[x] = t_s[x] & p[x]
  # 2nd HF instruction, sv.add
  for x in range(VL):
    h_s[x] += tmpregs[x]

which given the wasted extra vector of tmpregs is precisely why Vertical
First was invented, as that becomes:

  for x in range(VL):
    tmpreg = t_s[x] & p[x]
    h_s[x] += tmpreg

note tmpREG scalar rather than tmpREGS vector, but you need a loop
and an svstep. instruction which is more instructions, so you only
use VF if the regfile is under pressure or you just want to show off :)
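
To see that the two orderings give the same answer, here is a small
runnable Python sketch of both loops. It is a model of the element
ordering only, not SVP64 code. The mask constants are an assumption
(44-bit, 44-bit and 42-bit limbs), and the h_s and t_s values are made up.

```python
# Assumed limb masks: 44, 44 and 42 bits (the exact constants are elided above)
p = [0xFFFFFFFFFFF, 0xFFFFFFFFFFF, 0x3FFFFFFFFFF]
VL = 3

def horizontal_first(h_s, t_s):
    """Each 'instruction' runs over all VL elements before the next starts."""
    h = list(h_s)
    tmpregs = [0] * VL                 # a whole temporary vector
    for x in range(VL):                # models the sv.and
        tmpregs[x] = t_s[x] & p[x]
    for x in range(VL):                # models the sv.add
        h[x] += tmpregs[x]
    return h

def vertical_first(h_s, t_s):
    """Both operations run on element x before stepping to x+1."""
    h = list(h_s)
    for x in range(VL):                # the explicit loop that VF needs
        tmpreg = t_s[x] & p[x]         # a single scalar temporary
        h[x] += tmpreg
    return h

h_s = [1, 2, 3]                        # made-up accumulator limbs
t_s = [0x0123456789ABCDEF, 0x1111222233334444, 0x1FFFFFFFFFF]
assert horizontal_first(h_s, t_s) == vertical_first(h_s, t_s)
```

The only difference is the temporary: VL entries for Horizontal-First, one
scalar for Vertical-First.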


