[Libre-soc-bugs] [Bug 1044] SVP64 implementation of pow(x,y,z)

Thu Sep 14 08:19:32 BST 2023

https://bugs.libre-soc.org/show_bug.cgi?id=1044

--- Comment #11 from Luke Kenneth Casson Leighton <lkcl at lkcl.net> ---
(In reply to Jacob Lifshay from comment #10)
> https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;
> h=e044403f4c243b3248df0872a661756b6cc8a984
> 
> Author: Jacob Lifshay <programmerjake at gmail.com>
> Date:   Wed Sep 13 23:24:12 2023 -0700
> 
>     add SVP64 256x256->512-bit multiply
> 
> https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;
> h=ec1196cb3abedd28492be29546e52959dee1a030

all of these can go in a vertical-first loop, using
predication 0b1000100010001000 and sv.addi/pm=nn a,b,c
to "pretend" there is a pair of nested loops.

    "sv.adde *7, *7, *21",
    "addi 24, 0, 0",
    "sv.maddedu *20, *12, 19, 24",  # final partial-product a * b[3]
    "addc 7, 7, 20",
    "sv.adde *8, *8, *21",

i.e. sv.addi with scalar operands does exactly the same thing as the
non-prefixed addi, except that you also get predication and elwidth
overrides.
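
The nested-loop trick can be sketched in plain Python. This is only a
model of the control flow, not SVP64 code: it assumes that bit N of the
predicate mask gates step N of a flat 16-step loop (least significant bit
first), and the function names are made up for illustration.

```python
# Hypothetical model: a flat loop plus a predicate mask standing in for
# a 4x4 pair of nested loops. Bit N of MASK gates step N (LSB first).
MASK = 0b1000100010001000


def nested(outer=4, inner=4):
    """Reference: a real pair of nested loops."""
    trace = []
    for i in range(outer):
        for j in range(inner):
            trace.append(("inner", i * inner + j))
        # "outer loop" work, done once per 4 inner steps
        trace.append(("outer", i))
    return trace


def flat(mask=MASK, steps=16):
    """Single loop: the predicated op only fires where the mask bit is set."""
    trace = []
    outer_count = 0
    for step in range(steps):
        trace.append(("inner", step))
        if (mask >> step) & 1:
            # stands in for the predicated scalar instruction
            trace.append(("outer", outer_count))
            outer_count += 1
    return trace
```

With this mask the predicated operation fires on steps 3, 7, 11 and 15,
i.e. at the end of each group of four, which is where the outer-loop body
of the nested version runs.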

the extra sv.adde can add a zero'd out source, which makes the first
iteration orthogonal with all other iterations (every iteration is
then the same multiply-then-add), and consequently the VF loop can
be applied.
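
A minimal Python sketch of why the zeroed source helps, assuming 64-bit
limbs and a 256x256->512-bit product. The helpers only loosely mirror a
multiply-add-with-carry and an add-with-carry over a vector of limbs;
their names and exact shape are assumptions, not the code from the commit.

```python
MASK64 = (1 << 64) - 1


def mul_add_limbs(a_limbs, b_word, carry=0):
    """Multiply each limb by b_word, chaining the high half as carry."""
    out = []
    for a in a_limbs:
        prod = a * b_word + carry
        out.append(prod & MASK64)
        carry = prod >> 64
    return out, carry


def add_limbs(x, y, carry=0):
    """Limb-wise add with carry propagation."""
    out = []
    for xi, yi in zip(x, y):
        s = xi + yi + carry
        out.append(s & MASK64)
        carry = s >> 64
    return out, carry


def mul_256x256(a, b):
    a_limbs = [(a >> (64 * i)) & MASK64 for i in range(4)]
    b_limbs = [(b >> (64 * i)) & MASK64 for i in range(4)]
    # Accumulator starts zeroed, so iteration 0 performs the same
    # multiply-then-add as iterations 1..3 (no special first case).
    acc = [0] * 8
    for i, b_word in enumerate(b_limbs):
        partial, hi = mul_add_limbs(a_limbs, b_word)
        partial.append(hi)  # 5-limb partial product a * b[i]
        summed, carry = add_limbs(acc[i:i + 5], partial)
        # The running sum always fits in limbs i..i+4, so no carry-out.
        assert carry == 0
        acc[i:i + 5] = summed
    return sum(limb << (64 * k) for k, limb in enumerate(acc))
```

Because the first add is against zeros, the loop body is identical for
all four values of i, which is the property a single repeated loop body
needs.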

and you can use Indexed REMAP on the sv.maddedu and sv.adde.
the only pain being that you have to enable/disable svremap before
each, so i suggest using "non-persistent" svremap.
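
As a rough illustration of the idea, here is a toy Python model of an
index-table remap that either stays active or drops away after one
instruction. It is not the simulator's API: the class, method names and
the "retire" step are all hypothetical, and real Indexed REMAP has more
state than this.

```python
class RemapState:
    """Toy model: an index table that redirects element offsets."""

    def __init__(self):
        self.indices = None
        self.persistent = False

    def svremap(self, indices, persistent=False):
        # Hypothetical stand-in for enabling a remap.
        self.indices = list(indices)
        self.persistent = persistent

    def element(self, step):
        """Element offset used at this loop step."""
        return step if self.indices is None else self.indices[step]

    def retire(self):
        """After each instruction, a non-persistent remap is dropped."""
        if not self.persistent:
            self.indices = None


def run(state, regs, base, vl):
    """Model one vector instruction reading vl elements from regs[base:]."""
    picked = [regs[base + state.element(i)] for i in range(vl)]
    state.retire()
    return picked
```

In this model a non-persistent remap affects only the next instruction,
so nothing has to be explicitly disabled before the following one.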

also, same as with Andrey: all files require a copyright notice, you
know this. see https://bugs.libre-soc.org/show_bug.cgi?id=1126#c13

> Author: Jacob Lifshay <programmerjake at gmail.com>
> Date:   Wed Sep 13 23:22:33 2023 -0700
> 
>     generalize assemble() fn so other test cases can easily use it

like it.
