Found it. fmuls has to be carried out at full 64-bit precision, then a special 64-to-32 truncation performed; I am going to use the algorithm from "frin". Otherwise a couple of bits of precision are lost, which is enough to throw off the output by 1 bit in some cases. Should be done today, then can move on to SVP64 reduce mode. l.
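
A rough sketch of why the order matters, in plain Python rather than the actual simulator code. The helper names (`to_single`, `fmuls_full_precision`, `fmuls_early_truncation`) are made up for illustration, and `struct`'s round-to-nearest conversion stands in for the 64-to-32 step; it is not the "frin" algorithm mentioned above. The inputs are chosen so that the binary64 product is exact, which isolates the effect of dropping bits too early.

```python
import struct


def to_single(x: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value.

    Stand-in for the 64-to-32 step; an assumption, not the real algorithm.
    """
    return struct.unpack("<f", struct.pack("<f", x))[0]


def fmuls_full_precision(a: float, b: float) -> float:
    # Multiply at 64-bit precision, then narrow to 32-bit once at the end.
    return to_single(a * b)


def fmuls_early_truncation(a: float, b: float) -> float:
    # Narrow the operands first: the low bits are already gone
    # by the time the multiply happens.
    return to_single(to_single(a) * to_single(b))


# 1 + 2^-25 sits below half a binary32 ulp above 1.0, so it narrows to 1.0,
# but its square (1 + 2^-24 + 2^-50) sits just above the halfway point.
a = b = 1.0 + 2.0 ** -25

good = fmuls_full_precision(a, b)
bad = fmuls_early_truncation(a, b)

print("full precision :", good.hex())
print("early narrowing:", bad.hex())
print("difference     :", good - bad)
```

With these inputs the two results differ by exactly one binary32 unit in the last place, which is the kind of 1-bit discrepancy described above.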