[Libre-soc-bugs] [Bug 1044] SVP64 implementation of pow(x,y,z)

bugzilla-daemon at libre-soc.org bugzilla-daemon at libre-soc.org
Tue Oct 10 05:26:26 BST 2023


https://bugs.libre-soc.org/show_bug.cgi?id=1044

--- Comment #44 from Jacob Lifshay <programmerjake at gmail.com> ---
(In reply to Luke Kenneth Casson Leighton from comment #42)
> (In reply to Jacob Lifshay from comment #39)
> > (In reply to Luke Kenneth Casson Leighton from comment #37)
> > > but just going straight to something inefficient (such as the
> > > loop-unrolled mul256 algorithm you wrote, although it gets us
> > > one incremental step ahead), this is *not* satisfying the conditions
> > > of the grant.
> > 
> > which definition of efficiency are you using?
> 
> the one that meets customer requirements which i repeated many times:
> top priority on code size. number of regs second.

ok.
> 
> it is down to the hardware to merge VF and HF elements into
>  "issue batches".  which is where, repeatedly, everyone including
> you keeps assuming VF is incapable of doing that, "therefore it
> must be inefficient performance-wise".

I was basing my efficiency claims on both:
* the complexity I expect will be required to get a vertical-first divmod to
work at all. I fully expect it to take *more* (and more complex) instructions
than the horizontal-first version, because afaict it doesn't cleanly map to VF
mode. this is bad for both code size and power and probably performance.
* it will most likely require lots of dynamic predicates (more than just 1<<r3)
in which a *large* number of bits are zeros. this is inherently rather
inefficient from a performance perspective, because I'm assuming either:
  * the predicate will have to be handed to the decode pipe
    before the predicated operations can be issued. this is
    bad for performance because you're forced to stall the
    entire fetch/decode pipe for several cycles while waiting
    for the predicate to be computed.
  * the predicate is not known at decode/issue time, so the
    full set of element operations are issued, potentially
    blocking issue queues, only to later find out that most
    of them were wasted. this is bad for both power and
    performance.
    the predicate not being known at issue time also means
    that propagating results to registers and/or any following
    instructions is also blocked for any instructions that
    use twin-predication, since the cpu needs to wait until
    it knows which registers to write to.
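
to make the two cases above concrete, here is a toy counting sketch in
Python. it is not a model of any actual Libre-SOC hardware: the function
names and the parameters (vl, predicate_latency, issue_width) are all
hypothetical, picked only to show where the stall cycles and the wasted
element operations come from under the two assumptions.

```python
def popcount(mask: int) -> int:
    """Number of set bits in a non-negative predicate mask."""
    return bin(mask).count("1")


def wait_for_predicate(mask, vl, predicate_latency, issue_width):
    """Case 1 (assumed): decode stalls until the predicate is known,
    then only the active elements are issued."""
    active = popcount(mask & ((1 << vl) - 1))
    issue_cycles = -(-active // issue_width)  # ceiling division
    return {
        "stall_cycles": predicate_latency,
        "issued": active,
        "wasted": 0,
        "cycles": predicate_latency + issue_cycles,
    }


def issue_everything(mask, vl, issue_width):
    """Case 2 (assumed): predicate unknown at issue time, so all vl
    element operations are issued and the masked-out ones are wasted."""
    active = popcount(mask & ((1 << vl) - 1))
    issue_cycles = -(-vl // issue_width)  # ceiling division
    return {
        "stall_cycles": 0,
        "issued": vl,
        "wasted": vl - active,
        "cycles": issue_cycles,
    }


if __name__ == "__main__":
    # hypothetical numbers: 8 elements, single-bit predicate (1 << 3),
    # 3 cycles to compute the predicate, 4 element ops issued per cycle
    sparse = 1 << 3
    print(wait_for_predicate(sparse, vl=8, predicate_latency=3, issue_width=4))
    print(issue_everything(sparse, vl=8, issue_width=4))
```

the sketch only counts issue slots. it ignores the twin-predication
write-back blocking described above, which would add further delay to the
second case.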

-- 
You are receiving this mail because:
You are on the CC list for the bug.

