[libre-riscv-dev] daily kan-ban update 09jul2020

Fri Jul 10 13:33:10 BST 2020

On Fri, Jul 10, 2020 at 10:54 AM Jacob Lifshay <programmerjake at gmail.com> wrote:

> I checked quite a few signals that I know are critical for calculating the
> correct result, such as the ra and rb signals in the *setup* stage, and the
> dividend, divisor_radicand, and operation of DivPipeCore.

so let's track that with the yosys graphviz output (show setup).  i can see
ra going into dividend_neg and abs_dividend (this is visually, not
looking at the code), and i can see rb going into divisor_neg and
abs_dor (i shortened the name abs_divisor).

however what i *don't* see is ra or rb going into *output* signals
from the setup stage.

now let's look at DivSetupStage (in fu/div/setup_stage.py) and sure enough:

        comb = m.d.comb
        # convenience variables
        op, a, b = self.i.ctx.op, self.i.a, self.i.b
        ...
        ...

        ###### sticky overflow and context, both pass-through #####

        comb += self.o.xer_so.eq(self.i.xer_so)
        comb += self.o.ctx.eq(self.i.ctx)

        return m

at no point is either ra or rb *in* being copied to ra or rb *out*.
there's no self.o.ra.eq(self.i.ra) or self.o.rb.eq(self.i.rb).

basically what i'm pointing out is that there may not be a bug in nmigen at all.

> true, but they're thrown away after where I checked. I looked through the
> whole list of signals and I think all signals named ra and rb including
> variants like ra$3 were 0 everywhere.

yes, because once the chain is broken (and it looks like it's broken
in DivSetupStage), that's it: zeros will obviously get propagated from
that point onwards in the pipeline.
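to illustrate why one missing pass-through zeroes everything downstream, here's a toy software model in plain python (not nmigen, and not the actual soc code).  it assumes, as a simplification, that every inter-stage output record starts at all-zeros, the way an undriven comb signal in nmigen just sits at its reset value (0 by default), and that the record holds only ra, rb, xer_so and ctx.  setup_stage, passthrough_stage and run_pipeline are hypothetical names made up for this sketch.

```python
# toy model: each stage builds a *fresh* output record that starts at
# all-zeros, mimicking undriven nmigen comb signals sitting at reset (0).
FIELDS = ("ra", "rb", "xer_so", "ctx")


def fresh_record():
    """return a new inter-stage record with every field at zero."""
    return dict.fromkeys(FIELDS, 0)


def setup_stage(i, pass_operands=False):
    """hypothetical stand-in for DivSetupStage: only xer_so and ctx are
    copied through, unless pass_operands is set (what the diff adds)."""
    o = fresh_record()
    o["xer_so"] = i["xer_so"]
    o["ctx"] = i["ctx"]
    if pass_operands:
        o["ra"] = i["ra"]
        o["rb"] = i["rb"]
    return o


def passthrough_stage(i):
    """a later stage that faithfully copies every field it receives."""
    o = fresh_record()
    for name in FIELDS:
        o[name] = i[name]
    return o


def run_pipeline(inputs, n_later_stages=3, pass_operands=False):
    """run the setup stage, then n_later_stages pass-through stages."""
    rec = setup_stage(inputs, pass_operands)
    for _ in range(n_later_stages):
        rec = passthrough_stage(rec)
    return rec


inputs = {"ra": 100, "rb": 7, "xer_so": 1, "ctx": 42}
print(run_pipeline(inputs))                      # ra and rb come out as 0
print(run_pipeline(inputs, pass_operands=True))  # ra and rb survive
```

with pass_operands=False every later stage sees ra == rb == 0 no matter what went in, even though those later stages copy faithfully: the zeros come from the one stage that never drove the fields.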

with this added, the signals are no longer zero, all the way through
the entire pipe_N_to_N chain:

diff --git a/src/soc/fu/div/setup_stage.py b/src/soc/fu/div/setup_stage.py
index 25daa201..8975416f 100644
--- a/src/soc/fu/div/setup_stage.py
+++ b/src/soc/fu/div/setup_stage.py
@@ -81,6 +81,9 @@ class DivSetupStage(PipeModBase):

         ###### sticky overflow and context, both pass-through #####

+        comb += self.o.ra.eq(self.i.ra)
+        comb += self.o.rb.eq(self.i.rb)
+
         comb += self.o.xer_so.eq(self.i.xer_so)
         comb += self.o.ctx.eq(self.i.ctx)

we don't actually want to do that, because it's 128 extra wires (ra
and rb are 64 bits each).  those have to go.  there are currently
something like 600 wires being passed between each stage.
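the cost arithmetic, as a tiny python sketch: ra and rb each carry one 64-bit GPR value, so the pass-through adds 64 + 64 = 128 wires at every stage boundary.  added_register_bits assumes, hypothetically, that every boundary is registered; the number of boundaries is a parameter here, not a figure taken from the soc code.

```python
# widths in bits of the two fields the diff adds to the inter-stage record.
# in 64-bit PowerISA, ra and rb each carry one 64-bit GPR value.
ADDED_FIELDS = {"ra": 64, "rb": 64}


def added_wires(fields=ADDED_FIELDS):
    """extra wires at each stage boundary caused by the pass-through."""
    return sum(fields.values())


def added_register_bits(n_boundaries, fields=ADDED_FIELDS):
    """extra flip-flops in total, *assuming* (hypothetically) that every
    one of n_boundaries stage boundaries is registered."""
    return added_wires(fields) * n_boundaries


print(added_wires())           # 128
print(added_register_bits(4))  # 512 for a hypothetical 4-boundary pipe
```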

> > there's nothing wrong with the simulation / pipeline: it's the
> > calculation of overflow.  it's real simple to check: delete these
> > lines
> >
>
> yeah, i got that. I was trying to get a working vcd trace so I could follow
> the overflow computation, which is split between the setup stage and the
> output stage.

i've found it necessary to have *all* the files open on-screen (jobs |
grep vi | wc --> 48, *in one terminal*), to have the yosys graphviz open
on a second virtual desktop (full screen, 3840x2160) and the vcd trace
open in 1/3 of the screen, *and* to add print statements to the
simulation.

the combination - reinforcement of information from multiple sources -
gives the picture in a way that a single tool flat-out cannot.
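for the overflow side specifically, one more source worth having alongside the print statements is an independent reference model.  here's a minimal sketch in plain python, assuming 64-bit divd / divdu semantics from the PowerISA: the overflow condition (reported in OV by the divdo / divduo forms) is division by zero, or, for signed division only, the most-negative value divided by -1, and in those cases the quotient is undefined.  div_overflow, divd_reference and to_signed64 are hypothetical helper names, and the sketch says nothing about how soc splits the calculation between the setup and output stages.

```python
MASK64 = (1 << 64) - 1
INT64_MIN = 1 << 63  # 0x8000_0000_0000_0000 as an unsigned bit pattern


def to_signed64(x):
    """interpret the low 64 bits of x as a two's-complement integer."""
    x &= MASK64
    return x - (1 << 64) if x & INT64_MIN else x


def div_overflow(ra, rb, signed):
    """overflow condition for 64-bit divide: divide by zero, or (signed
    only) the most-negative value divided by -1."""
    ra &= MASK64
    rb &= MASK64
    if rb == 0:
        return True
    # rb == MASK64 is the bit pattern for -1
    return signed and ra == INT64_MIN and rb == MASK64


def divd_reference(ra, rb, signed=True):
    """64-bit quotient as an unsigned bit pattern, or None where the ISA
    leaves the result undefined (the overflow cases)."""
    if div_overflow(ra, rb, signed):
        return None
    if not signed:
        return (ra & MASK64) // (rb & MASK64)
    a, b = to_signed64(ra), to_signed64(rb)
    # PowerISA divide truncates toward zero; python's // floors, so
    # divide the magnitudes and then fix up the sign
    q = abs(a) // abs(b)
    if (a < 0) != (b < 0):
        q = -q
    return q & MASK64
```

one way to use it: print ra, rb and the overflow output from the simulation for a handful of edge cases (zero divisor, most-negative / -1, mixed signs) and compare against div_overflow and divd_reference.  the 32-bit divw forms have the analogous condition on the low 32 bits and aren't covered here.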


> If you have time, you could try to build a more minimal reproducer for
> whitequark,

that would take longer than turning the microwatt FSM into nmigen.
plus, given that the break in the pipeline has been identified, i
don't believe it's necessary to raise the bug report.


> since what we have is waay too complicated to be that useful
> for finding the bug in nmigen. if it helps any, I'll put the git revs I am
> on:
> nmigen : 30e2f91176edcd1c8766c2cb11f413b9c77936b9
> ieee754fpu : 610b4a381e70f45f5684cc281398ce77fb5441fa
> nmutil : 3853df675a1e1db24950945f66b076266a7da409
> soc : caceb716e9417ed8731ef08b7b260a4c077186b2
>
> meanwhile, I'll be sleeping for a while.

:)

when you're awake, can you decide whether to investigate this further
or to go straight to doing the FSM?

i *strongly recommend* converting divider.vhdl with no actual
"thought".  no cleverness, no "reinvent from scratch".  literally do a
line-for-line conversion of the file.

this is because reproducing all the edge cases (as we're discovering)
is simply too problematic.

microwatt, on the other hand, has been developed by people with
*TWENTY-FIVE YEARS* of continuous experience working with the PowerISA
in different forms.  the hardware RTL level was literally the last
remaining area where they didn't have any experience at all.

l.