[libre-riscv-dev] pipeline sync issues

Luke Kenneth Casson Leighton lkcl at lkcl.net
Fri Apr 12 04:51:18 BST 2019

On Fri, Apr 12, 2019 at 4:07 AM Jacob Lifshay <programmerjake at gmail.com> wrote:
> On Thu, Apr 11, 2019, 19:49 Luke Kenneth Casson Leighton <lkcl at lkcl.net>
> wrote:
> > On Fri, Apr 12, 2019 at 3:03 AM Jacob Lifshay <programmerjake at gmail.com>
> > wrote:
> >
> > > >  was this what you have been referring to, jacob?
> > > >
> > > yeah, that's part of it.
> >
> >  ok.  the implications are, to be able to follow through on that: no
> > Stage may contain sync.  at all.  anything that is chained together
> > (in the intended single-cycle fashion) must be pure combinatorial.
> > there must not be, under any circumstances, the use of sync in the data
> > path, data chaining, valid/ready or valid/ready chaining.
> >
> I don't understand how you came to that conclusion.

 if the proposed Stage (merged) contains *any* sync'd logic on data,
ready or valid signals, it will *NOT* be possible to chain them
together and expect the output to be COMBINATORIALLY produced in a
SINGLE cycle.

 this is a fact.  it is unavoidable.

example: if you chain two of these together (logic taken directly from

        m.d.sync += self.buffer_full.eq(~self.succ.ready_in
                                        & (self.pred.valid_in
                                           | self.buffer_full))

the result will *NOT* be output combinatorially.  the ready/valid
signals at the end of the chain WILL be output after TWO clock cycles,
NOT one.

that's what i mean.
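
to illustrate the latency argument, here is a minimal sketch in plain
python (not nmigen).  comb_stage, run_comb_chain and run_sync_chain are
hypothetical names invented for this illustration, and the "stage" just
adds one.  it only models the general point that each register in a
chain costs one clock cycle; it is not the buffer_full logic above.

```python
def comb_stage(x):
    """pure combinatorial block: output follows input in the same cycle."""
    return x + 1


def run_comb_chain(inputs, n_stages):
    """n chained combinatorial stages: the result appears in the same cycle."""
    outputs = []
    for x in inputs:
        for _ in range(n_stages):
            x = comb_stage(x)
        outputs.append(x)
    return outputs


def run_sync_chain(inputs, n_stages, reset=0):
    """n chained stages, each registering its output (sync-style)."""
    regs = [reset] * n_stages
    outputs = []
    for x in inputs:
        # during this cycle only the *old* register contents are visible
        outputs.append(regs[-1])
        # on the clock edge each register captures its stage's result,
        # computed from the previous register's old value
        new_regs = []
        prev = x
        for i in range(n_stages):
            new_regs.append(comb_stage(prev))
            prev = regs[i]
        regs = new_regs
    return outputs


comb_out = run_comb_chain([10, 20, 30, 40], 2)   # [12, 22, 32, 42]
sync_out = run_sync_chain([10, 20, 30, 40], 2)   # [0, 1, 12, 22]
```

in the registered chain the result for the first input (12) only shows
up two cycles later, one cycle per register.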

the only way to make a single-clock-cycle-response chain of Stages is
to make them *pure* combinatorial blocks.

that includes the ready/valid signals (if the ready/valid signals are
to be merged into Stage as proposed).

which in turn breaks the ready/valid signalling synchronicity contract
which is REQUIRED to actually synchronise the very stages that we are
attempting to chain together!

therefore it cannot be done.

does that make sense now?

basically, Control Blocks *MUST* be Synchronous, and Data Blocks
*MUST* be Combinatorial.

so this is why StageChain exists: it performs a *BYPASS* of
ready/valid signalling *ENTIRELY* as that is the *ONLY* way to not
have sync introduced anywhere into the processing chain.
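
a rough sketch of the bypass idea, in plain python.  AddOne, Double and
ChainSketch are hypothetical names made up here, and only a process()
method is assumed; this is not the real StageChain code.  the point is
that the chain only composes data processing and contains no ready/valid
handling at all.

```python
class AddOne:
    """hypothetical data-only stage."""
    def process(self, i):
        return i + 1


class Double:
    """hypothetical data-only stage."""
    def process(self, i):
        return i * 2


class ChainSketch:
    """stand-in for a chain of stages: its process() is simply the
    composition of every stage's process(), with no handshake signals."""
    def __init__(self, stages):
        self.stages = list(stages)

    def process(self, i):
        for stage in self.stages:
            i = stage.process(i)
        return i


chain = ChainSketch([AddOne(), Double()])
result = chain.process(3)   # (3 + 1) * 2
```

because ChainSketch itself has a process() method, it can be used
wherever a single stage is expected, including inside another chain.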

it's taken me a while to understand that.  that's the key message that
i've been trying to get across to you for several weeks, i just only
understood it subconsciously so couldn't state it as clearly and

> What I think will work best is to just not support adding additional data
> signals to Stage interfaces,

 it would not make sense to add additional signals to Stages,
particularly after they're in "use".  that would be dangerous, and
should never be done.

 you must mean something else, can you clarify?

> and then a FIFO can just be another stage with
> ready/valid signalling, just like RegStage and BreakReadyChainStage.

 i've already got FIFOControl working, including with "incoming"
processing capability.  so the data, as it comes into the FIFOControl
rather than going directly into the FIFO din (memory block), goes
through the Stage.process() function.

 the output of that conforms to stage.ospec(), is run through
flatten(), and the (one) Signal, whose total width equals that of the
entire (recursive) RecordObject format, is assigned to the Memory read
port of the FIFO.
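
as an illustration of the flattening step, a plain-python sketch.  the
nested dict of (value, width) pairs stands in for a recursive record,
and flatten_leaves and pack are hypothetical helpers, not the project's
flatten().  putting the first field in the low bits is an arbitrary
choice made for this example.

```python
def flatten_leaves(record):
    """yield the (value, width) leaves of a nested dict, in order."""
    for field in record.values():
        if isinstance(field, dict):
            yield from flatten_leaves(field)
        else:
            yield field


def pack(record):
    """concatenate every leaf into one integer, first field lowest."""
    word, offset = 0, 0
    for value, width in flatten_leaves(record):
        if not 0 <= value < (1 << width):
            raise ValueError("value does not fit in its declared width")
        word |= value << offset
        offset += width
    return word, offset


record = {"op": (0b101, 3), "operands": {"a": (0xAB, 8), "b": (0x3, 4)}}
word, total_width = pack(record)   # total_width is 3 + 8 + 4
```

the single (word, total_width) pair is what would be stored as one FIFO
entry in this model.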

 the upshot of that is that the Stage may be a *COMBINATORIAL* block
conforming to the Stage API *INCLUDING A StageChain*.

 the end result is the possibility to daisy-chain combinatorial stages
(data processing blocks) together, and still have them buffered by a

> I'm
> quite sure that ready/valid signalling can emulate with no additional
> circuitry all of Global CE, Traveling CE, strobe/busy, and the other
> pipeline control schemes except maybe for the strobe/ack scheme that
> requires 2 cycles for every data element (which can be emulated by using
> some adaptors). I'm writing in my new proposal how restricted cases of
> ready/valid are logically identical to the other schemes.

 i believe you may be correct about Global CE and Travelling CE, as
long as it's possible to assign ready/valid to Const(0) or Const(1) as
appropriate.  that _will_ mean critically relying on the optimiser of
yosys, which i am not confident about (i don't have enough experience
with it).
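
one possible reading of the "tie to a constant" idea, as a tiny
plain-python sketch.  transfers is a hypothetical helper, and the
mapping onto the two CE schemes is an assumption made for illustration,
not something that has been verified against real hardware.

```python
def transfers(valid, ready):
    """a ready/valid handshake moves data only when both are 1."""
    return valid & ready


# ready tied to constant 1: no back-pressure, so the valid bit alone
# decides whether data moves (assumed to resemble a travelling CE)
travelling = [transfers(v, 1) for v in (0, 1)]

# valid tied to constant 1 with ready driven by one shared enable:
# every stage advances together (assumed to resemble a global CE)
global_ce = [transfers(1, ce) for ce in (0, 1)]
```

in both cases one input of the AND is constant, which is why the result
depends on the synthesis tool removing the redundant logic.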

STB/ACK i've not properly analysed.

> I think that the FIFO class may be designed for crossing clock domains and
> therefore has extra delay for synchronization.

 you're confusing four classes together (see the docstrings in lib/fifo.py)

 there are four classes: SyncFIFO (which has an *optional argument* to
introduce a synchronisation delay cycle), AsyncFIFO,
SyncFIFOBuffered and AsyncFIFOBuffered.

 you're referring to AsyncFIFOBuffered, which has the extra delay
because Memory on FPGAs typically does not have the write-through
capabilities that would allow its incoming data to be read on the same
clock cycle as it is written, and it also has the clock-domain
crossing.

 SyncFIFO* do *not* have clock-domain crossing.

 AsyncFIFO* have clock-domain crossing.
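
to make the write-through point concrete, a toy plain-python model of a
memory with a synchronous read port.  MemoryModel and its transparent
flag are hypothetical, invented for this sketch; it is not the nmigen
Memory API.

```python
class MemoryModel:
    """toy memory with one write port and one synchronous read port."""

    def __init__(self, depth, transparent):
        self.cells = [0] * depth
        self.transparent = transparent
        self.rdata = 0

    def clock(self, raddr, waddr=None, wdata=None):
        """one clock edge: latch the read data, then commit the write."""
        if self.transparent and waddr is not None and waddr == raddr:
            self.rdata = wdata              # write-through: new data seen
        else:
            self.rdata = self.cells[raddr]  # old contents seen
        if waddr is not None:
            self.cells[waddr] = wdata
        return self.rdata


plain = MemoryModel(4, transparent=False)
first = plain.clock(raddr=0, waddr=0, wdata=7)    # old value, 0
second = plain.clock(raddr=0)                     # now 7, one cycle later

through = MemoryModel(4, transparent=True)
same_edge = through.clock(raddr=0, waddr=0, wdata=7)   # 7 straight away
```

without write-through, data written and read at the same address needs
one more clock edge before it appears on the read port.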

