[libre-riscv-dev] pipeline stages controlling delays

Luke Kenneth Casson Leighton lkcl at lkcl.net
Sat Apr 6 09:06:06 BST 2019


On Sat, Apr 6, 2019 at 12:16 AM Jacob Lifshay <programmerjake at gmail.com> wrote:

> > (or, it was: now it is responsible for dynamically indicating whether
> > it is ready to receive data, and to dynamically indicate whether its
> > output is ready.  however, again, that is related to the *data*, which
> > has nothing to do at all with the *transfer or handling* of that
> > data).
> >
> You would still need both ready and valid on both the input and output of an
> FSM, since an FSM could handle more than one data item at a time (e.g. a
> CORDIC unit with a 2-port lookup table can do 2 operations every 30 or so
> cycles for f32), and the FSM needs the inputs that tell it if the input data
> is valid by the next clock edge and if it should wait or can output data
> values right away.

 scary.  i've heard about these: they're usually called
non-deterministic FSMs, i.e. they're effectively an overlapping pair
of FSMs, and there are online tools that can turn them into a single
deterministic (discrete) FSM.

the number of states such dual-operational FSMs end up with is mad:
some of them are exponentially related to the size of their
non-deterministic counterpart.

> Once you have both ready and valid on both input and output, then it
> is simpler to just thread ready/valid through all the stages as I described.

 i really really do not like the idea of conflating the responsibility
for data processing with the responsibility for data control into the
same class.  there's more to it than just that, i'm having difficulty
expressing everything, as it's all heavily inter-related.

 also, the experimentation underway (with stage d_valid and d_ready
properties) showed me that yes, all four signals are needed... however
that *still does not automatically mean* that the *data processing*
side should be responsible for data *control*.

 i greatly prefer that process() returns the processed data, as
opposed to explicitly doing m.d.comb += self.i_data.eq(something),
because we *know for a fact* that all processed data is to be stored
somewhere in exactly the same way by a given Data Control instance,
regardless of the type *of* that Data Control instance (i.e.
irrespective of whether it is a Buffered Data Control instance, an
Unbuffered Data Control instance, or any other type of Data Control
instance).

 likewise, correspondingly, the d_valid and d_ready properties (i made
them properties for now) have *nothing to do with the Data Control
instance that receives and responds to them*.
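
to make the separation concrete, here is a rough sketch (names and
details are illustrative only, not the actual pipeline API) of a stage
that only declares and processes data, and a completely separate
data-control class that owns the storage and the four handshake
signals:

    from nmigen import Module, Signal

    class ExampleAddStage:
        """data processing only: declares its data shape and transforms it.
           no registers, no ready/valid, no knowledge of Data Control."""
        def ispec(self):
            return Signal(16)            # input data shape
        def ospec(self):
            return Signal(16)            # output data shape
        def process(self, i):
            return i + 1                 # returns the processed data

    class ExampleUnbufferedControl:
        """data control only: stores whatever the stage returns, and owns
           the ready/valid handshake.  the data itself is opaque to it."""
        def __init__(self, stage):
            self.stage = stage
            self.i_data = stage.ispec()
            self.o_data = stage.ospec()
            self.i_valid = Signal()      # "input data is valid"
            self.o_ready = Signal()      # "this instance can accept data"
            self.o_valid = Signal()      # "output data is valid"
            self.i_ready = Signal()      # "next instance can accept data"

        def elaborate(self, platform):
            m = Module()
            # the control class decides *when*; the stage decides *what*
            m.d.comb += self.o_ready.eq(self.i_ready)
            with m.If(self.i_valid & self.o_ready):
                m.d.sync += self.o_data.eq(self.stage.process(self.i_data))
                m.d.sync += self.o_valid.eq(1)
            with m.Elif(self.i_ready):
                m.d.sync += self.o_valid.eq(0)
            return m

the point being: swap the control class for a buffered one, a
global-CE one, whatever, and the stage does not change by a single
line.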

 now, if this were a type of design where the routing and handling of
the data depended *on* the data, i would be agreeing with you
absolutely, without question.

however, given that the data is entirely opaque to the Data Control,
and given that i can envision no circumstances in which it would be a
general-purpose "good idea" to let the Data Control/Routing side gain
access to, manipulate or react to the actual data, i believe that the
two should be kept entirely isolated regardless of "simplicity", as a
way to make it absolutely clear to users that Control and Processing
are to be kept absolutely separate.


> >  jacob, it's just not as flexible an API [and not actually *having*
> > the stuff there is infinitely better than writing documentation (which
> > people may or may not read) saying "don't use this"].
>
> except that that stuff is necessary in some cases.

 then they should be examined on a case-by-case basis, instead of
merging the two prematurely.

> What we could do is
> create a StageWithReadyValid (needs a better name) and just use that as the
> type that all stages are converted into by either being already
> StageWithReadyValid or by being wrapped with CombStage, similar to how
> integers are wrapped into valid nmigen Value instances.
>
> (makes me wish we were using Rust:
> CombStage would just be:
> impl<I: SignalGroup, O: SignalGroup, F: Fn(I, &mut Module) -> O> Stage<I,
> O> for F {...}

 it's probably doable, using iterators and reduce, or something
equally obscure: overriding __add__ or something similar as a way to
connect two stage instances [basically __add__ calls connect_to_next,
then you reduce over the chain... you get the idea].

 actually... *click*... the input "type" and the return "type" of the
function: that's it.  it should be possible to do introspection on
them.  i was trying to work out how the hell you could encode the
input and output data types into a function.  without the input and
output data types, the function itself has to declare them (somehow)
and it gets complex enough to make it not worthwhile, you might as
well use a class.
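
something along these lines is what i have in mind (purely a sketch:
connect_to_next is assumed to already exist on the control classes;
chain and specs_from_function are made up here for illustration):

    import inspect
    from functools import reduce

    class ChainableMixin:
        """overriding __add__ as a way to connect two stage instances."""
        def __add__(self, other):
            self.connect_to_next(other)   # assumed: the existing connect API
            return other                  # so that reduce() walks the chain

    def chain(*stages):
        return reduce(lambda a, b: a + b, stages)

    def specs_from_function(fn):
        """introspect a plain function's annotations to recover its input
           and output data 'types', instead of requiring ispec()/ospec()."""
        sig = inspect.signature(fn)
        ispec = [p.annotation for p in sig.parameters.values()]
        ospec = sig.return_annotation
        return ispec, ospec

with that, a plain annotated function could in principle be wrapped
automatically into a CombStage, which i think is roughly what the Rust
impl above is doing.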

> that way you could just use any function with the right prototype as a
> Stage without needing to make a CombStage or anything.)

 i see where you're going with that: i like it (personally), the only
issues being (a) there's more urgent stuff that needs doing, and (b)
i'm slightly nervous about maintaining something like that.  can we
leave it as an experiment for later?

> >  bear in mind: what you've written will need a complete redesign to
> > allow the dozen different types of data format options currently
> > supported by the pipeline API, and a yet further redesign to allow for
> > use in a FSM, Global CE, travelling CE or STB/ACK handling
> > arrangement.
> >
> I don't think supporting every possible kind of pipeline control should be
> a goal. Ready/Valid is a generalization of travelling CE and probably some
> others.

 it is and it isn't.  yes, it is a generalisation... whilst leaving
extraneous logic gates lying around [therefore it isn't].

* with a global CE there is no actual need for ready/valid at all.
or, rather: the only signal in is "valid-in", and it's global.  if a
stage happens to have d_valid and d_ready, it's too complex for a
global CE to cope with: throw an exception.  otherwise, just pass the
data blindly down the chain, in lock-step.

* with a travelling CE, there is only valid-out and valid-in: there is
*no* ready-out or ready-in.  (rough sketch of both below.)
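
to be concrete about the difference between those two (rough nmigen
sketch; the helper names here are made up):

    from nmigen import Module, Signal

    def global_ce_chain(m, ce, regs_in, regs_out):
        """global CE: one enable signal gates *every* register in the chain,
           so the data moves down in lock-step.  no per-stage ready/valid."""
        for i, o in zip(regs_in, regs_out):
            with m.If(ce):
                m.d.sync += o.eq(i)

    def travelling_ce_stage(m, i_valid, i_data, o_valid, o_data):
        """travelling CE: a valid bit travels alongside the data, one per
           stage, but there is no ready signal in either direction."""
        m.d.sync += o_valid.eq(i_valid)
        with m.If(i_valid):
            m.d.sync += o_data.eq(i_data)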

FPADD and FMUL are actually straightforward enough that their control
can be reduced to a global CE.  in the Reservation Station / Function
Unit code (see fpadd/pipeline.py), the Fanout "result receiver" would
connect its (muxed, multiple) data_in_ready directly to that global
CE, *and* to the Fan-in "operand storer".

that's actually what's happening right now, except it's rippled
through a chain of (unnecessary) combinatorial logic, going through
several extraneous gates (including three sets of latches) when,
actually, all that's necessary is... a global CE.

i appreciate that it's an optimisation: i just do not wish to rule it
out through a forced and unnecessary merge of Data Control with Data
Handling that makes four valid/ready signals absolutely mandatory,
with clear disregard for any and all other potential Data Control
mechanisms.


> Due to STB/ack (as described in your previous messages) not
> supporting a data transfer every cycle, I would rule it out as a practical
> pipeline control spec.

 the problem is: there are two [actually, more: there's fclass and
fcvts to do as well] FP algorithms that need converting from STB/ACK
over to the Data Control / Data Handling format.  it took a hell of a
long time to do FPADD, and i made a dog's dinner of the code, trying
to convert it.

 without something that looks reasonably like STB/ACK on the
front-end, fmul and fdiv cannot be transformed a piece at a time.
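
by "looks reasonably like STB/ACK on the front-end" i mean something
along the lines of the following shim (signal names made up, logic
deliberately simplified), so that the existing stb/ack-style code can
talk to one side while ready/valid comes out of the other, and the
algorithm behind it can be converted a piece at a time:

    from nmigen import Module, Signal

    class StbAckToReadyValid:
        """sketch of a shim: legacy stb/ack upstream, ready/valid downstream."""
        def __init__(self, width):
            self.stb = Signal()          # upstream: "data is presented"
            self.ack = Signal()          # upstream: "data has been taken"
            self.i_data = Signal(width)
            self.o_valid = Signal()      # downstream ready/valid pair
            self.i_ready = Signal()
            self.o_data = Signal(width)

        def elaborate(self, platform):
            m = Module()
            m.d.comb += [
                self.o_valid.eq(self.stb),
                self.o_data.eq(self.i_data),
                # ack fires when the downstream side actually takes the data
                self.ack.eq(self.stb & self.i_ready),
            ]
            return m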


> My current plan is to improve the code you wrote and use that as a base,
> adding support for ready/valid signalling in each stage (or equivalent
> using wrapper classes), RegStage and BreakReadyChainStage as separate Stage
> classes (making UnbufferedPipeline and BufferedPipeline just be convenience
> wrappers over StageChain, RegStage, and BreakReadyChainStage), and shifting
> to using the Stage interface everywhere instead of having a separate
> Pipeline interface.

 i really do not think that it is a good idea to expose data
*handling* to the data *control* classes, forcing a hard requirement
to have valid/ready signals, and absolutely nothing else, in the
process.

> >  by the time both of those near-total redesigns are done, the end
> > result will be a direct functional equivalent of the *existing
> > proposed pipeline API*
> >
> hence why I'll be building off of the existing code that you wrote.

 if the Data Control aspect needed to make routing decisions based on
the contents of the data, and if there was no potential need for
alternative Data Control mechanisms, i would say yes, immediately, go
ahead with some wrapper classes.  however, i am simply not convinced
[and there are other reasons as well, less technical]

 it would be much more productive to help me track down the two-stage
pipeline bug (buf + unbuf) - Test 999 - and to help me work out why
Test 14 is failing.

 now, in the process of doing that, it *may* turn out to be the case
that a redesign is needed, as part of a solution.

> > or did i misunderstand?
> >
> The FSMStage class would have yet-to-be-proposed improvements over
> implementing Stage directly. We can figure those out when we actually write
> a FSM.

 the reason i'm tackling d_ready / d_valid right now is that i'm
looking to put fpdiv - as-is, as an FSM - into a Reservation Station /
Function Unit.  so that's now, not later!
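
the shape i have in mind is roughly the following (a sketch only: a
multi-cycle FSM whose outside "edges" look like any other ready/valid
stage, with the actual div computation stubbed out):

    from nmigen import Module, Signal

    class MultiCycleFSMStage:
        """multi-cycle unit (e.g. a div) wrapped so that, externally, it
           presents the same ready/valid handshake as a pipeline stage."""
        def __init__(self, width):
            self.i_valid = Signal()
            self.o_ready = Signal()
            self.i_data = Signal(width)
            self.o_valid = Signal()
            self.i_ready = Signal()
            self.o_data = Signal(width)

        def elaborate(self, platform):
            m = Module()
            count = Signal(6)
            with m.FSM():
                with m.State("IDLE"):
                    m.d.comb += self.o_ready.eq(1)
                    with m.If(self.i_valid):
                        m.d.sync += self.o_data.eq(self.i_data)  # capture operand
                        m.d.sync += count.eq(0)
                        m.next = "BUSY"
                with m.State("BUSY"):
                    # ...the actual iterative computation goes here (stubbed)...
                    m.d.sync += count.eq(count + 1)
                    with m.If(count == 30):
                        m.next = "DONE"
                with m.State("DONE"):
                    m.d.comb += self.o_valid.eq(1)
                    with m.If(self.i_ready):   # result accepted downstream
                        m.next = "IDLE"
            return m

an FSMStage class would, i imagine, mostly be formalising that kind of
wrapping, which is why i want the edges settled now.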

l.


