[libre-riscv-dev] buffered pipeline

Wed Mar 27 10:21:05 GMT 2019

On Tue, Mar 26, 2019 at 10:23 PM Jacob Lifshay <programmerjake at gmail.com> wrote:

> Sorry, I often put off replying till later, and then never get to it.

 been there, done that :)  learned to hit "reply" and save-draft.
sometimes that doesn't work, though.

> Will respond in-depth in a separate email.

 [barrel processor] - appreciated

> I was planning on changing the interfaces to allow a Signal or a Record in
> the interface, though I'm not sure how to handle associating a Direction
> with the Signal, maybe by allowing a (Signal, Direction) tuple in the
> interface and having a Signal by itself be Direction.NONE
> I honestly think it would be cleaner to not allow interface types other
> than Signal (with optional Direction) or Record.

 this is the disadvantage of writing an API according to a strict
policy.  to go down the route you're suggesting i would be forced to
modify *over ten* classes in the FPAdd code, to flatten them to a
linear Signal (or Record) format.

https://git.libre-riscv.org/?p=ieee754fpu.git;a=blob;f=src/add/nmigen_add_experiment.py;h=b7c771f7da989e8d4c4d9f66807c003622ee3905;hb=a13e953b2cad6f173696f5c72aa6f52e534278ca#l252

that's a hell of a lot of work.

all of the class-based data structures which had been slowly
accumulated to represent data in different formats, which are easy to
understand precisely because they're in classes.... all of that would
be DESTROYED.

... yeh?

by contrast: with a "convention" that has to be followed, the
*pre-existing* code (where i had already, unintentionally, started
grouping inputs and outputs to the IEEE754 FPU stages into
hierarchical classes) just needs two one-line functions, "return an
instance of the input class" and "return an instance of the output
class":

    def ispec(self): return FPNumBase2Ops(self.width, self.id_wid)
    def ospec(self): return FPSCData(self.width, self.id_wid)
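a minimal pure-Python sketch of that convention, for anyone following
along.  `OpsData`, `ResultData` and `ExampleFPStage` are hypothetical
stand-ins (not the real FPNumBase2Ops / FPSCData), and plain ints stand
in for Signal widths.  the point is only that ispec()/ospec() hand back a
*fresh instance* of whatever class the stage likes, hierarchy included,
so the pipeline can allocate separate storage per register without
knowing the data's layout:

```python
class OpsData:
    """Hypothetical input bundle: two operands plus an id tag."""
    def __init__(self, width, id_wid):
        self.a = width      # stand-in for Signal(width)
        self.b = width      # stand-in for Signal(width)
        self.mid = id_wid   # stand-in for Signal(id_wid)

class ResultData:
    """Hypothetical output bundle that nests another class inside it."""
    def __init__(self, width, id_wid):
        self.ops = OpsData(width, id_wid)  # hierarchy is fine: no flattening
        self.z = width                     # stand-in for Signal(width)

class ExampleFPStage:
    def __init__(self, width, id_wid):
        self.width, self.id_wid = width, id_wid
    # the whole "API": two one-line functions returning new instances
    def ispec(self): return OpsData(self.width, self.id_wid)
    def ospec(self): return ResultData(self.width, self.id_wid)

stage = ExampleFPStage(32, 8)
i1, i2 = stage.ispec(), stage.ispec()
print(i1 is i2, stage.ospec().ops.mid)  # False 8
```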

so... i can't *use* the Record-based API... except by spending several
*days*, possibly a couple of *weeks*, working out and implementing a
strategy to convert all of those classes to flat Records, in
incremental steps that keep that extremely complex code running at all
times.

*or*... work out some way to add a function to every single one of
those data-classes that maps *onto* a flat Record structure.

... you see where that would lead?

> Related to that, I think that nmigen's Layout class (Record.layout) has a
> bad internal design since the shape of each member Signal is not normalized
> to a (width, sign) tuple, instead it is left as whatever the user passed
> in, requiring all code that uses it to manually handle the multiple
> different signal types (int and (int, bool)).

 which would be a really good reason not to rely on it as a core
fundamental basis of a Pipeline API.
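for reference, the normalisation being described is small.  this sketch
assumes layout entries of the form (name, shape) or (name, shape,
direction), where shape is an int width, a (width, signed) tuple, or a
nested layout list.  `normalize_layout` is hypothetical, not part of
nmigen:

```python
def normalize_shape(shape):
    """Return shape as a (width, signed) tuple; nested layouts recurse."""
    if isinstance(shape, int):
        return (shape, False)              # a bare int means unsigned
    if isinstance(shape, tuple) and len(shape) == 2:
        width, signed = shape
        return (width, bool(signed))
    if isinstance(shape, list):
        return normalize_layout(shape)     # sub-record
    raise TypeError(f"unsupported shape: {shape!r}")

def normalize_layout(layout):
    # keep any trailing fields (e.g. a direction) untouched
    return [(name, normalize_shape(shape), *rest)
            for name, shape, *rest in layout]

print(normalize_layout([("a", 8), ("b", (16, True)), ("sub", [("c", 1)])]))
# [('a', (8, False)), ('b', (16, True)), ('sub', [('c', (1, False))])]
```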

> Other than allowing a Signal instead of a Record and having a Pipeline
> class which allows you to easily compose multiple Stage classes (both of
> which I intend to fix), I actually think the classes I designed have a
> cleaner interface:

 StageToSucc and StageToPred actually taking direct control over the
data and forcing it to be a Record-based format definitely isn't
cleaner.

 hmmm, something just occurred to me: the connect_to_records()
function is basically serving the exact same purpose as "eq()".
except that eq() handles Signals, lists/tuples of Signals, Records,
lists/tuples of Records, objects (which have to have their own eq()
function), lists/tuples of objects, and any *RECURSIVE* and
*UNLIMITED* permutation of those.
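to make that concrete, here's a small pure-Python sketch of a recursive
eq() of the kind described.  it's an illustration, not the ieee754fpu
code: `ToySignal` is a hypothetical stand-in for an nmigen Signal whose
eq() returns one statement (just a string here), and any other object is
expected to supply its own eq(), which can itself recurse:

```python
def eq(o, i):
    """Pair up o and i recursively, returning a flat list of statements."""
    # lists/tuples: recurse element-by-element, flattening the results
    if isinstance(o, (list, tuple)):
        res = []
        for ao, ai in zip(o, i):
            res += eq(ao, ai)
        return res
    # anything else provides its own eq(): one statement or a list of them
    rres = o.eq(i)
    return rres if isinstance(rres, list) else [rres]

class ToySignal:
    """Hypothetical stand-in for a Signal."""
    def __init__(self, name): self.name = name
    def eq(self, other): return f"{self.name} = {other.name}"

class ToyOps:
    """Hypothetical data class whose eq() reuses the generic eq()."""
    def __init__(self, prefix):
        self.a = ToySignal(prefix + ".a")
        self.b = ToySignal(prefix + ".b")
    def eq(self, other):
        return eq([self.a, self.b], [other.a, other.b])

dst = [ToyOps("o0"), (ToySignal("o1"), ToySignal("o2"))]
src = [ToyOps("i0"), (ToySignal("i1"), ToySignal("i2"))]
print(eq(dst, src))  # ['o0.a = i0.a', 'o0.b = i0.b', 'o1 = i1', 'o2 = i2']
```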

 my point there is: the plan that you're devising to add support for
Signal (lists of Signals) will require replacing connect_records with
connect_record_or_signals_list()...

 ... and once that's done, then adding support for objects (which
happen to have an eq() function) is no big deal...

... and once _that's_ done, you have *exactly the same capability as
eq() except that eq() already exists*.

end result: duplicated code.

one of the downsides of having worked with python for such a long time
is that i make these kinds of assessments very very quickly, and
compound them.  unfortunately that makes it hard for others to follow.

> - They allow for pipeline stages to have named input and output ports (just
> name the member something other than pred or succ) as opposed to multiple
> ports requiring an array (though this point won't apply after you finish
> the refactoring you had planned).

 yes, gone. didn't like the mess it made by having to support both
single and multi in one class.

> - They have the pipeline registers separated out from the pipeline logic,
> allowing RegStage, BreakReadyChainStage, and all similar classes to not
> have to handle also having additional user-specified logic internally.

 yes.  this can be taken a step further, resulting in *full* isolation
between Control (StageToPred/StageToSucc) and Data (Stages).
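a rough cycle-level sketch of what that isolation could look like, in
plain Python rather than nmigen.  `DoublerStage`, `RegisteredControl`
and `run` are all hypothetical: the control class only handles the
valid/ready handshake, allocating storage via stage.ospec() and calling
stage.process(), and never looks at what the data actually is:

```python
class DoublerStage:
    """Hypothetical data-only stage: a pure function, no handshaking."""
    def ispec(self): return 0
    def ospec(self): return 0
    def process(self, i): return i * 2

class RegisteredControl:
    """Hypothetical control-only pipeline register (valid/ready)."""
    def __init__(self, stage):
        self.stage = stage
        self.valid = False
        self.data = stage.ospec()   # storage shape comes from the stage

    def ready_in(self, ready_out):
        # accept new input when empty, or when the current output drains
        return (not self.valid) or ready_out

    def clock(self, valid_in, data_in, ready_out):
        """Advance one clock; return the (valid, data) visible this cycle."""
        out = (self.valid, self.data)
        if self.ready_in(ready_out):
            self.valid = valid_in
            if valid_in:
                self.data = self.stage.process(data_in)
        return out

def run(stage, inputs, ready_pattern):
    """Drive one control+stage pair; upstream re-presents data until accepted."""
    ctl = RegisteredControl(stage)
    pending, received = list(inputs), []
    for ready_out in ready_pattern:
        valid_in = bool(pending)
        data_in = pending[0] if pending else None
        accepted = valid_in and ctl.ready_in(ready_out)
        valid_out, data_out = ctl.clock(valid_in, data_in, ready_out)
        if valid_out and ready_out:
            received.append(data_out)
        if accepted:
            pending.pop(0)
    return received

print(run(DoublerStage(), [1, 2, 3], [True, False, True, True, True]))  # [2, 4, 6]
```

the stall on the second cycle doesn't lose data, and swapping in any
other stage needs no change to RegisteredControl.  note that ready_in
here depends combinatorially on ready_out, i.e. the ready chain is *not*
broken; that's the job BreakReadyChainStage-style classes would do.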

> - They have the input and output data shapes specified as parameters to
> __init__ allowing users to specify shapes without having to create a new
> class specifically to override the ishape and oshape members (can't
> remember if those are the right names).

 ispec and ospec.  and they're not over-ridden, they're *provided*.
they're abstract functions (where python doesn't actually need an
abstract base class *at all*: doing so is just a "courtesy" to
developers, to aid in code maintenance).
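to illustrate the "courtesy" point: a sketch using python's standard abc
module.  `StageAPI`, `IncStage`, `DuckStage`, `Broken` and `run_stage`
are hypothetical names.  the base class only buys an early error when a
method is forgotten; a class with no base at all works identically:

```python
from abc import ABC, abstractmethod

class StageAPI(ABC):
    """Optional 'courtesy' base class documenting the stage convention."""
    @abstractmethod
    def ispec(self): ...
    @abstractmethod
    def ospec(self): ...
    @abstractmethod
    def process(self, i): ...

class IncStage(StageAPI):
    def ispec(self): return 0
    def ospec(self): return 0
    def process(self, i): return i + 1

class DuckStage:                  # no base class at all
    def ispec(self): return 0
    def ospec(self): return 0
    def process(self, i): return i + 1

def run_stage(stage, value):
    # the pipeline only ever calls the methods; it never checks the type
    return stage.process(value)

class Broken(StageAPI):
    def ispec(self): return 0    # forgot ospec and process

try:
    Broken()
    broken_rejected = False
except TypeError:
    broken_rejected = True       # missing abstract methods caught at construction

print(run_stage(IncStage(), 41), run_stage(DuckStage(), 41), broken_rejected)  # 42 42 True
```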

 the strict requirement of forcing people to specify data shapes via a
strict API is precisely what i would consider to be a severe
disadvantage, for the reasons in the FPAdd example described above.

 i could easily create a Records-based class that conforms to the Shape
API, which provides exactly the functionality you describe:

class RecordsConformingStage:
    def __init__(self, in_shape, out_shape, processfn): # TODO: setup function
        self.in_shape = in_shape
        self.out_shape = out_shape
        self.__process = processfn
    def ispec(self): return Record(self.in_shape)
    def ospec(self): return Record(self.out_shape)
    def process(self, i): return self.__process(i)

err... that's it.  now the proposed API for the pipeline looks
*exactly* like the one that you've developed.  done, in 8 lines.

... except... it's done... and the API i've developed supports far,
far more than just a Records-based format.

also needs a setup function (optional), otherwise it won't be able to
support nmigen modules.  other than that, it's done.
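a sketch of how an *optional* setup hook might be handled, in plain
Python.  `ToyModule` (a dict-based stand-in for an nmigen Module),
`AdderSubmodule`, the two stage classes and `elaborate_stage` are all
hypothetical; the idea is simply that the pipeline calls setup() only
when a stage provides one, so stages without submodules stay minimal:

```python
class ToyModule:
    """Hypothetical stand-in for an nmigen Module: just records submodules."""
    def __init__(self):
        self.submodules = {}

class AdderSubmodule:
    """Hypothetical submodule a stage needs instantiated once."""
    def add(self, x): return x + 10

class StageWithSetup:
    def ispec(self): return 0
    def ospec(self): return 0
    def setup(self, m, i):
        # register the submodule with the parent before process() is used
        self.adder = AdderSubmodule()
        m.submodules["adder"] = self.adder
    def process(self, i): return self.adder.add(i)

class StageWithoutSetup:
    def ispec(self): return 0
    def ospec(self): return 0
    def process(self, i): return i * 2

def elaborate_stage(m, stage, i):
    # setup() is optional: call it only if the stage provides one
    if hasattr(stage, "setup"):
        stage.setup(m, i)
    return stage.process(i)

m = ToyModule()
print(elaborate_stage(m, StageWithSetup(), 1), list(m.submodules))  # 11 ['adder']
print(elaborate_stage(m, StageWithoutSetup(), 3))                   # 6
```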

now, here's the thing: can you see how... ah... "pointless" is a
strong word, but basically it's used correctly: can you see how
pointless that class is?  all it's doing is constructing a class
instance based around passing in a function and two parameters!

is it *really* so burdensome to do this instead (where you don't even
need an __init__)?

class RecordStage:
    def ispec(self): return Record(some_input_shape)
    def ospec(self): return Record(some_output_shape)
    def process(self, i): return some_processing(i)

except, again, to reiterate: all it's doing is forcing a compliance
with a Records-based API (which is a strong *dis*advantage).

> If a class wants to have the shape
> fixed, it just has to override __init__ and call super().__init__() with
> the shapes it wants.

 except... those shapes cannot be nmigen modules, or objects, or
Signals, or Records, or lists of the same, or recursive applications
of all of those, can they?

 also, it's not possible to use static classes, is it?  how in the
Records-based API (modified or not to allow Signals) would it be
possible to do this?

class ExampleStage(Stage):
    @staticmethod
    def ispec(): return Signal(16, name="example_input_signal")
    @staticmethod
    def ospec(): return Signal(16, name="example_output_signal")
    @staticmethod
    def process(i): return i + 1

> - They allow the process function to be passed in to CombStage as a
> parameter, allowing invoking code to use CombStage to handle some simple
> function without having to derive from it.

 that can be achieved just as well with the above example, by
splitting out the processing into an external function that both use.
in fact, i did that accidentally in RecordStage above, by calling an
example function "some_processing()".
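spelled out as a plain-Python sketch (ints instead of Records; all names
hypothetical): one external function, reused both by a class-based stage
that needs no __init__ and by a stage that takes the function as a
constructor parameter:

```python
def some_processing(i):
    """Hypothetical shared logic: a 16-bit increment with wrap-around."""
    return (i + 1) & 0xFFFF

class ClassStage:                   # class-based: no __init__ needed
    def ispec(self): return 0
    def ospec(self): return 0
    def process(self, i): return some_processing(i)

class ParamStage:                   # function passed in as a parameter
    def __init__(self, processfn):
        self.processfn = processfn
    def ispec(self): return 0
    def ospec(self): return 0
    def process(self, i): return self.processfn(i)

print(ClassStage().process(0xFFFF), ParamStage(some_processing).process(41))  # 0 42
```

either way the logic lives in exactly one place, which is the point.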

> If a derived class wants a fixed
> process function, it can override the process function in CombStage.

why?  why would a pipeline ever want to do that?  a pipeline has a
specific purpose, and CombStage does its job.  doing anything *other*
than that job would break user expectations of what a pipeline is and
does.

if it's going to step outside of those boundaries, in my view it would
be far, far better to have a separate class, and even a separate
python module.

this is why i created multipipe.py: i don't want people to think that
the single-in/single-out pipe code has anything to do with multi-in /
multi-out (other than yes, they can be connected together of course).

> > i'm counting on you (and everyone else) to make sure that what i'm
> > doing - no, what *all* of us are doing - is sane and on track.
> >
> Makes sense. I'll try to do that.

 appreciated.

> > the other team spent months duplicating code that did 95% of the job
> > and would only have needed 5% adaptation.
>
> That makes sense. I'll admit that a while ago, I really didn't like using
> other's code, though that may be partially because combining C/C++ code is
> a pain because everyone uses a different mostly-incompatible build system

 that's different from taking copies of code and adapting it, changing
its purpose.  some more details on the example i was referring to: it
was auto-generation of python bindings to webkit's DOM.  and by python
bindings, i *MEAN* full AND COMPLETE bindings to EVERY single part of
the webkit DOM, such that python became a **FULL** peer of javascript.

so just as there is a window object in javascript, there was now a
python window object.  *every* single function and property [which
people accidentally *believe* to be part of the javascript API: it's
not], had a one-for-one DIRECT and EQUAL available function in python.

that's three hundred and fifty types of objects

three THOUSAND functions.

and over TWENTY THOUSAND properties (such as width, height and so on).

how in god's name did i manage to write code in EIGHT DAYS that
covered such an insane amount of functionality?  it's very simple:

* i knew that all of the code was pre-specified in IDL files (350 of
them), which come directly from the HTML5 Specification.
* mozilla had their own IDL file format
* webkit had its own IDL file format
* mozilla however had a python-based application that auto-generated
an in-memory Abstract Syntax Tree.
* python-gobject happened to auto-generate python code from a
similar-looking Abstract Syntax Tree

therefore what i did was:

* lifted the code from mozilla and modified it to understand *WEBKIT's*
IDL file-format
* took a copy of python-gobject and modified it to understand the
*MOZILLA* Abstract Syntax Tree.

then all i had to do was write code for a few data types (about 15 i
think: took like 2 of the 8 days), and the job was done.

the chromium team *literally* took months to do the same job, because
they chose to write their OWN IDL-to-AST compiler, they chose to write
their *OWN* code-generator, and on top of that they had to work out
how to auto-generate code that conformed to the webkit API, which was
in flux as they'd just forked the entire codebase.

that having been said: there's a certain point at which the complexity
of some code is so great that it's just not even possible to identify
whether it's convenient to adapt.

we had an example of this ourselves, when evaluating chisel3 and
rocket-chip.  the strong advantage of disregarding pre-existing code
is that the code *you* write is something that *you* can understand.

the trick is to not just write code that *you* understand, it's to
write code that *other people* can *also* understand.

> > haven't quite covered everything, i know i've missed things out...
> > getting too long... hope you understand.
> >
> Yeah, makes sense. Hopefully working with nlnet will work out which should
> relieve some of the financial pressure.

 indeed... however there we would have a couple of options:

 (1) disregard all possible sources of other code and rewrite
everything from scratch
 (2) work together on leaping ahead in large jumps through intelligent
code re-use and effective collaboration.

money basically empowers people to be "more of themselves".  google
spent something like USD $200 *MILLION* to *FAIL* to create a modular
smartphone (Google - originally Motorola - Project ARA).

they substituted financial brute-force for intelligent and efficient
use of creativity.

l.