[Libre-soc-dev] Task 939 and its subtasks

Hendrik Boom hendrik at topoi.pooq.com
Wed Nov 23 13:11:06 GMT 2022


On Wed, Nov 23, 2022 at 03:15:58AM -0800, Jacob Lifshay via Libre-soc-dev wrote:
> On Tue, Nov 22, 2022, 13:08 Dmitry Selyutin via Libre-soc-dev <
> libre-soc-dev at lists.libre-soc.org> wrote:
> 
> > On Tue, Nov 22, 2022 at 11:37 PM lkcl <luke.leighton at gmail.com> wrote:
> > >
> > > On Tue, Nov 22, 2022 at 7:01 PM Dmitry Selyutin <ghostmansd at gmail.com>
> > wrote:
> > >>
> > > > I'd take this with a grain of salt. :-)
> > >
> > > no, really:
> >
> > Please don't speak for everybody around. If it takes 45 minutes for
> > you to change this code respectively -- this might be correct.
> > Otherwise this is quite an exaggeration, starting from the whole ply
> > concept.
> >
> >
> > > if that takes more than 45 minutes then something is very very wrong.
> >
> > No, it's not. People are different. Some outperform in one area,
> > others -- in different. This is perfectly normal.
> >
> >
> > > from there, the confidence builds massively because all you have
> > > to do is say, "NEXT".  pick any function (any one of the BNFs),
> > > create associated unit tests, and it's done.
> > >
> > > with there being so few BNF expressions, it quickly becomes
> > > "routine".
> >
> > Later -- yes. But not on the first iteration. Especially not for a
> > newcomer.
> >
> >
> > > the "gotchas" in c are that you have to declare variables.
> 
> 
> waay more than that...you have to figure out what type those variables
> should be, we need type deduction/propagation for that to work. we're
> currently getting by by relying on python's total lack of static typing.
> 
>   well, just
> > > accumulate those, drop them into a separate data structure during
> > > the BNF walk, and now you have 2 data structures, one of which
> > > may be post-process-substituted into the other.
> > >
> > > > I'd argue with MAINTAINABLE and SIMPLE as one who debugged and
> > > > modified it, especially since you write it in CAPS.
> > >
> > > three other people have demonstrated self-learned capability to
> > > both comprehend, debug, and modify the python-ply-based parser.
> > > this is enough empirical evidence to support the hypothesis that the
> > > code is both simple and also maintainable.
> >
> 
> I would instead argue based on my personal experience and from what i've
> heard people say that those 3 other people you are referring to were forced
> into having to comprehend, debug, and modify the python parser because they
> found the language implemented by it to have terrible user experience,
> failing with indecipherable error messages for confusing reasons or
> producing output that would fail when run, both of which could only be
> fixed after extensive tinkering with the pseudo-code and then finally
> having to fall back on debugging and bodging the parser themselves.
> 
> Any language that we want to be taken seriously (and what language could be
> more qualified as needing to be taken seriously than the specification of
> our processors?!) as a computer readable language should *ALMOST NEVER*
> require debugging the parser in order to figure out how to write your code
> such that it compiles successfully!
> 
> imho that is all symptoms of a parser with near zero error handling, no
> type checking or inconsistency checking, with a common method of dealing
> with syntax errors being to throw an AttributeError or something from where
> the parser relied on some property of the input but never checked anything.
> From reading the parser's source, it is a partial implementation of a
> python parser that is bodged into shape and barely works, needing
> additional bodge passes over the token stream before being parsed.
> 
> It needs serious clean-up and to generate an actual AST for the pseudo-code
> (not just python or c pseudo-code) allowing separation of concerns (a
> *major* factor of maintainability) into separate passes over the AST:
> parsing, variable/value type propagation/checking (necessary for C,
> optional for python, currently missing completely), and C/Python code
> generation.
> 
> Having separate passes over a AST also trivially allows fixing many
> existing bugs that require looking backwards or forwards in the input code
> in order to generate the output at the correct location such as the current
> broken handling of variables first assigned in an if statement:
> if ... then
>     b[0:63] <- ...
> else
>     b[0:63] <- ...
> generates:
> if ...:
>     b = <generate_a_64_bit_selectable_int>
>     b[0:64] = ...
> else:
>     b[0:64] = ...
> which is totally wrong
> 
> The currently existing code is in such bad shape due to the number of
> bodges on bodges that imho it is more useful as a reference for the parsing
> and code generation stages than as a code base to start from, because that
> way you can actually end up with a clean architecture that is familiar to
> anyone who has learned about compilers' internals, even slightly -- parsers
> producing ASTs and running passes over the AST is like compilers 101.
> 
> This is by contrast with the current code that has the code generation
> tightly coupled with parsing, making both code-generation and parsing waay
> more confusing due to all the extraneous code, additionally the many bodges
> in the parsing process make it even more confusing, rendering it nearly
> opaque and unfamiliar even to people like me of a few years ago who can
> read and quickly understand most new-to-me parsers and code generators,
> largely because they use well-known compiler architecture that's taught in
> basically every compilers course.
> 
> Therefore, even if you decide rewriting is too expensive, the parser needs
> to at least be separated from code generation, inserting a pseudo-code AST
> in between, and the parser bodges cleaned up.
> 
> This is a *new* AST, *don't* just use Python's AST, Python's AST has a huge
> amount of stuff we don't need (making using it that much more complex) and
> is missing the required flexibility to be extended by adding things
> required by the pseudocode such as variables/expressions' types (needed for
> the C backend, as well as vastly better error checking).
> 
> >
> > This is not an argument. All these "three other people" could have
> > understood this parser if it was written in Rust.
> 
> 
> Because parsers and ASTs and code generators have such a well-known simple
> structure, being written in Python, or C, or Rust, or Haskell (which I'm
> not very familiar with but have seen some parsers in) doesn't affect their
> understandability that much in my experience, all that does is change the
> syntax used, the actual algorithms are basically unchanged, because the
> features that make each language special are basically all unused or mostly
> irrelevant at the algorithm-level for writing parsers and ASTs and code
> generators.
> 
> Therefore, because Python is fast enough (once I had changed it to not
> re-parse all of the csvs each time it parsed a single instruction's entry),
> I think we should just write in Python to satisfy Luke's desire for it to
> be in a language everyone knows, and to be able to use existing code to
> satisfy Luke's desire to Not Rewrite the World.
> 
> Also, if we decide to rewrite the parser, imho lets *not* use
> ply/bison/yacc, the code they generate is nearly completely
> incomprehensible to anyone who isn't a Deep Compiler Nerd (since they're
> table-driven, which debugs like a black-box -- this comes from someone who
> is a Deep Compiler Nerd because I've written several parser generators and
> compilers, even then table-driven parsers mostly debug like a black box),
> we should use a recursive descent parser/parser-generator or a Packrat
> parser generator, those are actually pretty easy to debug because parsing
> is driven by nested function calls, ifs, and loops, which can be pretty
> easily followed and generally do pretty trivially understandable things at
> each point of the code.
> 
> recursive descent parsers are also quite easy to add special cases to,
> which we have a *lot* of.
> 
> Does this prove
> > anything? You could argue it would have taken more time, but I suspect
> > not everybody would agree (though I certainly would).
> 
> 
> I'd agree it would take more time for people unfamiliar with Rust, but
> because of the simplicity of parsers and ASTs (it can be easily designed to
> avoid unnecessary Rust language features and unusual libraries), I would
> argue that it wouldn't be excessively more time.
> 
> Anyway, no,
> > according to my particular understanding of maintainability, there's a
> > lot to improve, same for simplicity. These both statements don't imply
> > one should immediately rewrite it in Rust, obviously.
> 
> 
> I'm not currently advocating for writing it in Rust. My preferences are:
> if reusing existing parser code: Python (obviously).
> if rewriting: C++ (most preferred), Python, C (least preferred, due to
> total lack of hashtables and resizable arrays (like std::vector) without
> needing external libraries (which are a total pain in C), those are both
> quite important for compilers, just using fixed-size arrays and hoping they
> won't overflow is imho terrible and short-sighted).
> 

I'd like to have an informal look at this grammar, its implementation, and any specifications and test cases that are avaiable.  Where should I look for it.

-- hendrik



More information about the Libre-soc-dev mailing list