[Libre-soc-dev] Task 939 and its subtasks

Wed Nov 23 11:15:58 GMT 2022

On Tue, Nov 22, 2022, 13:08 Dmitry Selyutin via Libre-soc-dev <
libre-soc-dev at lists.libre-soc.org> wrote:

> On Tue, Nov 22, 2022 at 11:37 PM lkcl <luke.leighton at gmail.com> wrote:
> >
> > On Tue, Nov 22, 2022 at 7:01 PM Dmitry Selyutin <ghostmansd at gmail.com>
> wrote:
> >>
> > > I'd take this with a grain of salt. :-)
> >
> > no, really:
>
> Please don't speak for everybody around. If it takes 45 minutes for
> you to change this code respectively -- this might be correct.
> Otherwise this is quite an exaggeration, starting from the whole ply
> concept.
>
>
> > if that takes more than 45 minutes then something is very very wrong.
>
> No, it's not. People are different. Some outperform in one area,
> others -- in different. This is perfectly normal.
>
>
> > from there, the confidence builds massively because all you have
> > to do is say, "NEXT".  pick any function (any one of the BNFs),
> > create associated unit tests, and it's done.
> >
> > with there being so few BNF expressions, it quickly becomes
> > "routine".
>
> Later -- yes. But not on the first iteration. Especially not for a
> newcomer.
>
>
> > the "gotchas" in c are that you have to declare variables.

waay more than that...you have to figure out what type those variables
should be, we need type deduction/propagation for that to work. we're
currently getting by by relying on python's total lack of static typing.

  well, just
> > accumulate those, drop them into a separate data structure during
> > the BNF walk, and now you have 2 data structures, one of which
> > may be post-process-substituted into the other.
> >
> > > I'd argue with MAINTAINABLE and SIMPLE as one who debugged and
> > > modified it, especially since you write it in CAPS.
> >
> > three other people have demonstrated self-learned capability to
> > both comprehend, debug, and modify the python-ply-based parser.
> > this is enough empirical evidence to support the hypothesis that the
> > code is both simple and also maintainable.
>

I would instead argue based on my personal experience and from what i've
heard people say that those 3 other people you are referring to were forced
into having to comprehend, debug, and modify the python parser because they
found the language implemented by it to have terrible user experience,
failing with indecipherable error messages for confusing reasons or
producing output that would fail when run, both of which could only be
fixed after extensive tinkering with the pseudo-code and then finally
having to fall back on debugging and bodging the parser themselves.

Any language that we want to be taken seriously (and what language could be
more qualified as needing to be taken seriously than the specification of
our processors?!) as a computer readable language should *ALMOST NEVER*
require debugging the parser in order to figure out how to write your code
such that it compiles successfully!

imho that is all symptoms of a parser with near zero error handling, no
type checking or inconsistency checking, with a common method of dealing
with syntax errors being to throw an AttributeError or something from where
the parser relied on some property of the input but never checked anything.
>From reading the parser's source, it is a partial implementation of a
python parser that is bodged into shape and barely works, needing
additional bodge passes over the token stream before being parsed.

It needs serious clean-up and to generate an actual AST for the pseudo-code
(not just python or c pseudo-code) allowing separation of concerns (a
*major* factor of maintainability) into separate passes over the AST:
parsing, variable/value type propagation/checking (necessary for C,
optional for python, currently missing completely), and C/Python code
generation.

Having separate passes over a AST also trivially allows fixing many
existing bugs that require looking backwards or forwards in the input code
in order to generate the output at the correct location such as the current
broken handling of variables first assigned in an if statement:
if ... then
    b[0:63] <- ...
else
    b[0:63] <- ...
generates:
if ...:
    b = <generate_a_64_bit_selectable_int>
    b[0:64] = ...
else:
    b[0:64] = ...
which is totally wrong

The currently existing code is in such bad shape due to the number of
bodges on bodges that imho it is more useful as a reference for the parsing
and code generation stages than as a code base to start from, because that
way you can actually end up with a clean architecture that is familiar to
anyone who has learned about compilers' internals, even slightly -- parsers
producing ASTs and running passes over the AST is like compilers 101.

This is by contrast with the current code that has the code generation
tightly coupled with parsing, making both code-generation and parsing waay
more confusing due to all the extraneous code, additionally the many bodges
in the parsing process make it even more confusing, rendering it nearly
opaque and unfamiliar even to people like me of a few years ago who can
read and quickly understand most new-to-me parsers and code generators,
largely because they use well-known compiler architecture that's taught in
basically every compilers course.

Therefore, even if you decide rewriting is too expensive, the parser needs
to at least be separated from code generation, inserting a pseudo-code AST
in between, and the parser bodges cleaned up.

This is a *new* AST, *don't* just use Python's AST, Python's AST has a huge
amount of stuff we don't need (making using it that much more complex) and
is missing the required flexibility to be extended by adding things
required by the pseudocode such as variables/expressions' types (needed for
the C backend, as well as vastly better error checking).

>
> This is not an argument. All these "three other people" could have
> understood this parser if it was written in Rust.

Because parsers and ASTs and code generators have such a well-known simple
structure, being written in Python, or C, or Rust, or Haskell (which I'm
not very familiar with but have seen some parsers in) doesn't affect their
understandability that much in my experience, all that does is change the
syntax used, the actual algorithms are basically unchanged, because the
features that make each language special are basically all unused or mostly
irrelevant at the algorithm-level for writing parsers and ASTs and code
generators.

Therefore, because Python is fast enough (once I had changed it to not
re-parse all of the csvs each time it parsed a single instruction's entry),
I think we should just write in Python to satisfy Luke's desire for it to
be in a language everyone knows, and to be able to use existing code to
satisfy Luke's desire to Not Rewrite the World.

Also, if we decide to rewrite the parser, imho lets *not* use
ply/bison/yacc, the code they generate is nearly completely
incomprehensible to anyone who isn't a Deep Compiler Nerd (since they're
table-driven, which debugs like a black-box -- this comes from someone who
is a Deep Compiler Nerd because I've written several parser generators and
compilers, even then table-driven parsers mostly debug like a black box),
we should use a recursive descent parser/parser-generator or a Packrat
parser generator, those are actually pretty easy to debug because parsing
is driven by nested function calls, ifs, and loops, which can be pretty
easily followed and generally do pretty trivially understandable things at
each point of the code.

recursive descent parsers are also quite easy to add special cases to,
which we have a *lot* of.

Does this prove
> anything? You could argue it would have taken more time, but I suspect
> not everybody would agree (though I certainly would).

I'd agree it would take more time for people unfamiliar with Rust, but
because of the simplicity of parsers and ASTs (it can be easily designed to
avoid unnecessary Rust language features and unusual libraries), I would
argue that it wouldn't be excessively more time.

Anyway, no,
> according to my particular understanding of maintainability, there's a
> lot to improve, same for simplicity. These both statements don't imply
> one should immediately rewrite it in Rust, obviously.

I'm not currently advocating for writing it in Rust. My preferences are:
if reusing existing parser code: Python (obviously).
if rewriting: C++ (most preferred), Python, C (least preferred, due to
total lack of hashtables and resizable arrays (like std::vector) without
needing external libraries (which are a total pain in C), those are both
quite important for compilers, just using fixed-size arrays and hoping they
won't overflow is imho terrible and short-sighted).

Jacob