[Libre-soc-dev] liskov substitution principle and type assertions/annotations

Jacob Lifshay programmerjake at gmail.com
Fri Aug 12 06:29:48 BST 2022

On Thu, Aug 11, 2022 at 11:51 AM lkcl <luke.leighton at gmail.com> wrote:
> Couple of things.
> Definition of LSP I saw, there is no inheritance tree, it is
> the API that defines the substitution principle. Creating
> Base classes *at all* and restricting types DIRECTLY and
> fundamentally interferes with this extremely fundamental
> principle on which python is fundamentally founded.

The solution to that is to realize:
1. the original statement of LSP is talking about subtypes in
    programming language theory and there are multiple ways to
    translate those programming language theory subtypes into Python:

    1. subclasses, in which case isinstance(v, BaseType) is correct.
    This is the definition of LSP I gave previously.

    2. an object's interface, in which case you can do things such
    as isinstance(v, collections.abc.Iterable) where every object
    with an __iter__ method counts as a subclass of Iterable, not
    just those that inherit from Iterable. You can easily define
    your own Iterable-style type-checking classes using
    typing.Protocol and @typing.runtime_checkable:

    import typing

    @typing.runtime_checkable
    class FooBar(typing.Protocol):
        def foo(self): ...
        def bar(self, v): ...

    class MyClass:
        def foo(self):
            return 42
        def bar(self, v):
            return v + 1

    assert isinstance(MyClass(), FooBar)  # works

2. some interfaces should not support subtypes at all, e.g.
    python len() always returns an int no matter how much you
    like type substitution and think it's the
    One True Python Way. For interfaces like that,
    isinstance(v, TheType) is correct (or maybe even type(v) is TheType).
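To illustrate (a throwaway sketch, not code from the thread): len() enforces that contract at runtime, refusing a __len__ that returns anything but an int:

```python
class IntLen:
    """__len__ returns a plain int, as the len() contract requires."""
    def __len__(self):
        return 2

class FloatLen:
    """__len__ returns a float, which len() rejects at runtime."""
    def __len__(self):
        return 2.5

print(len(IntLen()))   # 2
try:
    len(FloatLen())
except TypeError as e:
    print(e)           # 'float' object cannot be interpreted as an integer
```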

> Everyone who uses python well (instead of complaining) knows
> that float may be LSP substituted for int.

only one way: if it expects a float, you can pass an int, but if it
expects an int, passing a float usually won't work. E.g. passing an
int into math.sin works since it expects a float; passing a float into
bytearray() doesn't work, since it expects an int:

>>> bytearray(2.0)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: cannot convert 'float' object to bytearray
>>> bytearray(2)
bytearray(b'\x00\x00')

>  You, Jacob, even
> developed a small function that could be used either with
> nmigen expressions or python numbers and if you had forced
> types upon it that would not have been possible.

No, it would have been perfectly possible, because the type assertions
that would be used wouldn't be isinstance(v, int), but instead:

import typing

@typing.runtime_checkable
class Addable(typing.Protocol):
    # isn't in python's standard library, but should probably be
    def __add__(self, other): ...

assert isinstance(v, Addable)
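A minimal sketch of how that structural check keeps the function generic (Signalish here is a hypothetical stand-in for an nmigen expression, not a real class):

```python
import typing

@typing.runtime_checkable
class Addable(typing.Protocol):
    def __add__(self, other): ...

def double(v):
    # structural check: anything with an __add__ method qualifies,
    # whether it's an int, a float, or an nmigen expression
    assert isinstance(v, Addable)
    return v + v

class Signalish:
    # hypothetical stand-in for an nmigen expression
    def __init__(self, n):
        self.n = n
    def __add__(self, other):
        return Signalish(self.n + other.n)

print(double(3))               # 6
print(double(2.5))             # 5.0
print(double(Signalish(4)).n)  # 8
```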
> Nobody who uses python a lot gives a shit about how far down
> the stack an Exception occurs. They want as little code in
> as compact a form written quickly and readable quickly and
> GENUINELY do not give a f*** about an int being accidentally
> put as a float and an Exception occurring later rather than
> earlier.

The issue is not as much an exception occurring later in a callee, but
the code giving you a completely indecipherable error, or, worse yet,
silently wrong results.

> Good python programmers know to look out for little
> things like that, fix the problem and move on.
> To say just because one person made a mistake and a float
> got into the system therefore EVERY SINGLE F*****G FUNCTION
> MUST without fail check every goddamn type,

I didn't say that iirc (or at least I didn't intend to say that). I
meant that there should be more than zero checks in the entire codebase.

> this is complete
> overkill, a total overreaction,

true, because you're attacking a strawman. I'm not advocating for
complete overkill; I'm advocating for more safety checks than walking
right along the edge of a cliff with no safety rail, out in the
middle of nowhere in the middle of winter in a snowstorm, where, if
someone fell off, they'd be lucky to be found and buried by next
spring.
> that compromises the power,
> flexibility, compactness and pragmatism of python, and results
> in vastly more code that 1 has to be written and 2 has to be
> read.
> If we truly wanted to piss about with types then we would
> have started the entire HDL from scratch in c++ or probably
> bluespec BSV.

We picked python because it's popular, not because it has poor type
safety and, partially because of that, allows you to easily call stuff
where it will mysteriously misbehave.

> Python is *designed* to be compact and *allow* flexibility.

Yes, and that works great when you need a 100 line program, not a
100kloc program.

> Therefore please, *really*, stop putting types into the HDL,
> and please stop putting type check assertions into functions.
> Especially because your insistence on doing so, in combination
> with the use of overly long function, class, and parameter
> names, is resulting often in a massive 100% increase in code
> size.
> I routinely remove types that you insist on adding,
> and fit what was formerly a four to six line function
> definition onto a *single line* still under 80 Chars.

Do remember that compactness doesn't mean it's better, otherwise
everyone would program in APL. Leaving important details out (such as
the types it was originally designed for) just makes the code harder
to understand, even if it takes a third as many lines. You already
know what the algorithm does, partly because you saw the descriptive
names and types before removing them, so whatever is left may look
sufficient to you, but it isn't enough for a fresh reader.
> The only really appropriate place to use typechecking is when
> a function arguments may be multiple different types. This
> can be risky if the range of types was not predicted well enough
> but it should be documented in the function docstring.
> The appropriate place to declare an *advisory* on the expected
> types for parameters is in the docstrings.

Python already provides the perfect place to put an advisory on the
expected types in a machine-checkable location: type annotations. imho
our decision not to use them has had a large negative effect on the
reliability of our code, which has been subject to massive code rot.
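For example (a hedged sketch, not code from our tree), annotations carry the same advisory a docstring would, but mypy and any decent IDE can actually check and use them:

```python
def popcount(value: int, width: int = 64) -> int:
    """Count the set bits in the low `width` bits of `value`."""
    return bin(value & ((1 << width) - 1)).count("1")

print(popcount(0b1011))  # 3
# popcount(2.5) still fails with a runtime TypeError, but mypy
# flags the bad call *before* the code ever runs:
#   error: Argument 1 to "popcount" has incompatible type "float"
```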

> We have unfortunately
> been lax on that and it will come back and bite us.

it has already done so.

> By complete contrast the use of types in the Formal Correctness
> additions to nmigen was wholly appropriate.  There, a level
> of strictness due to the mathematical foundations gives a
> reassurance that people will not f*** up when using it, and
> also when you were writing it.

imho it has nearly nothing to do with mathematical foundations, but
has to do with matching smtlib2's AST/semantics.

> It is a weird contrast that on one hand type checking is totally
> appropriate but in another context it just gets fundamentally
> in the way. it is what it is.

imho what gets fundamentally in the way is the *lack* of
machine-readable type information in our HDL -- type information that
allows type checkers to find many bugs as well as make it much faster
to write code, due to auto-completion and go-to-definition features in
any decent text editor/IDE.

type assertions are mostly a kind of kludge version of
machine-readable type annotations.

Also, it is important to remember that not all assertions check
types; many of them instead check properties of the algorithms used
that are assumed to be true. Having assertions means you can catch
bugs immediately when they arise, rather than *only* in whatever unit
tests are written, since unit tests are often lacking and don't cover
everything the code is actually used for.
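To make the distinction concrete, here's a hypothetical example (not one of the removed assertions): the first assert checks a type, the second checks an algorithm property that holds only if the loop is correct:

```python
def isqrt_checked(n):
    """Integer square root via Newton's method, with both kinds of assert."""
    assert isinstance(n, int) and n >= 0   # type/precondition check
    x = n
    y = (x + 1) // 2
    while y < x:
        x = y
        y = (x + n // x) // 2
    # algorithm-property check: x really is floor(sqrt(n)); if the
    # loop above were subtly wrong, this would fail immediately
    assert x * x <= n < (x + 1) * (x + 1)
    return x

print(isqrt_checked(15))  # 3
print(isqrt_checked(16))  # 4
```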

In fact, >70% (I counted) of the assertions you unceremoniously
removed in the commit I want reverted are checking algorithm
properties rather than only types. This is *HIGHLY* significant.

