[Libre-soc-bugs] [Bug 713] PartitionedSignal enhancement to add partition-context-aware lengths

bugzilla-daemon at libre-soc.org bugzilla-daemon at libre-soc.org
Thu Oct 14 22:05:24 BST 2021


--- Comment #115 from Jacob Lifshay <programmerjake at gmail.com> ---
(In reply to Luke Kenneth Casson Leighton from comment #110)
> (In reply to Jacob Lifshay from comment #87)
> > And I agree with her there, nmigen has always been scalar,
> not being funny:
> stating that you agree with whitequark is actually very dangerous
> for the project.  it will be used as ammunition to sabotage what we are
> doing.
> > where each Value
> > is a single value, not a list of values. Basically everybody expects
> > nmigen's Value to be a scalar. If you insist on rewriting nmigen to change
> > that fundamental assumption, you'll end up with a different language, 
> no, false.  did we design totally new Power ISA Scalar v3.0B because of
> SVP64?

nope, but SVP64 instructions *are* encoded differently (they have a svp64
prefix), which is just like the renaming i'm advocating for. By renaming, that
could be something like:
from simd_nmigen import Signal, Module, Value, Shape
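
To make the rename concrete, here's a toy, runnable stand-in (these classes
are illustrative assumptions, not the real nmigen or SimdSignal code) showing
that user code stays identical -- only the import line decides which
implementation the name `Signal` refers to:

```python
# Toy stand-ins -- NOT real nmigen/simd_nmigen code, just the renaming idea.
class ScalarSignal:
    """what `from nmigen import Signal` would give you"""
    def __init__(self, width=1):
        self.width = width

class SimdSignal:
    """what `from simd_nmigen import Signal` would give you: one lane
    width per supported element count, but the same call signature"""
    def __init__(self, width=1, vec_el_counts=(1, 2, 4)):
        self.lane_widths = {n: width for n in vec_el_counts}

# user code is unchanged either way; only the import line differs:
Signal = SimdSignal        # stands in for `from simd_nmigen import Signal`
s = Signal(8)
```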
> this is the exact same thing.
> the entirety of the nmigen language *becomes* vectorised...

This only works if *every* property of our new types that nmigen inspects
(including Shape.width, and Signal's inputs being `Shape.cast`ed) behaves
exactly how nmigen expects, which isn't currently the case, since you
picked Shape.width to be the full SIMD width instead of the element width,
among other differences.
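
To illustrate the mismatch (toy stand-ins, not the actual classes): for a
64-bit register split into 4 x 16-bit lanes, code that reads `.width` gets
very different answers under the two conventions:

```python
# Hypothetical stand-ins for the two Shape.width conventions.
class FullSimdWidthShape:
    """the current SimdSignal choice: .width is the whole vector"""
    def __init__(self, elwid, nlanes):
        self.width = elwid * nlanes

class ElementWidthShape:
    """SIMT-style choice: .width is one element, as scalar code expects"""
    def __init__(self, elwid, nlanes):
        self.width = elwid
        self.nlanes = nlanes

# 4 lanes of 16 bits in a 64-bit register:
assert FullSimdWidthShape(16, 4).width == 64  # surprises element-wise code
assert ElementWidthShape(16, 4).width == 16   # matches scalar expectations
```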

> > Well, I'm saying you're 90% of the way there, but the last 10% of abstraction
> > is the harder part...
> it's actually been pretty easy, just a hell of a lot of work (and to
> be honest i am annoyed that, again, you haven't helped with Cat, Repl,
> Part, or any of the recent additions, which i asked you to help with)

well, actually, I wrote Cat and Repl implementations (in simd_signal); i think
they just got kinda ignored, or were written shortly after you wrote Cat and
Repl, icr.

> there's over THIRTY operators.  i really needed your help in implementing
> them and you have instead spent considerable time and resources fighting
> every single step of the way to *not* help.  i do not know why.
> leaving that aside for you to think about

I'm trying to help, but part of helping is trying to get the API design correct
before we spend even more work baking an API into all our code that ends up
being changed because it didn't actually work as originally designed...
> there are two REALLY complex ones:
> * Array (and ArrayProxy).
> * ast.Switch aka SimdSignal.__Switch__.
> i did the analysis, under bug #458, and whoo, even just the analysis
> takes a hell of a lot of explaining.
> unfortunately because ast.Switch is the basis of m.If, m.FSM and
> m.Switch, i can't provide "full proof" until it's actually 100%
> implemented.

I said I would implement and I will...once we decide which API it'll be based
on (hence arguing in here about the API).

We're 99% of the way through deciding API.

While SimdSignal itself will probably be faster to write, actually *using* it,
as it was designed, would be waay more complex, because it changes the
semantics of how nmigen operates, such as requiring the shape input to Signal
to be an int or dict of ints, whereas all our code expects to be able to pass
Shape-castable types to Signal. For example:

If we add some lines to layout/SimdLayout, then it automatically handles *all*
Shape-castable inputs:

class SimdLayout(Shape):
    def __init__(self, lane_shapes=1, ...):
        # Signal() gives shape == unsigned(1), so we match that.
        # We use lane_shapes=signed(...) rather than a `signed` argument.
        if isinstance(lane_shapes, SimdLayout):
            # next two lines should probably be factored out into lane_shapes
            s = lane_shapes.signed
            lane_shapes = SimdMap.map(lambda v: Shape(v, s), lane_shapes)
        else:
            lane_shapes = SimdMap.map(Shape.cast, lane_shapes)
        lane_widths = {}
        signed = None
        for k in vec_el_counts:
            shape = lane_shapes[k]
            if signed is None:
                signed = shape.signed
            else:
                assert signed == shape.signed
            lane_widths[k] = shape.width
        assert signed is not None
        # rest of code uses lane_widths, not lane_shapes
> if you would like to rectify that you can help implement the
> __Slice__ override.

I can work on slicing.
> > it's easier and more consistent to just have a separate
> > API that mirrors nmigen's API, allowing unmodified nmigen to work with our
> > code, as well as making our code much clearer. This wouldn't require that
> > much effort, since we can use most of the existing PartitionedSignal code
> > with minor modifications and renaming.
> it results in a massive duplication and maintenance burden.  please
> read the RFC, i documented clearly why what you advocate would be
> disastrous in multiple ways.

yeah, now i can see how it could have a big maintenance burden; however, imho
forking nmigen may have a bigger maintenance burden.

> last section:
> https://libre-soc.org/3d_gpu/architecture/dynamic_simd/

Replying to wiki concerns:

> All of these ideas, unfortunately, are extremely costly in many different
> ways:
> 1. Any overrides of any kind give a catastrophically fatal impression
>   that the high-level language behaviour might in some tiny way be
>   different,
>   purely because of the very *existence* of an override class.

I think that won't be much of a problem if we specifically state throughout our
docs that our API is intended to be equivalent to nmigen. This is similar to
how pypy provides extra features over cpython, yet implements the exact same
language. No one thinks that pypy's existence means that cpython is useless...

> 2. Proponents of such override mechanisms (and there have been many)
>   fundamentally misunderstand the distinction between Type 1 (AST)
>   low-level
>   and Type 2 (dsl.Module) high-level nmigen language constructs, and
>   miss the fact that dsl.Module (Type 2) is 100% implemented *in* AST
>   (Type 2) constructs.

you mean implemented in AST (Type 1)?

No, I understand your point about that completely, I just think that your point
isn't sufficient justification for permanently forking nmigen.

> 3. Wrapper classes are a maintenance nightmare. Unless included in the
>   library itself, any update to the library being wrapped is a huge risk
>   of breaking the wrapper.

well, if we design the wrappers such that we only use the public API of nmigen,
nmigen's API stability prevents us from needing to rewrite our code -- it is
the exact same stability that prevents *all other* nmigen users from needing to
rewrite their code when nmigen is updated. Wrappers, if properly designed,
don't ever go in and mess with nmigen's internals, they only provide an API
that looks like nmigen.
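
The pattern looks something like this (a generic delegation sketch;
`NmigenClass` is a hypothetical stand-in for whatever nmigen class is being
wrapped, not a real nmigen name):

```python
class NmigenClass:
    """hypothetical stand-in for a real nmigen class"""
    def __init__(self, width):
        self.width = width
    def shape(self):
        return self.width

class Wrapped:
    """wrapper that touches only the wrapped object's *public* attributes,
    so it keeps working across upgrades as long as the public API is stable"""
    def __init__(self, inner):
        self._inner = inner
    def __getattr__(self, name):
        # delegate everything we don't override to the public API
        return getattr(self._inner, name)
    def shape(self):                 # an override adding SIMD-ish behaviour
        return {"lane": self._inner.shape()}

w = Wrapped(NmigenClass(8))
assert w.width == 8                  # delegated unchanged
assert w.shape() == {"lane": 8}      # overridden
```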

>  Long-term this is completely unsustainable.
>   Likewise monkey-patching.

Monkey-patching is messing with nmigen's internals, and I never advocated for
that.

> 4. Wrapper classes introduce huge performance degradation. Every function
>   requires one additional function call. Every object instantiated
>   requires
>   both the wrapper class to be instantiated and the object being wrapped,
>   every single time.

Is this really such a performance sink? if so, it makes me think using Python
was a bad choice...Rust has no issues with wrappers, in fact, it has a special
type layout specifically intended for wrappers: `#[repr(transparent)]`. Rust
totally lives up to the zero-cost abstractions promise.

>  A single wrapper class is never enough: the entire
>   class hierarchy, everything that is ever instantiated by a "wrapped"
>   class instance, also requires wrapping.

Well, since we're only using SIMD in ALUs, we only need to wrap nmigen's core
API, which is only 5 or 6 classes (ignoring the ones we're overriding anyway)
-- I did go through the whole public api of nmigen.hdl and check them all.
Class list (additional classes we'd need to override):
Value, Const, Statement, Assign, Module, and maybe Record

>  This quickly
>   gets completely out of hand and diverts developer time into a
>   nightmare time-sink that has actively harmful consequences *even
>   once completed*.
>   Memory requirements double; performance degrades.

I highly doubt memory requirements are anything to worry about, i'd expect it
to be on the order of 100MB or 1GB at the most.

>   Both are unacceptable.

I think that's exaggerating...

> 5. "Explicit coding" is actually more costly even than for-loop
>   code duplication.  It is "The Right Thing (tm)" in order to achieve
>   optimal gate-level design but requires hyper-aware and hyper-intelligent
>   developers, whom, if ever lost, take knowledge of the internal workings
>   of ultra-complex code with them.


> 6. "Low-level classes" have in fact already been implemented: over a
>   dozen modules utilised and combined behind PartitionedSignal exist
>   and are already in use. PartitionedMux was one of the very first
>   SIMD-aware
>   "Type 1" AST operators written.  The problem is that it's nowhere near
>   enough.  Conversion of 100,000 lines of readable nmigen HDL using
>   Type 2 (m.If/Elif) high-level constructs to **not** use those
>   constructs and return to AST Mux, Cat, and so on is a completely
>   unreasonable expectation.

Agreed, though I'll admit that converting everything to use Mux was my original
plan a year ago.

> 7. "Replacements for dsl.Module" aside from coming with the catastrophic
>   implication that they provide alternative behaviour from dsl.Module
>   are a heavy maintenance burden.  Not only that but there is the risk
>   that, unless actually included *in* nmigen, the underlying AST modules
>   that they use might change, or be deprecated, or have bugs fixed which
>   break the replacement.

I addressed those points above.

> 8. "Compilers". It is very well known that any python program that outputs
>   another python program as an ASCII output may then immediately read that
>   ASCII output from within the very program that created it, `eval` it,
>   and then execute it.  That being the case, you might as well skip the
>   output to ASCII and the eval part, and create classes that
>   dynamically and directly instantiate the desired target behaviour.
>   This saves months of effort but is a common mistake, frequently made.

well, compilers do have the benefit of being waay easier to debug, since you
can see their whole output...the alternative (just creating classes at runtime)
is waay more opaque.
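
As a toy illustration of the trade-off (not the real code generator): both
routes below produce the same callable, but only the first leaves
human-readable source behind that you can print and inspect:

```python
# route 1: "compiler" -- emit Python source, then exec it; the source
# string can be printed or saved, which is the debuggability win
src = ("def add_lanes(a, b):\n"
       "    return [x + y for x, y in zip(a, b)]\n")
ns = {}
exec(src, ns)                  # could equally be written to a file first
add_lanes = ns["add_lanes"]

# route 2: direct construction -- no intermediate source to read
def make_add_lanes():
    return lambda a, b: [x + y for x, y in zip(a, b)]

assert add_lanes([1, 2], [3, 4]) == [4, 6]
assert make_add_lanes()([1, 2], [3, 4]) == [4, 6]
```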

Back to bugzilla message:

> > 
> > > > whereas
> > > > SimdSignal acts like a *list* of scalar values -- aka. a Vector.
> > > 
> > > yes.  and therefore the entire suite of operators of SimdSignal can be
> > > *defined* to act in list-form, at every single level in the entirety of ast.*
> > > operations.
> > 
> > I'm saying the existing design only does a 90% job, it covers list-form +,
> > -, *, &, ^, |, .eq, etc.
> > 
> > It does *not* cover Shape.width, Signal.like, etc.
> > 
> > I'm saying we should finish off getting to 100% of the API looking like a
> > transparently vectorized (aka SIMT) 
> SIMT i keep telling you is the wrong word to use.  you mean SIMD

I mean SIMT, not SIMD. I'm explicitly referring to what the code looks like
when compared to what SIMT/SIMD looks like from a SW programmer's perspective.

C++ SIMD code looks like this:
void f(f32x4 a, f32x4 b) {
    f32x4 c = a + b; // element-wise add -- looks same, good!
    static_assert(sizeof(a) == 16); // size of whole vector -- different, bad
    u32x4 bits = (u32x4)a; // bitcast -- different, bad
    // fp->int conversion -- different, bad
    u32x4 values = __builtin_convertvector(a, u32x4);
    i32x4 gt = a > b; // result is not bool, bad
    f32x4 max = (f32x4)((i32x4)a & gt | (i32x4)b & ~gt); // mux -- can't use `if`, bad
}

Notice how all types are separate wide SIMD types and operations often do
something other than the element-wise version of what scalar code would do.

C++ SIMT code looks like this:
void f(float a, float b) {
    float c = a + b; // element-wise add
    static_assert(sizeof(a) == 4); // size of element
    uint32_t bits = bit_cast<uint32_t>(a); // bitcast
    uint32_t values = (uint32_t)a; // fp->int conversion
    bool gt = a > b;
    float max = b;
    if (gt) // control flow works -- good!
        max = a;
}

Notice how all types are element types and operations always operate
element-wise and look exactly like scalar code and how control-flow constructs
can be used.
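
The same contrast in Python (a toy operator-overloading stand-in, not
SimdSignal itself): the SIMT-style type keeps scalar-looking code even though
each value is a lane list underneath:

```python
class SimtVal:
    """toy SIMT-style value: operators apply element-wise, per lane"""
    def __init__(self, lanes):
        self.lanes = list(lanes)
    def __add__(self, other):
        return SimtVal(a + b for a, b in zip(self.lanes, other.lanes))
    def __gt__(self, other):
        return SimtVal(a > b for a, b in zip(self.lanes, other.lanes))

a, b = SimtVal([1.0, 5.0]), SimtVal([4.0, 2.0])
c = a + b            # reads exactly like scalar code
gt = a > b           # per-lane booleans, not a raw bit-pattern
assert c.lanes == [5.0, 7.0]
assert gt.lanes == [False, True]
```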

> > version of nmigen
> > Shape/Signal/Value/etc.,
> jacob: please listen to what i am telling you.  what you envisage is EXACTLY
> what is being designed.


> > just with different names, since that will make it
> > much easier to use/understand. Note that Shape.width (lane width) not being
> > a plain integer comes from lanes being XLEN-width, not from being
> > vectorized. those are orthogonal unrelated changes. We could have a totally
> > scalar Signal/Value/etc. that are all XLEN-ified, that would require
> > modifying Shape.width to be multivalued.
> yes.  that is what lane_shapes is for.

I'm saying it's specifically Shape.width that should be multi-valued, because
Shape.width is part of the nmigen API so it should be the SIMT element type's
width, not the full-SIMD-vector's width.

lane_shapes, because it's not part of the nmigen API, can be whatever we want.

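A sketch of what a multi-valued width could behave like (a toy stand-in in the
spirit of SimdMap, but not the real class): `.width` acts as an
elwid-to-bits mapping, and arithmetic on it stays per-elwid:

```python
class WidthMap:
    """toy stand-in: a per-elwid integer supporting + as SimdMap might"""
    def __init__(self, values):
        self.values = dict(values)
    def __add__(self, other):
        o = other.values if isinstance(other, WidthMap) else \
            {k: other for k in self.values}
        return WidthMap({k: v + o[k] for k, v in self.values.items()})

width = WidthMap({8: 8, 16: 16, 32: 32, 64: 64})   # SIMT element widths
sum_width = width + 1        # e.g. a sum needs one extra carry bit
assert sum_width.values == {8: 9, 16: 17, 32: 33, 64: 65}
```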