[Libre-soc-bugs] [Bug 731] potential design oversight in Partitioned SimdSignal Cat/Assign/etc lhs/rhs

bugzilla-daemon at libre-soc.org bugzilla-daemon at libre-soc.org
Thu Oct 21 20:30:11 BST 2021


https://bugs.libre-soc.org/show_bug.cgi?id=731

--- Comment #20 from Jacob Lifshay <programmerjake at gmail.com> ---
(In reply to Luke Kenneth Casson Leighton from comment #10)
> (In reply to Jacob Lifshay from comment #8)
> > (In reply to Luke Kenneth Casson Leighton from comment #0)
> > > Strategy / Incremental work plan:
> > 
> > reordering a bit for clarity:
> > 
> > > 2) incremental step of adding "mirror (trmporary) storage classes"
> > 
> > SwizzledSimdValue *is* that temporary storage class, it is just a little
> > different than what you envisioned here:
> 
> unfortunately, it is much more than that.  it does that job *and* replaces
> other code which has unit tests done

Those unit tests can be reused...

> *and* is a regression on capability
> of existing code

how so? What can existing code do that SwizzledSimdValue can't (after Cat and
Slice are implemented in terms of it and it's todos are done)? As far as I can
tell, SwizzledSimdValue can do everything existing code can, and more.

> *and* is not a drop-in replacement for existing code.

I'm justifying that because I reasoned that the existing Cat code was
insufficient for LHS use, and because combining Cat and Slice (and possibly
Repl) all into one general class greatly simplifies the logic, as well as
generates far more optimal code (a requirement if we want our cpu to actually
be low-area/low-power when using simd alus; I don't want us to have to discard
all of SimdSignal as waay too inefficient).
For things like Cat(a[30:50][2:10:2], b, Repl(c[23], 5)):
SimdSignal will generate 8(!) layers of muxes (on the order of 2-4000 gates!),
whereas SwizzledSimdValue will generate only 1 layer (since that's what it
always generates) (I'd guess at most 500 gates, for a 64-bit output SimdSignal
with 4 possible elwid values -- generating a single 4-in 64-bit mux).

> 
> it is also an optimisation and it goes too far.
> 
> we absolutely have to be extremely strict about this and keep to
> an incremental development strategy at all times.
> 
> if that class is to be used it needs to be scheduled as its own
> incremental update with full unit tests and full functionality and
> redesigned as a dropin replacement.
> 
> given that the existing classes are functional

Which they're not, Cat doesn't work for nested use on lhs, and doesn't work
when the same Cat instance is used both on lhs and rhs, Slice isn't implemented
at all. All of those can be quickly implemented in terms of SwizzledSimdValue
(I already posted an implementation of Cat previously that handles *all* of
those requirements by virtue of letting *already implemented* SwizzledSimdValue
do all the heavy lifting).

Here's __Slice__ (SimdSignal method):
def __Slice__(self, start, stop):
    # convert to SwizzledSimdValue with same value
    lhs_v = SwizzledSimdValue.from_simd_signal(self)

    # basically a wrapper around elwid, SwizzleKey can probably
    # be fully replaced with PartType once it grows more methods
    kinda_elwid = lhs_v.swizzle_key
    lhs_layout = get_layout(lhs_v)
    layout = compute_slice_layout(lhs_v, start, stop)
    # dict of swizzles for each kinda_elwid value
    swizzles = {}
    for k in kinda_elwid.possible_values:
        # make `width` bit wide Swizzle initialized to const zeros
        swizzle = Swizzle.from_const(0, layout.width)
        # get lhs swizzle
        lhs = lhs_v.swizzles[k]
        for lhs_el, el in zip(lhs_layout.elements(k), layout.elements(k)):
            # Slice the list of bits for this element, and
            # assign to slice of target Swizzle.bits.
            bits = lhs.bits[lhs_el.slice][start:stop]
            swizzle.bits[el.slice] = bits

            # above should probably be refactored into using
            # Swizzle.__setitem__ method
        swizzles[k] = swizzle
    return SwizzledSimdValue(kinda_elwid, swizzles)


> this firmly places
> it into "not essential right now and in fact detrimental to urgent
> timescales" territory.

The above firmly places it into *faster* to working code territory, cuz I
already wrote basically all the code.

> which has been discussed and explained.
> 
> 
> > * use as rhs: its way of knowing when it's being used as the rhs is by
> > trapping other code's accesses of SwizzledSimdValue.sig, and lazily adding
> > the submodule that computes .sig's value. This way, no other code needs to
> > operate differently when using SwizzledSimdValue as a rhs, it looks like a
> > completely normal SimdSignal.
> > * use as lhs: it overrides __Assign__ so doesn't use PartitionedAssign,
> > instead returning a list of nmigen Assigns, therefore PartitionedAssign
> > doesn't need to be aware of SwizzledSimdValue at all.
> 
> if you had separated that out as an incremental update it could have
> been considered.

Well, luke, you have to realize that not all things can be implemented
incrementally, and that forcing them to occur incrementally when not suited to
that will take waay longer and the diffs will not actually make much sense.

-- 
You are receiving this mail because:
You are on the CC list for the bug.


More information about the libre-soc-bugs mailing list