[Libre-soc-dev] my current plan regarding simdsignal

lkcl luke.leighton at gmail.com
Fri Oct 22 12:05:13 BST 2021


On Fri, Oct 22, 2021 at 5:17 AM Jacob Lifshay <programmerjake at gmail.com> wrote:

> On Thu, Oct 21, 2021 at 6:03 PM lkcl <luke.leighton at gmail.com> wrote:
> > the concept of cascading changing PartitionPoints
> > (but that all react to the same elwidth or other spec) is making me pause as it wasn't part of the original planning.
>
> Well, that was always my plan once I started describing how fixed-size
> elements were needed several weeks ago.

except, i needed some "hooks" on which to understand it.  i can't take
in multiple concepts at once on something this complex.

> That's part of why I kept
> pushing for having a class (SimdMode) that holds elwid and
> vec_el_counts (or whatever equivalents we were discussing at the time)
> that is explicitly *separate* from SimdLayout and SimdSignal.

yehyeh.  i think (to-be-renamed) PartType/ElwidPartType serves that
purpose.

> > * do a wiki page which demonstrates and descrbes the problem
> >   (outline why partition padding is needed)
> >   use one of the existing ones as a template,
> >   shift, logicops, cat, you'll get the idea
>
> Done, though I forgot to use the other ones as a template...

doh :)  it's a little awkward to see-think-and-edit because it's
in HTML (all the others are markdown, meaning they're ASCII
readable by the person doing updates) but it's clear what's
going on.

i've just added a paragraph about the design implications.
the next step will be to add a simple example of say a
Repl() or a Cat()

(which is why i tended to keep the tables small - 32-bit,
only 3 initial partitions, because they expand out quite rapidly
in size when a 2nd operator is added:
https://libre-soc.org/3d_gpu/architecture/dynamic_simd/cat/
)

> > * raise a bugreport about it
> done: https://bugs.libre-soc.org/show_bug.cgi?id=733
> > and link it to the
> >    doc budget
>
> Couldn't figure out which bug you meant, so I'll leave that part to you.

384.  i changed the title because i meant "a bug specifically about the
documentation page", which will lead (prompt) a discussion and clear
thoughts, which in turn (and only then) lead to a bugreport outlining
the work needed to be done.  (not the other way round!)

we're running out of headroom on 384 though.

> Ok, while you're thinking, I'll add the SimdMode class, but won't
> modify the existing code yet.

my thoughts went something like this:

* this is the first SimdSignal that needs a fixed element width
  (arg)
* that then has to propagate right the way through the *entirety*
  of SimdSignals that use that output
  (arg! panic!)
* however... by creating a new PartType (ShapePartType?
  or just use ElwidPartType?)
  which is created from a layout() fixed_width=None and an
   appropriate lane_shapes
  (where lane_shapes contains all of the fixed element widths
   needed), there should be enough information.
  (panic starting to subside)
* in particular, note that the only difference between PartType and
  ElwidPartType is that ElwidPartType takes *subsets* of
  combinations of the mask, where PartType covers *all*
  (que?? explaaaaiiin por favor?)
* a converter can be written which takes all possible PartType
   mask combinations 0b000 0b001 .... 0b110 0b111 in the
   case of a 3-bit mask and creates a "one-to-one" not-exactly-fake
   vec_el_counts covering all options:
   { 0b000: errr no that can't be done because vec_el_counts
   is limited to one fixed witdh per elwidth.

so... PartType() can only be converted to other PartTypes()
and ElwidPartType can only be converted to other ElwidPartTypes

that i think is still ok... *if* the new PartType-instance-for-a-Slice-output
doesn't try to use the existing layout() function but instead has a different
layout() function which understands that it came from a PartType.

is that too much? is it getting too complex? i don't know the answer,
there.

basically, the last thing we need here is to throw away the stability
of Partitioned SimdSignal which has *only ever been tested with
PartType* so far.

but, we need to get "from where we are" to "where things need to be".

this in turn implies that ElwidthPartTypes needs to be finished
off as a top priority *before* proceeding with anything else,
and for it to be integrated as an option into test_partsig.py

that's a frickin lot of work due to the size of test_partsig.py
and the fact that there's now over 30 operators.  particularly
as the entirety of test_partsig.py is specifically designed
around a 3-bit mask.

sigh.

i don't know if it's better to create an entirely new unit test for
ElwidPartType (which given how much work it was fills me with
alarm) or whether to adapt it.

i *think* it might be able to just adapt it: the masks get longer
(and include elements that are "blank" or "padding" which would
default to zero) which *should* be okay.

also: given the "carrying of context" (PartType), the idea of throwing
out SimdSignal.ptype and making it a global is *definitely* a nonstarter.
it was already a non-starter due to being global (which is frowned
on in general, in python, so please don't start making design assumptions
that rely on any globals of any kind, please) but it's just not going to work.

next phase: evaluate whether the existing submodules will cope with
blank padding.  they should do, but it needs to be tested.

bottom line is that SimdSignal.__Shape__ needs to be shelved whilst
the entirety of the *rest* of SimdSignal needs to be checked (and adapted
if necessary) to ElwidPartType.

this *should* be fine because ultimately all the submodules rely on
PartitionPoints, PartitionPoints doesn't care about which partitions
are unused, and neither do any of the SimdSignal submodules.

[*an optimisation* - that can be added *later* - is to blank out inputs
in lanes that are entirely unused (which is what the blanking mask is for)
these can be done one at a time at a later date.]

as an example, PartitionedCat *should* be perfectly fine based on
PartitionPoints keys specifying the breakpoints:


https://git.libre-soc.org/?p=ieee754fpu.git;a=blob;f=src/ieee754/part_cat/cat.py;h=1f42d3f034e191f77dc79988b2c0e4449cd90a42;hb=98f9872f9521d3f2b86ba8b189b4bba1415c3eff#l95

  89         keys = [0] + list(x.partpoints.keys()) + [len(x.sig)]
  90         # get current index and increment it (for next Cat chunk)
  91         upto = y[idx]
  92         y[idx] += numparts
  93         print ("getting", idx, upto, numparts, keys, len(x.sig))
  94         # get the partition point as far as we are up to
  95         start = keys[upto]
  96         end = keys[upto+numparts]

what that's doing is: the y list contains where things got up to (upto, get it?)
in the Cat()ing.

it doesn't care where the PartitionPoint keys are (they can be
0 2 3 77 98 99 105), they dont' have to be uniformly-spaced,
at all.

therefore, if PartitionedCat is handed some PartitionPoints that
were generated from the layout() function (which in turn
came from an ElwidthPartType, which in turn came from
a Slice output) which contains a ton of paddings and some
exceptionally weirdly-spaced breakpoints, it doesn't care,
it just gets on with it.

likewise, if PartitionedAdd etc. etc. are passed exceptionally
weird PartitionPoints they genuinely don't care.

the only one that *might* be problematic is DynamicShift.

this is because, 18 months ago, we made the assumption that
the shift argument would always always come from one
partition and would never, under any circumstances, span
across a breakpoint.

overall though i have a reasonably high degree of confidence
that things should work out when using ElwidthPartType.

which means, first priority: ElwidthPartType needs to be
completed.

> Planned incremental PartType improvements:
> move elwid/mask to separate class (called SimdMode) (one instance is shared
> among all SimdSignals in an ALU),

hangon, hangon - why is that needed? what are the implications?
because that's such a massive intrusive change it could have far-reaching
implications.  it would be better to test the type and adapt the code:
class SimdSignal(UserValue):
    def __init__(self, mask, *args, src_loc_at=0, **kwargs):
        # create partition points
        if isinstance(mask, PartitionPoints):
            self.partpoints = mask
        elif isinstance(mask, SomethingElse):
            do something else, like set self.ptype = ElwidPartType(...)

which would leave existing functionality well alone, and keep unit tests
running so that it can be detected that no "damage" occurs.

l.



More information about the Libre-soc-dev mailing list