[Libre-soc-dev] SimdSignal scalar/vector switching and SimdShape.width

Fri Oct 29 03:43:51 BST 2021

On Thu, Oct 28, 2021, 01:13 lkcl <luke.leighton at gmail.com> wrote:

>
>
> On October 28, 2021 3:26:12 AM UTC, Jacob Lifshay <
> programmerjake at gmail.com> wrote:
>
> >That apparent unworkability is the result of still conflating two
> >unrelated
> >things that coincidentally have the same value: overall simd width, and
> >the
> >width of an element for elwid=64.
>
> in order to *make* downcasting work, i have *defined* the overall width to
> be exactly equal to the underlying signal width, including, obviously, any
> padding.
>

Yes, the rest of the cpu accesses the SIMD I/O ports of a SIMD ALU by
accessing: alu.whatever_io_port.sig (probably through a
SimdScope.get_underlying scalar/vector dispatch method):

alu.scope.get_underlying(alu.whatever_io_port) *always* returns the
underlying signal, whether in scalar or vector mode.

>
> when there is an elwid=0b00 which is the "request to treat the underlying
> sig member as a 1x 64 bit value" it is NOT necessarily the case that this
> is equal to len(self.sig) because there are other elwids that could be
> 2x36, 4x18, 8x9 etc.
>
> however when that occurs, it is the developer's responsibility to be aware
> of it.  they specified the vec_el_widths that way, they take responsibility
> for dealing with the fact that layout() will compute a much larger
> len(self.sig).  max(1*64, 2*36, 4*18, 8*9) i.e. 72 not 64.
>

Yes...all of that is stuff we agree on.
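
For concreteness, the arithmetic in the quoted example can be checked
with a few lines of plain Python. This is only a rough sketch: the
vec_el_widths dict (partition count -> element width) and the
underlying_width helper are made up for illustration and are not the
real layout() code.

```python
# Hypothetical illustration only: this is not the real layout() function.
# Maps number of partitions -> element width in bits, as in the example.
vec_el_widths = {1: 64, 2: 36, 4: 18, 8: 9}


def underlying_width(part_widths):
    """Width of the underlying signal: the widest of all elwid layouts."""
    return max(count * width for count, width in part_widths.items())


full = underlying_width(vec_el_widths)
print(full)  # 72, not 64

# Padding bits left unused in each layout of the 72-bit underlying signal.
padding = {count: full - count * width
           for count, width in vec_el_widths.items()}
print(padding)  # {1: 8, 2: 0, 4: 0, 8: 0}
```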

>
> you are overthinking this and trying to "protect" developers from
> themselves.
>

You have totally missed my point, which isn't about how the external cpu
connects to a SIMD ALU, but about SimdShape.width and how
s.Signal(my_simd_shape) works just fine with SimdShape.width being
multi-valued *instead* of being the width of the underlying signal.

>
>
> > In actuality, it works just fine for
> >SimdShape.width to be multi-valued,
>
> no, it is not, unfortunately, because that bleeds down into nmigen scalar
> behaviour when a developer accesses width assuming (quite reasonably) that
> it is Shape.width, because that is what it is: an integer.
>

Well, all nmigen functions that take a Shape (there are only a few, and
most of those we need to redirect anyway, such as Signal's constructor)
should be called through SimdScope methods, which extract the correct
value for scalar use.

If a developer accesses SimdShape.width, there are 3 cases:
1. they're doing arithmetic to build more SimdSignals or slice signals:
in this case, SimdShape.width *must* be multi-valued, otherwise they get
the wrong element widths. If SimdShape.width is the full simd width, as you
want, they will calculate the correct value only for the elwid=64 case, and
only when the values coincidentally match (not guaranteed); all other
element widths will be totally wrong.

2. They're trying to run for loops or something more complicated: They need
to use more complex code to correctly adapt for changing XLEN, since using
the SimdShape.width you wanted, which is the full simd width, will only
give the correct result in very limited cases, the rest of the time it is
totally wrong.

3. They're trying to connect a SIMD ALU's IOs to the rest of the CPU:
accessing SimdShape.width is the wrong thing, they should be introspecting
the underlying Signal instead, or be using a SimdScope.full_width(shape)
method that properly selects between scalar signal shapes (which are the
elwid=64 *element width*, not the full simd width anyway) and full simd
width.
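
To make case 1 concrete, here is a rough sketch in plain Python of what
"multi-valued" arithmetic could look like. MultiWidth and everything else
in it is invented for illustration; it is not the existing SimdShape code,
just one possible way the per-elwid bookkeeping might behave.

```python
# Hypothetical sketch: MultiWidth is invented here, it is not libre-soc API.
class MultiWidth:
    """An element width with one value per elwid setting."""

    def __init__(self, widths):
        self.widths = dict(widths)  # elwid (in bits) -> element width

    def _map(self, other, op):
        # Combine with another MultiWidth elwid-by-elwid, or with an int.
        if isinstance(other, MultiWidth):
            return MultiWidth({k: op(v, other.widths[k])
                               for k, v in self.widths.items()})
        return MultiWidth({k: op(v, other) for k, v in self.widths.items()})

    def __add__(self, other):
        return self._map(other, lambda a, b: a + b)

    def __mul__(self, other):
        return self._map(other, lambda a, b: a * b)


XLEN = MultiWidth({64: 64, 32: 32, 16: 16, 8: 8})

# Case 1: width of a full multiplier product, computed per element.
product = XLEN * 2
print(product.widths)  # {64: 128, 32: 64, 16: 32, 8: 16}

# The same arithmetic on a single full-simd-width integer gives one number,
# which matches the per-element answer only for elwid=64.
FULL_SIMD_WIDTH = 64
wrong = FULL_SIMD_WIDTH * 2
print([elwid for elwid, w in product.widths.items() if w == wrong])  # [64]
```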

>
> SimdShape.width === Shape.width
>
> look at the constructor for SimdShape.

Yeah, I know you wrote the code like that. I'm saying that how you wrote
it isn't best, and that it should be changed to behave as if you're dealing
with the shape of an element (for all API common between SimdShape and
Shape), not the shape of a whole simd signal.

An additional benefit of routing the full-simd-width access logic through
either introspecting the underlying signal or SimdScope.full_width, and of
having SimdShape.width be the element widths, is that it becomes nearly
trivial to make a 32-bit or a 128-bit ALU, scalar or vector, merely by
changing SimdScope's settings: none of our ALU code will need to change
one bit.
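
As a rough sketch of that last point, in plain Python: the Scope class
below is invented for illustration (it is not the real SimdScope, and its
constructor arguments and methods are assumptions), but it shows the idea
of ALU code being written once against the scope, with the widths changing
only through the scope's settings.

```python
# Hypothetical sketch: this Scope class is invented for illustration and
# is not the real SimdScope.
class Scope:
    def __init__(self, xlen, lanes, scalar=False):
        self.xlen = xlen      # widest element width, e.g. 64
        self.lanes = lanes    # supported lane counts, e.g. (1, 2, 4, 8)
        self.scalar = scalar  # scalar mode: a single XLEN-wide element

    def element_widths(self, mul=1):
        """Per-lane-count element widths for a shape of XLEN*mul bits."""
        if self.scalar:
            return {1: self.xlen * mul}
        return {n: self.xlen * mul // n for n in self.lanes}

    def full_width(self, mul=1):
        """Width of the underlying signal needed to hold every layout."""
        return max(n * w for n, w in self.element_widths(mul).items())


def alu_port_widths(scope):
    """'ALU code' written once, purely in terms of the scope."""
    return {"a": scope.full_width(),
            "b": scope.full_width(),
            "product": scope.full_width(2)}


print(alu_port_widths(Scope(64, (1, 2, 4, 8))))
print(alu_port_widths(Scope(32, (1, 2, 4))))
print(alu_port_widths(Scope(128, (1, 2, 4, 8, 16))))
print(alu_port_widths(Scope(64, (1,), scalar=True)))
```

Only the Scope(...) arguments differ between the four calls;
alu_port_widths itself is untouched.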

Jacob