[Libre-soc-bugs] [Bug 713] PartitionedSignal enhancement to add partition-context-aware lengths

bugzilla-daemon at libre-soc.org bugzilla-daemon at libre-soc.org
Wed Oct 6 19:32:52 BST 2021


https://bugs.libre-soc.org/show_bug.cgi?id=713

--- Comment #16 from Jacob Lifshay <programmerjake at gmail.com> ---
(In reply to Luke Kenneth Casson Leighton from comment #14)
> (In reply to Jacob Lifshay from comment #10)
> > (In reply to Luke Kenneth Casson Leighton from comment #9)
> 
> > > this suggests a dict as the spec.  mantissa:
> > > 
> > > { 0b00 : (64, 53),  # FP64
> > >   0b01 : (32, 23),  # FP32x2
> > >   0b10 : (16, 10),  # FP16
> > >   0b11 : (16, 5),  # BF16
> > > }
> > > 
> > > no types, no classes.
> > 
> > Oh, what I have as types that can be SimdLayout.cast (like nmigen
> > Shape.cast) are 1 of 3 options:
> 
> 3 options when one will do and can be covered by a dict
> is massive complication and overengineering.

internally, it always uses a dict mapping abstract-lane-sizes (pow2 integers)
to nmigen Shape instances. the other options (accepted only as inputs to
SimdLayout.cast/.__init__) are there because they save a bunch of code on the
caller's side, making it much easier to read, and because accepting a Shape
directly (option 3) instead of always requiring a dict makes it way easier to
use all our existing non-simd code mostly unmodified.
> 
> the *actual* needs come from a 2-bit elwidth, where at each
> elwidth i would have said even the power-2 is implicitly
> understood, if it wasn't for the fact that in FP we specify
> BF16 as one of the options.

elwidth ends up filling the SimdPartMode.part_starts bits (like the original
PartitionPoints, except indexing abstract parts rather than bits). SimdLayout
then uses those Signals to select which of the possible lanes are enabled, based
on the internal dict mapping abstract-lane-sizes to nmigen Shape instances.
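
As a rough illustration of the paragraph above, here is a minimal pure-Python
sketch (not nmigen, and not the actual SimdPartMode code) of how a choice of
lane sizes could translate into part_starts bits. The function name
part_starts_for_lanes, the 8-part total width and the list-of-bools
representation are all assumptions made for illustration:

```python
def part_starts_for_lanes(lane_sizes):
    """Return one bool per abstract part: True where a lane starts.

    lane_sizes lists the lanes from least-significant upward, each size
    measured in abstract parts (hypothetical representation).
    """
    starts = []
    for size in lane_sizes:
        # assumed rule: lanes are power-of-2 sized and naturally aligned
        assert size > 0 and size & (size - 1) == 0, "lane size must be pow2"
        assert len(starts) % size == 0, "lane must be aligned to its size"
        starts.append(True)                  # first part of the lane
        starts.extend([False] * (size - 1))  # remaining parts of the lane
    return starts


# uniform modes, as a plain elwidth would select them (8 parts total)
one_64bit = part_starts_for_lanes([8])
eight_8bit = part_starts_for_lanes([1] * 8)
# mixed mode: 4xu8 in the low half, 1xu32 in the high half
mixed = part_starts_for_lanes([1, 1, 1, 1, 4])
```

In this sketch a uniform elwidth is just the special case where every lane has
the same size, while the mixed case needs no extra encoding.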

The current SimdPartMode/SimdLayout design still supports doing stuff like 4xu8
1xu32 all simultaneously, because, iirc, we are still planning on supporting
packing ops into simd alus at the 32-bit level, and we could have a 4xu8 op
followed by a 1xu32 op. I initially thought about having an enum as the "mask"
signal (like you're proposing here) instead of abstracted partition-points, but I
rejected that idea because of the sheer number of enumerants needed to express
the desired 4xu8 with 1xu32 combinations.
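
To put a rough number on that, the following sketch counts how many distinct
ways a value can be split into lanes, assuming (my assumption, for
illustration only) that lanes are power-of-2 sized in abstract parts and
naturally aligned. The function name count_layouts is hypothetical:

```python
def count_layouts(n_parts):
    """Count ways to split n_parts into aligned power-of-2 lanes."""
    if n_parts == 1:
        return 1
    half = count_layouts(n_parts // 2)
    # either one lane covers everything, or each half is split independently
    return 1 + half * half


counts = {n: count_layouts(n) for n in (1, 2, 4, 8)}
```

Under those assumptions an 8-part value already has 26 possible lane
arrangements, each of which would need its own enumerant, whereas one start bit
per abstract part can express all of them.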

> int elwidths:
> 
> 0b00 64
> 0b01 32
> 0b10 16
> 0b11 8
> 
> FP:
> 
> 0b00 FP64
> 0b01 FP32
> 0b10 FP16
> 0b11 BF16 16 bit not 8

FP16/BF16 -- didn't think of that, will require either signalling outside of
SimdLayout or redesigning SimdLayout to cope.
> 
> that puts the "aligned power 2" width (or the number of power2 partitions)
> on the requirements, and it can be the first of the tuple under each key.
> 
> the second requirement is the *useful* width at each elwidth, nonpow2
> sized.
> 
> there are no other requirements, because supporting different signed/unsigned
> is out of the question.

I partially disagree: uniform signed/unsigned is needed, and SimdLayout supports
it by having input lanes' types be nmigen Shape (or castable via Shape.cast).
Non-uniform signedness (mixing signed and unsigned lanes in one layout) is
unnecessarily complicated, and SimdLayout will raise AssertionError for that case.
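
A minimal sketch of that uniformity check, in plain Python. The Shape
namedtuple here is only a stand-in for nmigen's Shape, and
check_uniform_signedness is a hypothetical helper name, not the actual
SimdLayout code:

```python
from collections import namedtuple

# stand-in for nmigen's Shape, just (width, signed); an assumption for this sketch
Shape = namedtuple("Shape", ["width", "signed"])


def check_uniform_signedness(lane_shapes):
    """Return the common signedness, asserting that all lanes agree."""
    signs = {shape.signed for shape in lane_shapes.values()}
    assert len(signs) == 1, "lanes must be all signed or all unsigned"
    return signs.pop()


uniform = {1: Shape(5, False), 2: Shape(3, False)}
mixed = {1: Shape(5, False), 2: Shape(3, True)}

uniform_is_signed = check_uniform_signedness(uniform)
try:
    check_uniform_signedness(mixed)
    mixed_rejected = False
except AssertionError:
    mixed_rejected = True
```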

> > 1. dict-like types that map lane-size-in-abstract-parts to Shape-castable
> > values:
> > This (after Shape.cast-ing all dict values) is the canonical internal
> > representation of a layout (stored in self.lane_shapes).
> > example:
> > { # keys are always powers of 2
> >     1: 5, # 1-part lanes are u5
> >     2: unsigned(3), # 2-part lanes are u3
> >     4: range(3, 25), # 4-part lanes are u5 since that fits 24
> >     8: MyEnum, # 8-part lanes are u10, assuming MyEnum fits in u10
> > }
> 
> we need neither Enums nor range, and *definitely* signed/unsigned is
> out of the question.

all of those are converted to Shape instances by nmigen's Shape.cast (called by
SimdLayout's constructor). once constructed, the internal fields use only
nmigen Shape instances.

Signed/Unsigned is needed because we need to support signed/unsigned multiply,
signed/unsigned compare, signed/unsigned divide (as a SIMD ALU, not as a
PartitionedSignal op), signed/unsigned right shift, etc.

Oh, I just realized ALU-level signed/unsigned (separate from lane-level
signedness) is another thing that needs to go in the SimdPartMode key, along
with F16/BF16/etc.

so the key would be like:
class MyIntKey(Enum):
    U8 = ...
    I8 = ...
    U16 = ...
    I16 = ...
    U32 = ...
    I32 = ...
    U64 = ...
    I64 = ...

INT_WIDTH_IN_PARTS = {
    MyIntKey.U8: 1,
    MyIntKey.I8: 1,
    MyIntKey.U16: 2,
    MyIntKey.I16: 2,
    MyIntKey.U32: 4,
    MyIntKey.I32: 4,
    MyIntKey.U64: 8,
    MyIntKey.I64: 8,
}

class MyFpKey(Enum):
    F16 = ...
    BF16 = ...
    F32 = ...
    F64 = ...

FP_WIDTH_IN_PARTS = {
    MyFpKey.F16: 1,
    MyFpKey.BF16: 1,
    MyFpKey.F32: 2,
    MyFpKey.F64: 4,
}
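
For anyone wanting to try this out: taken literally, the `...` placeholders
above would give every member the same value, and Python's Enum treats members
with equal values as aliases of the first one. A runnable sketch of the FP half
therefore needs distinct values; here enum.auto() is used purely as a
placeholder, since the real encodings are not decided in this thread. The
helper lanes_in_register and the 4-part total width are likewise assumptions:

```python
from enum import Enum, auto


class MyFpKey(Enum):
    # auto() gives each member a distinct value; the real encodings are
    # not specified in this thread, so these values are placeholders
    F16 = auto()
    BF16 = auto()
    F32 = auto()
    F64 = auto()


# lane width in abstract parts, copied from the table above
# (F16 = 1 part and F64 = 4 parts implies 16-bit parts)
FP_WIDTH_IN_PARTS = {
    MyFpKey.F16: 1,
    MyFpKey.BF16: 1,
    MyFpKey.F32: 2,
    MyFpKey.F64: 4,
}


def lanes_in_register(key, total_parts=4):
    """Number of lanes of the given format in a total_parts-wide value."""
    return total_parts // FP_WIDTH_IN_PARTS[key]


# F16 and BF16 share a lane width but remain distinct keys
same_width = FP_WIDTH_IN_PARTS[MyFpKey.F16] == FP_WIDTH_IN_PARTS[MyFpKey.BF16]
distinct_keys = MyFpKey.F16 is not MyFpKey.BF16
```

The point of keying on the enum rather than on the width is visible in the last
two lines: F16 and BF16 have the same lane size, so the width alone cannot tell
them apart.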

> even if using this type of specification, how does it relate to
> elwidths?

elwidths determine which lane-size-in-abstract-parts is used.
> 
> > or:
> > { # keys are always powers of 2
> >     1: signed(1), # 1-part lanes are i1
> >     2: signed(3), # 2-part lanes are i3
> >     4: range(-30, 25), # 4-part lanes are i6 since that fits -30
> >     8: MySignedBoolEnum, # 8-part lanes are i1
> >     16: signed(0), # 16-part lanes are i0, zero-bit shapes are supported
> > }
> 
> no.  range enum and signed...  arrgh, these are *subtypes*? 

no, they're just types that Shape.cast supports converting to Shape.
> 
> no, absolutely not.  no way. this is far too advanced, far too complicated.

it's easy: we let Shape.cast do all the hard work; SimdLayout just passes the
inputs to nmigen Shape.cast.
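
To show how little is involved, here is a pure-Python approximation of the
casting step. It is a stand-in written for this comment, not nmigen's
implementation: Shape, bits_for and cast_shape below are simplified,
hypothetical versions, and Enum inputs are left out entirely:

```python
from collections import namedtuple

Shape = namedtuple("Shape", ["width", "signed"])  # stand-in, not nmigen's class


def bits_for(n, signed):
    """Minimum bits to hold n, as unsigned or as two's complement."""
    if n < 0:
        return (-n - 1).bit_length() + 1
    return n.bit_length() + (1 if signed else 0)


def cast_shape(obj):
    """Convert an int width, a range, or a Shape into a Shape."""
    if isinstance(obj, Shape):
        return obj
    if isinstance(obj, int):
        return Shape(obj, False)  # a plain int is taken as an unsigned width
    if isinstance(obj, range):
        if len(obj) == 0:
            return Shape(0, False)
        lo, hi = min(obj), max(obj)
        signed = lo < 0
        return Shape(max(bits_for(lo, signed), bits_for(hi, signed)), signed)
    raise TypeError(f"cannot cast {obj!r} to a Shape")


# the layout constructor only has to cast every dict value once
spec = {1: 5, 2: Shape(3, False), 4: range(3, 25)}
lane_shapes = {size: cast_shape(value) for size, value in spec.items()}
signed_example = cast_shape(range(-30, 25))
```

With these definitions range(3, 25) comes out as u5 and range(-30, 25) as i6,
matching the comments in the quoted examples.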
