[Libre-soc-bugs] [Bug 713] PartitionedSignal enhancement to add partition-context-aware lengths

Tue Oct 12 19:51:20 BST 2021

https://bugs.libre-soc.org/show_bug.cgi?id=713

--- Comment #82 from Jacob Lifshay <programmerjake at gmail.com> ---
(In reply to Luke Kenneth Casson Leighton from comment #73)
> i think we're slowly getting to the bottom of where the assumptions
> are that caused you to believe that it is impossible to use nmigen
> with such tiny modifications (and to believe that it is undesirable
> to do so).
> 
> it revolves around elwidths and the definition of Shape.width.
> 
> the past 18 months work have been fundamentally based on the assumption
> that it is, by design, possible to fit.  much of that time was spent
> verifying that that was possible.
> 
> the assumption / definition and all code is based on:
> 
> Shape.width == SimdShape.width == len(SimdSignal._its_internalsignal)

well, the problem is that, to correctly maintain the appearance of operating on
*scalar* values needed for all our existing OpenPower ALUs (waay more code than
PartitionedSignal), we need it to look like a single element, *not* a list.
This means SimdSignal.width *has to be* the width of a *single* element.
> 
> we need - and other users will expect - that Casting will "just work".
> SimdSignal should be castable to Signal and all bits *directly*
> copied (including unused padding partitions) seamlessly at the bitlevel.

we need Signal/SimdSignal interconversion, but it should be done via having
non-SIMD code access SimdSignal.sig, or by a .bitcast member function, not by
just doing Signal.eq. In particular, what you're advocating for (for SimdSignal
to transparently inter-convert with Signal), just 1:1 copying bits, will *not
work* nicely -- it's in conflict with what happens when a scalar is converted
to a vector, which is that that scalar is *splatted* into all lanes of the
vector, not that the scalar is split into pieces and each piece becomes a lane.

Example (addi):
class MyAddI(...):
    def __init__(self):
        with simd_scope(self, create_elwid=True):
            self.RT = SimdSignal(XLEN) # output: vector
            self.RA = SimdSignal(XLEN) # input: vector
            self.Imm = Signal(signed(16)) # input: *scalar*, since it's not a register
    def elaborate(self, platform):
        m = Module()
        with simd_scope(self, m=m):
            sliced_imm = SimdSignal(XLEN)
            m.d.comb += sliced_imm.eq(self.Imm) # slice off bits that don't fit in XLEN
            m.d.comb += self.RT.eq(self.RA + sliced_imm)
        return m

For the semantics to be correct, the conversion from self.Imm to a SimdSignal
has to put *all* of Imm's value into each lane. What you are advocating for
would instead split Imm into 2-bit/4-bit/8-bit/16-bit pieces (depending on the
number of lanes) and put each piece into its corresponding lane. That is
exactly what a 1:1 bit copy does, and it is *totally wrong* here.
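
A minimal plain-Python sketch of the splat conversion described above, using
ordinary integers rather than nmigen. The name splat and its parameters
(imm_bits, elwid, total_bits) are hypothetical and are not part of nmigen or
SimdSignal. It also assumes that a signed immediate is sign-extended when a
lane is wider than the immediate and truncated when a lane is narrower:

```python
def splat(imm, imm_bits, elwid, total_bits=64):
    """Broadcast a signed imm_bits-wide immediate into every elwid-bit lane.

    Hypothetical helper for illustration only. Returns the lanes as a list,
    least-significant lane first.
    """
    # Reinterpret the raw bits of imm as a signed two's-complement value.
    value = imm & ((1 << imm_bits) - 1)
    if value >> (imm_bits - 1):
        value -= 1 << imm_bits
    # Masking a (possibly negative) Python int to elwid bits both
    # sign-extends into wider lanes and truncates for narrower lanes.
    lane = value & ((1 << elwid) - 1)
    # Every lane receives the *same* full value: that is the splat.
    return [lane] * (total_bits // elwid)


if __name__ == "__main__":
    for elwid in (8, 16, 32, 64):
        print(elwid, [hex(x) for x in splat(0x7531, 16, elwid)])
```

The point of the sketch is that the lane contents depend only on the scalar's
value and the lane width, never on which lane is being filled.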

Example (with elwid == 16-bits):
RA = 0x0123_4567_89AB_CDEF
instruction:
addi rt, ra, 0x7531

Right way (splatting):
RA (lanes): [0xCDEF, 0x89AB, 0x4567, 0x0123]
Imm: 0x7531
sliced_imm (lanes): [0x7531, 0x7531, 0x7531, 0x7531]
RT (lanes): [0xCDEF + 0x7531, 0x89AB + 0x7531,
             0x4567 + 0x7531, 0x0123 + 0x7531] ==
    [0x4320, 0xFEDC, 0xBA98, 0x7654] (each sum truncated to 16 bits)
RT = 0x7654_BA98_FEDC_4320

Wrong way (1:1 bit conversion -- convert imm to 16-bit wide simd then sign
extend):
RA (lanes): [0xCDEF, 0x89AB, 0x4567, 0x0123]
Imm: 0x7531
sliced_imm (1:1 converted into lanes): [0x1, 0x3, 0x5, 0x7]
RT (lanes): [0xCDEF + 0x1, 0x89AB + 0x3, 0x4567 + 0x5, 0x0123 + 0x7] ==
    [0xCDF0, 0x89AE, 0x456C, 0x012A]
RT = 0x012A_456C_89AE_CDF0

Another wrong way (1:1 bit conversion -- sign extend imm to simd width then
convert):
RA (lanes): [0xCDEF, 0x89AB, 0x4567, 0x0123]
Imm: 0x7531
sliced_imm (1:1 converted into lanes, only lower 16-bits of full 64-bits had
anything):
    [0x7531, 0x0, 0x0, 0x0]
RT (lanes): [0xCDEF + 0x7531, 0x89AB + 0x0, 0x4567 + 0x0, 0x0123 + 0x0] ==
    [0x4320, 0x89AB, 0x4567, 0x0123]
RT = 0x0123_4567_89AB_4320
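
The splatting case and the first 1:1 case above can be checked with a short
plain-Python sketch. It uses ordinary integers, not nmigen, and the helper
names (to_lanes, from_lanes, lanewise_add) are hypothetical, chosen only for
this illustration:

```python
ELWID = 16        # element width used in the example above
TOTAL_BITS = 64   # full register width


def to_lanes(value, elwid=ELWID, total_bits=TOTAL_BITS):
    """Split an integer into elwid-bit lanes, least-significant lane first."""
    mask = (1 << elwid) - 1
    return [(value >> shift) & mask for shift in range(0, total_bits, elwid)]


def from_lanes(lanes, elwid=ELWID):
    """Pack lanes (least-significant first) back into one integer."""
    mask = (1 << elwid) - 1
    result = 0
    for index, lane in enumerate(lanes):
        result |= (lane & mask) << (index * elwid)
    return result


def lanewise_add(a, b, elwid=ELWID):
    """Add corresponding lanes, discarding the carry out of each lane."""
    mask = (1 << elwid) - 1
    return [(x + y) & mask for x, y in zip(a, b)]


RA = 0x0123_4567_89AB_CDEF
IMM = 0x7531
ra_lanes = to_lanes(RA)
nlanes = len(ra_lanes)

# Right way: the whole immediate goes into every lane.
splatted = [IMM] * nlanes
rt_splat = from_lanes(lanewise_add(ra_lanes, splatted))

# Wrong way: a 1:1 bit copy chops the 16-bit immediate into one 4-bit
# piece per lane (every piece here has its top bit clear, so sign
# extension of the pieces changes nothing).
piece_bits = 16 // nlanes
pieces = [(IMM >> (i * piece_bits)) & ((1 << piece_bits) - 1)
          for i in range(nlanes)]
rt_pieces = from_lanes(lanewise_add(ra_lanes, pieces))

if __name__ == "__main__":
    print(f"splat:    {rt_splat:#018x}")
    print(f"1:1 copy: {rt_pieces:#018x}")
```

Only the splatted result matches what a scalar addi would produce if it were
run separately on each 16-bit element.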

> this *is* how nmigen works.  all Value-derivatives *are* copyable
> from one to the other by making the fundamental assumption that
> when converted to bitlevel they are all effectively "the same".

Well, SimdSignal is *fundamentally* incompatible with nmigen Value/Signal,
since nmigen expects Values to act like a single scalar value, whereas
SimdSignal acts like a *list* of scalar values -- aka. a Vector.
