[Libre-soc-bugs] [Bug 713] PartitionedSignal enhancement to add partition-context-aware lengths

Thu Oct 7 20:37:13 BST 2021

https://bugs.libre-soc.org/show_bug.cgi?id=713

--- Comment #28 from Jacob Lifshay <programmerjake at gmail.com> ---
(In reply to Luke Kenneth Casson Leighton from comment #27)
> 2) we make the tough decision to break the rule "no changes to any
>    pipeline code" and AVOID the need entirely.

Well, we'll need to break that rule for FP pipelines regardless. That isn't
much of an additional problem, since they all still need modification to add
F16/BF16 and to handle flags/exceptions.

Idea: have an XLEN global that is an instance of a new class, SimdMap.
SimdMap is a generalization of SimdLayout's dict handling code to arbitrary
values per lane-kind (in this case arbitrary integers), and it overrides
*, +, -, etc. so it can do arithmetic like is done in the spec pseudo-code.
The result is a dict wrapped by SimdMap that can be passed into SimdLayout,
bit-slicing, shifting, constants, etc., so code could look like this:
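
# nmigen imports used below; SimdMap, simd_scope, IntElWid and the
# SimdMap-aware PartitionedSignal are the proposed API, not existing code
from nmigen import Cat, Const, Elaboratable, Module, Repl, unsigned

# XLEN global constant definition
XLEN = SimdMap({IntElWid.I8: 8, IntElWid.I16: 16,
                IntElWid.I32: 32, IntElWid.I64: 64})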

# example definition for addg6s, basically directly
# translating pseudo-code to nmigen+simd.
# intentionally not using standard ALU interface, for ease of exposition:
class AddG6s(Elaboratable):
    def __init__(self):
        with simd_scope(self, IntElWid, make_elwid_attr=True):
            self.RA = PartitionedSignal(XLEN)
            self.RB = PartitionedSignal(XLEN)
            self.RT = PartitionedSignal(XLEN)

    def elaborate(self, platform):
        m = Module()
        with simd_scope(self, IntElWid, m=m):
            wide_RA = PartitionedSignal(unsigned(4 + XLEN))
            wide_RB = PartitionedSignal(unsigned(4 + XLEN))
            sum = PartitionedSignal(unsigned(4 + XLEN))
            carries = PartitionedSignal(unsigned(4 + XLEN))
            ones = PartitionedSignal(XLEN)
            nibbles_need_sixes = PartitionedSignal(XLEN)
            z4 = Const(0, 4)
            m.d.comb += [
                wide_RA.eq(Cat(self.RA, z4)),
                wide_RB.eq(Cat(self.RB, z4)),
                sum.eq(wide_RA + wide_RB),
                carries.eq(sum ^ wide_RA ^ wide_RB),
                ones.eq(Repl(Const(1, 4), XLEN // 4)),
                # carry out of nibble k is the carry into bit 4*(k+1)
                nibbles_need_sixes.eq(~carries[4:4 + XLEN] & ones),
                self.RT.eq(nibbles_need_sixes * 6),
            ]
        return m
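
A minimal sketch of what SimdMap itself could look like, to make the
arithmetic above concrete. Everything here is an assumption for illustration:
the class body, the `values` attribute, the `_apply` helper and the stand-in
IntElWid enum are hypothetical, and a real version would also need to handle
SimdLayout, slicing, and so on. It only shows the "dict with element-wise
operators" part of the idea:

```python
import operator
from enum import Enum


class IntElWid(Enum):
    # hypothetical stand-in for the lane-kind enum used in the example
    I8 = 0
    I16 = 1
    I32 = 2
    I64 = 3


class SimdMap:
    """Hypothetical sketch: one value per lane-kind, element-wise arithmetic.

    A plain scalar operand is applied to every lane-kind.
    """

    def __init__(self, values):
        self.values = dict(values)

    def _apply(self, op, other, swap=False):
        out = {}
        for kind, value in self.values.items():
            if isinstance(other, SimdMap):
                if kind not in other.values:
                    continue  # keep only lane-kinds present in both maps
                rhs = other.values[kind]
            else:
                rhs = other
            out[kind] = op(rhs, value) if swap else op(value, rhs)
        return SimdMap(out)

    def __add__(self, other):
        return self._apply(operator.add, other)

    def __radd__(self, other):
        return self._apply(operator.add, other, swap=True)

    def __sub__(self, other):
        return self._apply(operator.sub, other)

    def __rsub__(self, other):
        return self._apply(operator.sub, other, swap=True)

    def __mul__(self, other):
        return self._apply(operator.mul, other)

    def __rmul__(self, other):
        return self._apply(operator.mul, other, swap=True)

    def __floordiv__(self, other):
        return self._apply(operator.floordiv, other)

    def __eq__(self, other):
        return isinstance(other, SimdMap) and self.values == other.values

    def __repr__(self):
        return f"SimdMap({self.values!r})"


XLEN = SimdMap({IntElWid.I8: 8, IntElWid.I16: 16,
                IntElWid.I32: 32, IntElWid.I64: 64})
wide = 4 + XLEN        # per-lane-kind widths 12, 20, 36, 68
nibbles = XLEN // 4    # per-lane-kind nibble counts 2, 4, 8, 16
```

With something like this, `4 + XLEN` and `XLEN // 4` in the example evaluate
to one integer per lane-kind, which is what PartitionedSignal, Repl and
slicing would then consume.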

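For checking the carry logic in the example, here is a plain-integer model of
addg6s for a single lane. This is only a sketch: the function names are made
up, and it models one element of width xlen rather than a partitioned signal.
It relies on the fact that bit i of `sum ^ a ^ b` is the carry into bit i, so
the carry out of nibble k shows up at bit 4*(k+1):

```python
def addg6s_model(ra, rb, xlen=64):
    """Hypothetical single-lane model using the sum ^ a ^ b carry trick."""
    mask = (1 << xlen) - 1
    ra &= mask
    rb &= mask
    total = ra + rb                      # xlen + 1 bits
    carries = total ^ ra ^ rb            # bit i = carry into bit i
    ones = int("1" * (xlen // 4), 16)    # 0x1111...1, one bit per nibble
    carry_out = (carries >> 4) & ones    # bit 4*k = carry out of nibble k
    return (~carry_out & ones) * 6       # 6 in every nibble without carry out


def addg6s_nibblewise(ra, rb, xlen=64):
    """Slow reference: compute each nibble's carry out separately."""
    result = 0
    for k in range(xlen // 4):
        low = (1 << (4 * (k + 1))) - 1   # mask covering nibbles 0..k
        carry = ((ra & low) + (rb & low)) >> (4 * (k + 1))
        if not carry:
            result |= 6 << (4 * k)
    return result
```

The two functions should agree for any inputs; the first mirrors the
structure of the nmigen code, the second follows the per-nibble definition.
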
> 
>    (making sure in the process that at every single step of those
>     changes that the code still operates when PSpec "mode==scalar,
>     please use Signal not PartitionedSignal throughout the entire
>     ALU codebase" is set)

Well, when pspec==scalar, all that needs to happen is that part_sizes is set
to {FpElWid.F64: 1} or {IntElWid.I64: 1}. Then only the 64-bit entry in each
dict is used, and we produce something that should be 100% the same circuit
as scalar (except for some names and module boundaries). There is no need to
have everything replace itself with Signal.
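
A rough sketch of that selection, assuming per-lane-kind dicts are simply
filtered by the keys present in part_sizes. The `active_entries` helper, the
stand-in enum and the example SIMD part_sizes values are hypothetical, and
this does not try to interpret what the values in part_sizes mean:

```python
from enum import Enum


class IntElWid(Enum):
    # hypothetical stand-in for the lane-kind enum
    I8 = 0
    I16 = 1
    I32 = 2
    I64 = 3


def active_entries(per_lane_kind, part_sizes):
    """Keep only the entries whose lane-kind is enabled in part_sizes."""
    return {k: v for k, v in per_lane_kind.items() if k in part_sizes}


XLEN = {IntElWid.I8: 8, IntElWid.I16: 16,
        IntElWid.I32: 32, IntElWid.I64: 64}

# example SIMD configuration: every lane-kind enabled (values illustrative)
simd_part_sizes = {IntElWid.I8: 8, IntElWid.I16: 4,
                   IntElWid.I32: 2, IntElWid.I64: 1}
# scalar configuration from the comment above
scalar_part_sizes = {IntElWid.I64: 1}

simd_xlen = active_entries(XLEN, simd_part_sizes)      # all four widths
scalar_xlen = active_entries(XLEN, scalar_part_sizes)  # only the I64 entry
```

In the scalar case every width expression collapses to the single 64-bit
value, so the same source describes an ordinary 64-bit datapath.
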
> 
> bottom line this is all about making the bare minimum code-changes,
> across what is already an extremely complex set of inter-connected
> parts.
