[Libre-soc-isa] [Bug 697] SVP64 Reduce Modes

bugzilla-daemon at libre-soc.org bugzilla-daemon at libre-soc.org
Wed Mar 23 16:51:49 GMT 2022


https://bugs.libre-soc.org/show_bug.cgi?id=697

--- Comment #21 from Jacob Lifshay <programmerjake at gmail.com> ---
imho moving is absolutely necessary if you want tree-reductions to run quickly
without needing a full crossbar somewhere on the path from ALU outputs to other
ALUs' inputs.

here, i define "running quickly" as not needing to spend extra clock cycles
passing register values through a separate slow-but-general lane-crossing
mechanism.

basically, a tree-reduction with moving only needs data to travel along a few
fixed inter-lane paths, but if you skip moving, then you need to be able to read
any high-index element when combining it with a lower-index element.

e.g. if you have registers organized so that every 8th element goes to the same
ALU (element k is handled by ALU lane k mod 8), then tree-reduction with moving
lets you get away with fast inter-lane paths for only a few lane pairs:

needed paths with moving (assuming reducing into r0; X = path from that row's elements into that ALU lane):
ALU lane:                 0  1  2  3  4  5  6  7
                          |  |  |  |  |  |  |  |
r0.el0/r4.el0/r8.el0/...--X--|--|--|--|--|--|--|
r0.el1/r4.el1/r8.el1/...--X--X--|--|--|--|--|--|
r1.el0/r5.el0/r9.el0/...--X--|--X--|--|--|--|--|
r1.el1/r5.el1/r9.el1/...--|--|--X--X--|--|--|--|
r2.el0/r6.el0/r10.el0/...-X--|--|--|--X--|--|--|
r2.el1/r6.el1/r10.el1/...-|--|--|--|--X--X--|--|
r3.el0/r7.el0/r11.el0/...-|--|--|--|--X--|--X--|
r3.el1/r7.el1/r11.el1/...-|--|--|--|--|--|--X--X
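A small Python sketch can regenerate the table above, assuming: element k of the vector sits in ALU lane k mod 8 with two elements per register (as in the table), the lane count is a power of two, and the tree combines adjacent pairs, then pairs 2 apart, then 4 apart, each time with the lower lane reading the higher lane's partial result. The function names are hypothetical, not taken from the Libre-SOC codebase.

```python
# Hypothetical model: vector element k lives in ALU lane k % LANES,
# with 2 elements per register (so lane 2 holds r1.el0, r5.el0, ...).
LANES = 8


def home(lane):
    """Name of the first register element that lives in this lane."""
    return f"r{lane // 2}.el{lane % 2}"


def moving_tree_paths(lanes=LANES, dest_lane=0):
    """Set of (src_lane, alu_lane) paths used by a pairwise tree reduction
    whose final result lands in dest_lane (lanes must be a power of 2).

    Lanes are visited in rotated order starting at dest_lane; at each step
    the lower position of every pair reads the upper position's partial
    result, so partial results move toward dest_lane.
    """
    order = [(dest_lane + i) % lanes for i in range(lanes)]
    paths = {(lane, lane) for lane in range(lanes)}  # each lane reads its own data
    stride = 1
    while stride < lanes:
        for i in range(0, lanes, 2 * stride):
            alu, src = order[i], order[i + stride]
            paths.add((src, alu))
        stride *= 2
    return paths


def render(paths, lanes=LANES):
    """ASCII table: rows are source lanes, columns are ALU lanes."""
    lines = ["ALU lane: " + "  ".join(str(i) for i in range(lanes))]
    for src in range(lanes):
        marks = "  ".join("X" if (src, alu) in paths else "|" for alu in range(lanes))
        lines.append(f"{home(src):<8}  {marks}")
    return "\n".join(lines)


print(render(moving_tree_paths()))
```

Every lane except the destination is read exactly once over the whole tree, so an n-lane tree needs n-1 cross-lane paths plus the n own-lane paths: 2*8-1 = 15 here, versus 64 for a full 8x8 crossbar.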

needed paths with moving (reducing into any reg):
ALU lane:                 0  1  2  3  4  5  6  7
                          |  |  |  |  |  |  |  |
r0.el0/r4.el0/r8.el0/...--X--|--|--|--X--|--X--|
r0.el1/r4.el1/r8.el1/...--X--X--|--|--|--|--|--|
r1.el0/r5.el0/r9.el0/...--X--|--X--|--|--|--X--|
r1.el1/r5.el1/r9.el1/...--|--|--X--X--|--|--|--|
r2.el0/r6.el0/r10.el0/...-X--|--X--|--X--|--|--|
r2.el1/r6.el1/r10.el1/...-|--|--|--|--X--X--|--|
r3.el0/r7.el0/r11.el0/...-|--|--X--|--X--|--X--|
r3.el1/r7.el1/r11.el1/...-|--|--|--|--|--|--X--X
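Under the same assumptions, "reducing into any reg" can be modelled by rotating the tree so the final result lands on element 0 of r0, r1, r2 or r3 (lanes 0, 2, 4, 6) and taking the union of the paths. This reading of the table is an assumption, and the helper names are hypothetical, but the union it produces matches the table entry for entry.

```python
def moving_tree_paths(lanes, dest_lane):
    """(src_lane, alu_lane) paths for a pairwise tree reduction rotated so
    that the final result lands in dest_lane (lanes must be a power of 2)."""
    order = [(dest_lane + i) % lanes for i in range(lanes)]
    paths = {(lane, lane) for lane in range(lanes)}  # own-lane reads
    stride = 1
    while stride < lanes:
        for i in range(0, lanes, 2 * stride):
            # lower position of each pair reads the upper position's partial result
            paths.add((order[i + stride], order[i]))
        stride *= 2
    return paths


def any_reg_paths(lanes=8, elems_per_reg=2):
    """Union of paths over every destination that is element 0 of a register."""
    paths = set()
    for dest_lane in range(0, lanes, elems_per_reg):
        paths |= moving_tree_paths(lanes, dest_lane)
    return paths


paths = any_reg_paths()
for src in range(8):
    row = "  ".join("X" if (src, alu) in paths else "|" for alu in range(8))
    print(f"r{src // 2}.el{src % 2}  {row}")
print(len(paths), "of", 8 * 8, "crossbar paths")
```

This reports 20 of the 64 paths a full 8x8 crossbar would provide; the 5 extra paths compared with the r0-only case come from the upper steps of the rotated trees.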


needed paths without moving and remapping instead (assuming reducing into r0):
ALU lane:                 0  1  2  3  4  5  6  7
                          |  |  |  |  |  |  |  |
r0.el0/r4.el0/r8.el0/...--X--X--X--X--X--X--X--X
r0.el1/r4.el1/r8.el1/...--X--X--X--X--X--X--X--X
r1.el0/r5.el0/r9.el0/...--X--X--X--X--X--X--X--X
r1.el1/r5.el1/r9.el1/...--X--X--X--X--X--X--X--X
r2.el0/r6.el0/r10.el0/...-X--X--X--X--X--X--X--X
r2.el1/r6.el1/r10.el1/...-X--X--X--X--X--X--X--X
r3.el0/r7.el0/r11.el0/...-X--X--X--X--X--X--X--X
r3.el1/r7.el1/r11.el1/...-X--X--X--X--X--X--X--X

the top-right triangle of the remapping crossbar (ALU lanes reading from
lower-numbered lanes) is needed for reductions with more than 8 elements, since
they start needing to read from registers r4..r7, r8..r11, and so on.
