[Libre-soc-bugs] [Bug 713] PartitionedSignal enhancement to add partition-context-aware lengths

bugzilla-daemon at libre-soc.org
Sun Oct 3 13:35:22 BST 2021


https://bugs.libre-soc.org/show_bug.cgi?id=713

Luke Kenneth Casson Leighton <lkcl at lkcl.net> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |programmerjake at gmail.com

--- Comment #2 from Luke Kenneth Casson Leighton <lkcl at lkcl.net> ---
in the existing (regularly-allocated, fully-allocated) design the partitions
do not need special treatment.  within a given partition the partial result
is always computed, and from that an optimisation has been possible to deploy
at the gate level which a naive "switch" on the mask could not possibly
hope to replicate.

a variable length per partition breaks that... *unless*...

a pre-analysis phase is carried out which splits out the high-level "required"
partitioning from a *lower-level* partitioning.

settings:

* all 32 and 16-bit values are actually to be truncated to 11 bit
* all 8-bit values to 5-bit

from these we can write out:

mode    lane width   value width   bits holding values
32bit   32           11            10..0
16bit   16           11            26..16, 10..0
8bit    8            5             28..24, 20..16, 12..8, 4..0

thus, we deduce, we *actually* need breakpoints at:

        28 26 24 20 16 12 10 8 4 0
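
the breakpoint list can be derived mechanically from the settings.  a
minimal sketch, assuming the numbering used above (each lane contributes
its start bit and the index of the top bit of its truncated value).  the
names `value_ranges` and `breakpoints` are hypothetical: they are not part
of the existing PartitionedSignal code, whose partition-point convention
may differ.

```python
# Hypothetical helpers for illustration only; not existing PartitionedSignal API.
TOTAL_WIDTH = 32

# lane width -> truncated value width, taken from the "settings" list
VALUE_WIDTHS = {32: 11, 16: 11, 8: 5}


def value_ranges(lane_width, value_width, total_width=TOTAL_WIDTH):
    """Return (low_bit, high_bit) of the truncated value in each lane."""
    return [(start, start + value_width - 1)
            for start in range(0, total_width, lane_width)]


def breakpoints(value_widths=VALUE_WIDTHS, total_width=TOTAL_WIDTH):
    """Union of every lane start and every value top bit, over all modes."""
    points = set()
    for lane_width, value_width in value_widths.items():
        for low, high in value_ranges(lane_width, value_width, total_width):
            points.update((low, high))
    return sorted(points, reverse=True)


if __name__ == "__main__":
    for lane_width, value_width in VALUE_WIDTHS.items():
        print(lane_width, value_ranges(lane_width, value_width))
    print(breakpoints())
```

with the settings above, `breakpoints()` returns
[28, 26, 24, 20, 16, 12, 10, 8, 4, 0].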

and when 32bit is set:

* 10 is closed
* 11-31 "masked to zero" (clock gated, in future)

when 16bit:

* 10 is closed
* 16 is closed
* 26 is closed
* 11-15 is zeromasked / clock-gated
* 27-31 is zeromasked / clock-gated

when 8bit:

* 4 is closed
* 8 is closed
* 12 is closed
* 16 is closed
* 20 is closed
* 24 is closed
* 28 is closed
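
the per-mode lists can be generated the same way.  a sketch under the same
assumptions as the settings (32-bit signal, 11-bit values in 32 and 16-bit
lanes, 5-bit values in 8-bit lanes).  `mode_plan` and `zero_mask` are
hypothetical names, and bit 0 is left out of the "closed" set because it is
the bottom of the signal in every mode.  derived this way, the 16-bit masked
ranges come out as 11-15 and 27-31, the 8-bit closed set includes 20, and
the 8-bit mode also has unused bits (5-7, 13-15, 21-23, 29-31).

```python
# Hypothetical helpers for illustration only; not existing PartitionedSignal API.
TOTAL_WIDTH = 32


def mode_plan(lane_width, value_width, total_width=TOTAL_WIDTH):
    """Return (closed, masked) for one mode.

    closed: sorted breakpoints active in this mode (bit 0 excluded)
    masked: (low_bit, high_bit) ranges of unused bits to force to zero
    """
    closed = set()
    masked = []
    for start in range(0, total_width, lane_width):
        top = start + value_width - 1   # top bit of the truncated value
        end = start + lane_width - 1    # top bit of the lane itself
        closed.update((start, top))
        if top < end:
            masked.append((top + 1, end))
    closed.discard(0)
    return sorted(closed), masked


def zero_mask(masked, total_width=TOTAL_WIDTH):
    """Integer with a 1 in every bit that is kept (i.e. not zero-masked)."""
    keep = (1 << total_width) - 1
    for low, high in masked:
        keep &= ~(((1 << (high - low + 1)) - 1) << low)
    return keep


if __name__ == "__main__":
    for lane_width, value_width in ((32, 11), (16, 11), (8, 5)):
        closed, masked = mode_plan(lane_width, value_width)
        print(lane_width, closed, masked, hex(zero_mask(masked)))
```

ANDing a result with `zero_mask(...)` is only a software stand-in for the
zeromasking / clock gating mentioned above.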

which is a lot more partitions, and a lot more work, but provides an
implementation *in terms of existing code* with very little disruption.

it's also very straightforward.

reconstruction of results is also easily done by leveraging the existing
optimised code: it is just that the partition sizes are non-uniform lengths.

some care needs to be taken to ensure that the inputs are not too
heavily MUXed, creating long gate delays.

at the moment all inputs are unconditional: the selection is done on
the output.

Cat will need some more thought, to see if the irregular size can be dealt
with.

in particular, construction of the "size spec" for the output may be a bit
hairy.  each possible length combination under each partition mask value
will have to be calculated.
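
to give a feel for that calculation, a very rough sketch.  it assumes
(and this is only an assumption) that Cat joins the operands lane by lane,
so that in each mode the output element width is the sum of the operands'
element widths in that mode.  `cat_size_spec` is a hypothetical name, not
an existing function.

```python
# Hypothetical helper for illustration only; assumes a lane-by-lane Cat.
def cat_size_spec(*operand_widths):
    """Per-mode element width of a lane-by-lane concatenation.

    Each operand is a dict mapping a mode (here: the lane width) to the
    element width that operand uses in that mode.  All operands must
    define the same set of modes.
    """
    if not operand_widths:
        raise ValueError("at least one operand is required")
    modes = operand_widths[0].keys()
    for widths in operand_widths:
        if widths.keys() != modes:
            raise ValueError("operands must define the same modes")
    return {mode: sum(widths[mode] for widths in operand_widths)
            for mode in modes}


if __name__ == "__main__":
    a = {32: 11, 16: 11, 8: 5}
    b = {32: 11, 16: 11, 8: 5}
    print(cat_size_spec(a, b))
```

with two operands using the settings above, the 16-bit mode would need
22-bit elements, which no longer fit a 16-bit lane: the output needs its
own lane layout, which is where the "hairy" part comes in.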
