[Libre-soc-dev] [RFC] SVP64 Vertical-First Mode loops
lkcl
luke.leighton at gmail.com
Thu Aug 19 11:43:22 BST 2021
On August 19, 2021 1:46:59 AM UTC, Richard Wilbur <richard.wilbur at gmail.com> wrote:
>
>> On Aug 18, 2021, at 18:13, lkcl <luke.leighton at gmail.com> wrote:
>> 8 4 2 1 on batch sizes and num coefficients
>> 1 2 4 8 times reuse on coefficients
>
>Is this for FFT?
no, DCT. N log2 N coefficients, actually N/2 log2 N.
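A rough way to see where the "8 4 2 1 / 1 2 4 8" pattern and the N/2 log2 N total come from. This is a sketch that assumes a Lee-style recursive radix-2 DCT, where a block of size m needs m/2 coefficients and each row has N/m such blocks. The helper name `dct_row_stats` is hypothetical.

```python
def dct_row_stats(n: int) -> list[tuple[int, int]]:
    """Per-row (distinct coefficients, reuse factor) for an assumed radix-2 DCT of size n."""
    if n < 2 or n & (n - 1):
        raise ValueError("n must be a power of two >= 2")
    rows = []
    size = n
    while size >= 2:
        distinct = size // 2   # a block of `size` points needs size/2 coefficients
        reuse = n // size      # number of blocks in this row sharing those coefficients
        rows.append((distinct, reuse))
        size //= 2
    return rows

stats = dct_row_stats(16)
print(stats)                           # [(8, 1), (4, 2), (2, 4), (1, 8)]
print(sum(d * r for d, r in stats))    # 32 coefficient uses == 16/2 * log2(16)
print(sum(d for d, _ in stats))        # 15 distinct coefficients
```

So the total number of coefficient *uses* is N/2 log2 N, while reuse only ever happens inside a row.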
> Very cool, I suspected it would be pretty good reuse.
it's not.
RADIX2 FFT, on the other hand, has N coefficients and you *can* reuse them: for row 2 you jump every other coefficient across both sub-crossbars, for row 3 you jump every 4th coefficient across the sub-sub-crossbars, but they are the same N coefficients.
for DCT the same thing happens as far as jumping is concerned, and there is in-row reuse, but because of the i+0.5 term the coefficients are NOT THE SAME IN EACH ROW.
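A small sketch of the difference, under stated assumptions: the FFT rows are modelled as radix-2 rows that index one shared table of W_N^k with a growing stride, and the DCT rows are assumed to be Lee-style, using 1/(2cos((i+0.5)π/m)) for a block of size m. All function names here are hypothetical.

```python
import math

def fft_row_twiddle_indices(n: int, size: int) -> list[int]:
    """Exponents k of W_n^k used by one radix-2 FFT row whose blocks have `size` points."""
    stride = n // size  # row 2 skips every other entry, row 3 every 4th, ...
    return [k * stride for k in range(size // 2)]

def dct_row_coeffs(size: int) -> list[float]:
    """Assumed Lee-style DCT row coefficients 1/(2*cos((i+0.5)*pi/size)), i < size/2."""
    return [1.0 / (2.0 * math.cos((i + 0.5) * math.pi / size)) for i in range(size // 2)]

def any_shared(a: list[float], b: list[float]) -> bool:
    """True if the two coefficient lists have any value in common (within float tolerance)."""
    return any(math.isclose(x, y) for x in a for y in b)

N = 16
sizes = [16, 8, 4, 2]

# FFT: every row indexes the same table of W_N^k, just with a bigger stride
for size in sizes:
    print("fft", size, fft_row_twiddle_indices(N, size))
# fft 16 [0, 1, 2, 3, 4, 5, 6, 7]
# fft 8 [0, 2, 4, 6]
# fft 4 [0, 4]
# fft 2 [0]

# DCT: the i+0.5 offset means no row shares a coefficient with any other row
rows = {size: dct_row_coeffs(size) for size in sizes}
print(any(any_shared(rows[a], rows[b]) for a in sizes for b in sizes if a != b))  # False
```

The angles in a row of size m are odd multiples of π/(2m), so halving m always lands on different angles: the jumping pattern matches the FFT, but the values do not.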
>I wasn’t specific enough when I asked, “How much coefficient reuse in a
>particular row?” I meant to ask concerning the DCT since it isn’t an
>option to share coefficients between rows in that algorithm.
and i answered as per your question.
>Except that the input numbers are rationals with a common denominator
>for a particular row in DCT. I think we could effectively store them
>with a particular structure based on the denominator, indexed with the
>integer count along the row. More of a coefficient array/RAM than
>cache (your usage of this term was more loaded than mine, I simply was
>referring to a convenient place to stow the numbers where we could
>easily and quickly get them back when needed).
indeed... at the cost of designing and adding yet more instructions, this time with an extremely small probability that they can be put to use elsewhere.
the butterfly REMAP schedule is generic and i can foresee it being used elsewhere.
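To illustrate why a butterfly schedule can be generic, here is a minimal sketch, not the actual SVP64 REMAP implementation: a hypothetical `butterfly_schedule` only produces (row, i, j, k) index tuples, and a separate consumer decides what coefficient to apply for each (row, k). The consumer shown is a textbook decimation-in-frequency FFT.

```python
import cmath
import math

def butterfly_schedule(n: int):
    """Hypothetical generic radix-2 butterfly schedule.

    Yields (row, i, j, k): element i is paired with element j in butterfly
    row `row`, and k is the coefficient index within that row's block.
    """
    if n < 2 or n & (n - 1):
        raise ValueError("n must be a power of two >= 2")
    size, row = n, 0
    while size >= 2:
        half = size // 2
        for base in range(0, n, size):  # every block in this row
            for k in range(half):
                yield row, base + k, base + k + half, k
        size, row = half, row + 1

def fft_dif(x):
    """Decimation-in-frequency FFT driven purely by butterfly_schedule()."""
    n = len(x)
    x = [complex(v) for v in x]
    for row, i, j, k in butterfly_schedule(n):
        size = n >> row                          # block size for this row
        w = cmath.exp(-2j * math.pi * k / size)  # FFT coefficient for (row, k)
        a, b = x[i], x[j]
        x[i], x[j] = a + b, (a - b) * w
    # DIF leaves X[m] at the bit-reversed position of m
    bits = n.bit_length() - 1
    return [x[int(format(m, f"0{bits}b")[::-1], 2)] for m in range(n)]

print(len(list(butterfly_schedule(16))))  # 32 butterflies, i.e. 16/2 * log2(16)
```

A DCT could in principle walk the same index sequence and look up a different coefficient per (row, k), plus its own pre- and post-processing steps, which are not shown here.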
>Did you mean to describe the case where the matrix is square?
as a special case, yes, although implementations i've seen try to make at least one dimension a power of two and then use Bluestein convolution for the other.
even with a power of 2 you may end up with e.g. 128 (2^7), which is an odd power of 2, i.e. not square: it breaks into 2^3 x 2^4.
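The 2^3 x 2^4 split can be written down directly. This is a sketch with a hypothetical helper, `split_pow2`, that splits N = 2^k into two power-of-two dimensions as close to square as possible.

```python
def split_pow2(n: int) -> tuple[int, int]:
    """Split n = 2^k into two power-of-two factors, as close to square as possible."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    k = n.bit_length() - 1  # n == 2**k
    lo = k // 2
    return 1 << lo, 1 << (k - lo)

print(split_pow2(64))   # (8, 8): even power of 2, square
print(split_pow2(128))  # (8, 16): 2^7 -> 2^3 x 2^4, not square
```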
l.