# [Libre-soc-dev] [RFC] SVP64 Vertical-First Mode loops

lkcl luke.leighton at gmail.com
Wed Aug 18 23:16:20 BST 2021

```
On August 18, 2021 10:02:49 PM UTC, Richard Wilbur <richard.wilbur at gmail.com> wrote:
>On Aug 18, 2021, at 13:06, lkcl <luke.leighton at gmail.com> wrote:
>> basically, to do large DCT / FFT recursively, you split into two
>halves, do each half at half the DCT/FFT size, then recombine the
>results.
>
>Each half could use the same scalar coefficients.

could... but remember: for an FFT of size N you need N coefficients. now you can only hold in the regfile an FFT half the size of what you could if you did it with Vertical-First Mode

for DCT it is *N ln N* coefficients needed for a DCT of size N.  DCT of size 32 needs 32+16+8+4+2+1 registers for the COS coefficients!

we just used the ENTIRE regfile!

or...

you can use only 1/2 the regfile and do a 64-wide DCT

> Seems for a
>particular size data set that if we are doing recursive sizes of
>transforms to compute the transforms.  If they are always related by
>powers of two then one time calculating the coefficients should be
>sufficient if we could calculate them and store them either in the
>order they are used (in a non-destructive FIFO with capability to set a
>step size) or with an easy scheme to access them via an index, we might
>at once calculate the coefficients using our vector engine and then use

DCT unfortunately doesn't work that way.  in order to complete all butterflies you need, in each row, cos((i+0.5)/n) from i=0..n-1 where n goes up in powers of two per butterfly row.

you can share those values *in* a row but unlike an FFT you cannot *reuse* them on a *different* row due to the +0.5

>If we had such a coefficient cache, I think VFHint could still be
>useful.

interesting idea, to have a special separate cache for coefficients.  it is however pretty specialist.  if it really becomes a focus for performance it's worth pursuing.

right now issuing cos instructions is "generic".  specialist single-purpose instructions make me twitchy.

for 3D texture interpolation it's fine / great / obvious payoff.

l.

```
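
The register arithmetic and the row-sharing argument in the message above can be checked with a small sketch. It is illustrative only and is not taken from the Libre-SOC code. The function names are hypothetical. The factor of pi inside the cosine is an assumption based on the usual DCT definition (the message writes `cos((i+0.5)/n)`), and the FFT twiddle factors are assumed to be the standard `exp(-2*pi*j*k/N)`.

```python
import cmath
import math


def dct_row_coefficients(n):
    # One butterfly row: cos((i + 0.5) * pi / n) for i = 0..n-1.
    # ASSUMPTION: the factor of pi is added here; the email omits it.
    return [math.cos((i + 0.5) * math.pi / n) for i in range(n)]


def dct_coefficient_rows(size):
    # Rows for n = size, size/2, ..., 1 (size is assumed to be a power of two).
    rows = []
    n = size
    while n >= 1:
        rows.append(dct_row_coefficients(n))
        n //= 2
    return rows


def fft_twiddles(n):
    # ASSUMPTION: standard twiddle factors exp(-2*pi*j*k/n), k = 0..n-1.
    return [cmath.exp(-2j * math.pi * k / n) for k in range(n)]


def shared(a, b, tol=1e-9):
    # Count the values in a that also occur in b, within a tolerance.
    return sum(1 for x in a if any(abs(x - y) < tol for y in b))


rows = dct_coefficient_rows(32)
row_sizes = [len(r) for r in rows]
total_registers = sum(row_sizes)

# Compare every DCT row against every larger row: how many values repeat?
dct_shared = sum(
    shared(rows[j], rows[i])
    for i in range(len(rows))
    for j in range(i + 1, len(rows))
)

# Same question for FFT sizes 16 and 32.
fft_shared = shared(fft_twiddles(16), fft_twiddles(32))

print(row_sizes, total_registers)
print(dct_shared, fft_shared)
```

Under these assumptions the size-32 DCT rows add up to 32+16+8+4+2+1 = 63 coefficient registers, no coefficient from one DCT row reappears in another row, and all 16 twiddle factors of a size-16 FFT are already present in the size-32 set.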