# [Libre-soc-dev] MP3 DCT36

Luke Kenneth Casson Leighton lkcl at lkcl.net
Sat Jun 19 18:15:53 BST 2021

```
there's more.  these two differ only by an offset of 8 on buf and win:

t0 = s0 + s1;
t1 = s0 - s1;
out[(9 + j) * SBLIMIT] = MULH3(t1, win[     9 + j], 1) + buf[4*(9 + j)];
out[(8 - j) * SBLIMIT] = MULH3(t1, win[     8 - j], 1) + buf[4*(8 - j)];
buf[4 * ( 9 + j     )] = MULH3(t0, win[MDCT_BUF_SIZE/2 + 9 + j], 1);
buf[4 * ( 8 - j     )] = MULH3(t0, win[MDCT_BUF_SIZE/2 + 8 - j], 1);

t0 = s2 + s3;
t1 = s2 - s3;
out[(9 + 8 - j) * SBLIMIT] = MULH3(t1, win[     9 + 8 - j], 1) + buf[4*(9 + 8 - j)];
out[         j  * SBLIMIT] = MULH3(t1, win[             j], 1) + buf[4*(        j)];
buf[4 * ( 9 + 8 - j     )] = MULH3(t0, win[MDCT_BUF_SIZE/2 + 9 + 8 - j], 1);
buf[4 * (         j     )] = MULH3(t0, win[MDCT_BUF_SIZE/2         + j], 1);

therefore, it should be possible to:

* have a preparatory phase at length 5 which prepares T0 and T1 as *vectors*
* put t0/t1 into the lower and upper halves of T0 and T1:
  * first half using s0/s1
  * second half, offset by 5, using s2/s3
* have a 2nd loop which is DOUBLE the size (VL=10)

elements 4 and 9 would be predicated out with the mask 0b0111101111.
this replaces the previous idea, which used a loop half the size (and
had the duplicated batch of code).

one of those MULH3s would use mrr (reverse gear) because, luckily, the
indices are all reverse-numbered. actually, err: if they're not
overlapping, they don't need reverse gear at all. correction there.

l.

```
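To make the two-phase proposal concrete, here is a minimal scalar sketch in C (the language of the quoted code) that checks it against the original pair of blocks. Several things are assumptions, not from the message: `MULH3` is replaced by a hypothetical integer stand-in `(s)*(y)*(x)` so results compare exactly, `SBLIMIT` is taken as 32 and `MDCT_BUF_SIZE` as 40, `s0`..`s3` are made-up per-`j` input arrays, and the helper names (`reference_step`, `two_phase`, `two_phase_matches_reference`) and index vectors `a[]`/`b[]` are illustrative. The VL=10 loop is modelled element by element in plain C, not as SVP64 instructions.

```c
#include <assert.h>
#include <string.h>

/* Assumed constants: SBLIMIT is the MPEG audio subband count (32);
 * MDCT_BUF_SIZE (assumed 40) only sets where the second half of win starts. */
#define SBLIMIT 32
#define MDCT_BUF_SIZE 40
/* Hypothetical integer stand-in for MULH3, so results compare exactly. */
#define MULH3(x, y, s) ((s) * (y) * (x))

#define HALF 5          /* length of the preparatory phase */
#define VL (2 * HALF)   /* length of the second loop */
#define MASK 0x1EF      /* 0b0111101111: elements 4 and 9 predicated out */
#define NIDX 18         /* indices 0..17 appear in out/buf/win */

/* The original scalar code: iteration j of the two near-identical blocks. */
static void reference_step(int j, int s0, int s1, int s2, int s3,
                           const int *win, int *buf, int *out)
{
    int t0 = s0 + s1, t1 = s0 - s1;
    out[(9 + j) * SBLIMIT] = MULH3(t1, win[9 + j], 1) + buf[4 * (9 + j)];
    out[(8 - j) * SBLIMIT] = MULH3(t1, win[8 - j], 1) + buf[4 * (8 - j)];
    buf[4 * (9 + j)] = MULH3(t0, win[MDCT_BUF_SIZE / 2 + 9 + j], 1);
    buf[4 * (8 - j)] = MULH3(t0, win[MDCT_BUF_SIZE / 2 + 8 - j], 1);

    t0 = s2 + s3;
    t1 = s2 - s3;
    out[(9 + 8 - j) * SBLIMIT] = MULH3(t1, win[9 + 8 - j], 1) + buf[4 * (9 + 8 - j)];
    out[j * SBLIMIT] = MULH3(t1, win[j], 1) + buf[4 * j];
    buf[4 * (9 + 8 - j)] = MULH3(t0, win[MDCT_BUF_SIZE / 2 + 9 + 8 - j], 1);
    buf[4 * j] = MULH3(t0, win[MDCT_BUF_SIZE / 2 + j], 1);
}

/* Proposed form. Phase 1 (length 5): T0/T1 become vectors, s0/s1 results in
 * the lower half and s2/s3 results in the upper half (offset by 5), plus the
 * index vectors a[] and b[]. Phase 2: one predicated loop over VL=10. */
static void two_phase(const int *s0, const int *s1, const int *s2,
                      const int *s3, const int *win, int *buf, int *out)
{
    int T0[VL], T1[VL], a[VL], b[VL];

    for (int k = 0; k < HALF; k++) {
        T0[k] = s0[k] + s1[k];
        T1[k] = s0[k] - s1[k];
        T0[HALF + k] = s2[k] + s3[k];
        T1[HALF + k] = s2[k] - s3[k];
        a[k] = 9 + k;           /* first half: 9 + j, 8 - j */
        b[k] = 8 - k;
        a[HALF + k] = 17 - k;   /* second half: 9 + 8 - j, j */
        b[HALF + k] = k;
    }
    for (int e = 0; e < VL; e++) {
        if (!((MASK >> e) & 1))
            continue;           /* element is predicated out */
        assert(a[e] < NIDX && b[e] < NIDX);
        out[a[e] * SBLIMIT] = MULH3(T1[e], win[a[e]], 1) + buf[4 * a[e]];
        out[b[e] * SBLIMIT] = MULH3(T1[e], win[b[e]], 1) + buf[4 * b[e]];
        buf[4 * a[e]] = MULH3(T0[e], win[MDCT_BUF_SIZE / 2 + a[e]], 1);
        buf[4 * b[e]] = MULH3(T0[e], win[MDCT_BUF_SIZE / 2 + b[e]], 1);
    }
}

/* Runs both forms on the same made-up inputs; returns 1 if they agree. */
int two_phase_matches_reference(void)
{
    int win[MDCT_BUF_SIZE], s0[HALF], s1[HALF], s2[HALF], s3[HALF];
    int buf_ref[4 * NIDX], buf_vec[4 * NIDX];
    int out_ref[NIDX * SBLIMIT] = {0}, out_vec[NIDX * SBLIMIT] = {0};

    for (int i = 0; i < MDCT_BUF_SIZE; i++)
        win[i] = i + 1;
    for (int i = 0; i < 4 * NIDX; i++)
        buf_ref[i] = buf_vec[i] = 3 * i - 7;
    for (int k = 0; k < HALF; k++) {
        s0[k] = 2 * k + 1;
        s1[k] = k - 3;
        s2[k] = 5 - k;
        s3[k] = 4 * k;
    }
    /* elements 4 and 9 are masked, so this matches j = 0..3 */
    for (int j = 0; j < 4; j++)
        reference_step(j, s0[j], s1[j], s2[j], s3[j], win, buf_ref, out_ref);
    two_phase(s0, s1, s2, s3, win, buf_vec, out_vec);

    return memcmp(buf_ref, buf_vec, sizeof buf_ref) == 0 &&
           memcmp(out_ref, out_vec, sizeof out_ref) == 0;
}
```

Because elements 4 and 9 are masked off, the VL=10 loop in this sketch does the same work as four iterations (j = 0..3) of the original code, and `two_phase_matches_reference()` returns 1 for these inputs. The sketch processes elements in a different order from the original, which only gives the same result because no two active elements touch the same index.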
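Whether element order (and so reverse gear) matters comes down to whether any two active elements touch the same index. The following is a small sketch that counts such collisions for a given predicate mask, assuming the index mapping read off the quoted code: element e < 5 uses j = e with indices 9 + j and 8 - j, and element e >= 5 uses j = e - 5 with indices 9 + 8 - j and j. The helper `count_index_collisions` is hypothetical.

```c
#include <assert.h>
#include <string.h>

#define HALF 5
#define VL (2 * HALF)
#define NIDX 18   /* out/buf/win indices used by the two blocks: 0..17 */

/* Counts how often an active element touches an index that an earlier
 * active element already touched. Element e touches a[e] and b[e]; out
 * uses that index, buf uses 4x it and win uses it (or it plus a fixed
 * offset), so one table covers all three arrays. */
int count_index_collisions(unsigned mask)
{
    int a[VL], b[VL], seen[NIDX];
    int collisions = 0;

    memset(seen, 0, sizeof seen);
    for (int k = 0; k < HALF; k++) {
        a[k] = 9 + k;           /* first half (s0/s1): 9 + j, 8 - j */
        b[k] = 8 - k;
        a[HALF + k] = 17 - k;   /* second half (s2/s3): 9 + 8 - j, j */
        b[HALF + k] = k;
    }
    for (int e = 0; e < VL; e++) {
        if (!((mask >> e) & 1u))
            continue;           /* predicated out */
        assert(a[e] < NIDX && b[e] < NIDX);
        collisions += seen[a[e]]++ > 0;
        collisions += seen[b[e]]++ > 0;
    }
    return collisions;
}
```

With the 0b0111101111 mask, `count_index_collisions(0x1EF)` returns 0, so under this mapping the active elements could be processed in either direction. With every element enabled, `count_index_collisions(0x3FF)` returns 2, because elements 4 and 9 both touch indices 13 and 4.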