[Libre-soc-dev] [RFC] REMAP, SHAPE, complex (full) FFT
Luke Kenneth Casson Leighton
lkcl at lkcl.net
Sun Jul 11 12:02:53 BST 2021
---
On Sun, Jul 11, 2021 at 6:44 AM Lauri Kasanen <cand at gmx.com> wrote:
>
> Hi Luke,
>
> Just a warning, what you're showing here is dangerously close to what
> you described earlier as "10 hours per line of asm". The complexity is
> reaching high heaven.
if it wasn't for this massive list just on DCT alone i would have
stopped this exercise weeks ago
https://en.wikipedia.org/wiki/Discrete_cosine_transform#Applications
* 10 speech encoding algorithms
* 14 general audio encoding algorithms
* 13 video encoding algorithms
* 6 image compression standards
for RADIX-2 DFT and DCT it's only ever going to be around 7 lines of
assembler (with fixed tables).
for FFT:
* i expect it to be 25 lines of assembler that would normally
  (commercially) go in an app note (see TI ref below),
* that app note needs writing *once*
* it's hidden in the hardware
* REMAP has been planned for 2 years,
* and it's exactly what high-end DSPs do:
  * Qualcomm's Hexagon DSP (used in signal processing)
    https://en.wikipedia.org/wiki/Qualcomm_Hexagon#Code_sample
    the inner loop is 1 VLIW instruction.
  * TMS320C80 DSP, DIF FFT
    https://www.ti.com/lit/an/spra152/spra152.pdf
    equation 17 and 18 can be done in 1 clock cycle,
    Zero-Overhead Loops apply to get the inner loop 100% pipelined.
  * ZOLC Hardware-looping, Nikolaus Kavvadias
    https://opencores.org/websvn/filedetails?repname=hwlu&path=%2Fhwlu%2Ftrunk%2Fdoc%2Fhwlu_spec.pdf
    this was actually commercially implemented in actual hardware
    by ST Microelectronics.
> It may still be worth doing,
that's the point of the exercise, to see if it is indeed worth it.
i'm about 30% of the way through things at the moment, so i haven't
enough pieces to spot "patterns" that can bring down code-size.
> not saying that, but it will highly limit
> who can (and will) write such a fft for SV. fftw is a popular lib yes,
> but each of the specialized math libs will have their own.
there's nothing stopping people from *not* using these
optimisations: the standard Power ISA code-paths and
ISA will not be "punished" by the addition of REMAP and
twin +/- MADD and ADD operations.
they'll just end up with an "industry-standard-normal" amount
of hard-coded assembler (macros that get unrolled to hundreds
of instructions).
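for anyone unsure what "twin +/-" and "hidden in the hardware" refer
to, here is a minimal sketch in python. it is not SV assembler and
not the REMAP specification: the names butterfly_schedule,
bit_reversed and fft are made up for illustration, and a textbook
radix-2 decimation-in-time FFT (power-of-two length) is assumed. the
generator stands in for the index schedule that a hardware looping
scheme would produce, leaving only the butterfly (one multiply
feeding both an add and a subtract) as the loop body.

```python
import cmath

def butterfly_schedule(n):
    """Yield (j, k, t): the two data indices and the twiddle index of
    every butterfly in an in-place radix-2 DIT FFT of length n.
    Hypothetical stand-in for an index schedule generated in hardware."""
    size = 2
    while size <= n:
        half = size // 2
        step = n // size          # stride into the fixed twiddle table
        for start in range(0, n, size):
            for i in range(half):
                yield start + i, start + i + half, i * step
        size *= 2

def bit_reversed(i, bits):
    """Reverse the lowest `bits` bits of i."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r

def fft(x):
    """Radix-2 DIT FFT; len(x) is assumed to be a power of two."""
    n = len(x)
    bits = n.bit_length() - 1
    a = [x[bit_reversed(i, bits)] for i in range(n)]
    # fixed table of twiddle factors, computed once
    tw = [cmath.exp(-2j * cmath.pi * t / n) for t in range(n // 2)]
    for j, k, t in butterfly_schedule(n):
        prod = a[k] * tw[t]
        # the "twin +/-" step: one product, used for both a sum and a difference
        a[j], a[k] = a[j] + prod, a[j] - prod
    return a
```

the point of the split is that the body of the final loop is a single
line: everything else is index generation and a fixed table, which is
the part that only needs working out once.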
bottom line is, the bang-per-buck ratio for this is so high, and its
applications so important in computer science *and* it's a "general"
abstraction, that i feel it's worth continuing.
remember: the abstractions (app notes) only need be written once.
l.