[Libre-soc-dev] [RFC] SVP64 Vertical-First Mode loops
lkcl
luke.leighton at gmail.com
Thu Aug 19 01:19:58 BST 2021
On August 18, 2021 11:31:24 PM UTC, Jacob Lifshay <programmerjake at gmail.com> wrote:
>well, you still need the registers
register singular. quantity ONE.
> for cos coefficients if you either load them from memory or if you compute them with a cos instruction...
to reiterate what i've said throughout the whole thread, many times:
* one register (QTY ONE) for the cos coefficient in VF Mode
vs
* (N ln N) registers for HF Mode DCT COS coeffs.
this is because in Horizontal-First (HF) Mode the *entire* triple-loop butterfly is computed in one single instruction, so there is no other option but to have the entire coefficient set in regs. [it is possible to do one row at a time but please let's not complicate the discussion]
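to make the difference concrete, here is a rough python sketch (not SVP64 code, and not the Libre-SOC implementation). all names (`coeff`, `butterfly_schedule`, `run_hf`, `run_vf`) are made up for illustration, the coefficient formula is assumed to be the Lee-style fast DCT one, and only the butterfly passes are modelled (the recombination step of a full DCT is left out). the point is only that both variants walk the same triple loop: one needs the whole coefficient table up front, the other overwrites a single scalar.

```python
import math

def coeff(size, j):
    # ASSUMPTION: Lee-style fast DCT coefficient, for illustration only
    return 1.0 / (2.0 * math.cos((j + 0.5) * math.pi / size))

def butterfly_schedule(n):
    """Yield (size, start, j) for every butterfly; n must be a power of two."""
    size = n
    while size >= 2:                      # loop 1: stage (block size halves)
        for start in range(0, n, size):   # loop 2: block within the stage
            for j in range(size // 2):    # loop 3: pair within the block
                yield size, start, j
        size //= 2

def run_hf(data):
    """HF-like: the whole coefficient set exists before any butterfly runs."""
    n = len(data)
    table = [coeff(size, j) for size, _, j in butterfly_schedule(n)]
    out = list(data)
    for k, (size, start, j) in enumerate(butterfly_schedule(n)):
        lo, hi = start + j, start + size - 1 - j
        a, b = out[lo], out[hi]
        out[lo], out[hi] = a + b, (a - b) * table[k]
    return out, len(table)                # coefficients held at once

def run_vf(data):
    """VF-like: one scalar coefficient, recomputed on every step."""
    n = len(data)
    out = list(data)
    for size, start, j in butterfly_schedule(n):
        c = coeff(size, j)                # overwrites the single scalar
        lo, hi = start + j, start + size - 1 - j
        a, b = out[lo], out[hi]
        out[lo], out[hi] = a + b, (a - b) * c
    return out, 1                         # coefficients held at once

if __name__ == "__main__":
    samples = [float(i) for i in range(1, 9)]
    print(run_hf(samples)[1], run_vf(samples)[1])
```

for N=8 the table in this sketch holds 12 entries, i.e. (N/2) log2 N, which grows at the same rate as the N ln N figure used in this thread, while the VF-like loop never holds more than one.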
breakdown:
* cost in registers and memory for HF variant:
- N ln N registers for cos coefficients
- N registers for input
- N ln N LDs of coefficients from memory
- N LDs for input
- N STs for output
total:
- N + (N ln N) regs
- 2N + (N ln N) memory accesses
* cost in regs and mem for VF:
- ONE scalar reg for cos coeff
- N regs for input
- ZERO LDs for coeffs
- N LDs for input
- N STs for output
total:
- N + 1 regs
- 2N memory accesses
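the tallies above can be checked mechanically. a minimal python sketch, assuming "N ln N" is taken literally (natural log, rounded up to a whole register count, which is my assumption) and that the memory total is simply the sum of the itemised LDs and STs. `hf_cost` and `vf_cost` are hypothetical helper names, not anything from the spec.

```python
import math

def hf_cost(n):
    """Horizontal-First: the whole cos coefficient set sits in registers."""
    coeffs = math.ceil(n * math.log(n))   # ASSUMPTION: "N ln N", rounded up
    return {
        "regs": n + coeffs,               # N input regs + coefficient regs
        "mem": coeffs + n + n,            # coeff LDs + input LDs + output STs
    }

def vf_cost(n):
    """Vertical-First: one scalar coefficient, computed instead of loaded."""
    return {
        "regs": n + 1,                    # N input regs + ONE scalar
        "mem": n + n,                     # input LDs + output STs, zero coeff LDs
    }

if __name__ == "__main__":
    for n in (8, 16, 32, 64):
        print(n, hf_cost(n), vf_cost(n))
```

for N=8 this gives 25 regs and 33 memory accesses for HF, against 9 regs and 16 memory accesses for VF, and the gap widens as N grows.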
it is therefore blindingly obvious that, when COS can be done efficiently in hardware, using VF Mode significantly reduces resource utilisation.
this is the total opposite of "normal" processors, which often don't even have a hardware COS instruction, and where consequently the cost of calculating COS far exceeds even the worst strip-mining scenarios.
l.