[Libre-soc-dev] Compressed instructions
Luke Kenneth Casson Leighton
lkcl at lkcl.net
Sat Nov 21 15:56:43 GMT 2020
On 11/21/20, Alexandre Oliva <oliva at gnu.org> wrote:
> Some more thoughts on this issue...
it's addictive, isn't it? :)
> Maybe having the compiler default to generating compressed code, if
> targeting a CPU that supports it, and relaxing the requirements only
> upon finding e.g. too much register pressure, would make the problem
> more approachable.
i believe the algorithms in RVC gcc to be simpler than that (although
likely still fairly advanced): the target regs of C are those most
commonly used, and non-targeted ones simply result in the use of
32-bit instructions.
this is why RVC added LD/ST-stack opcodes: when the 32-bit pattern
"LD RT, NN(r1)" is spotted, it is naively replaced with "c.ld RT NN".
basically, "get the job done, get some benefits, optimise later"
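a minimal sketch of that kind of naive replacement pass, purely for illustration. it is not the actual gcc implementation: the function name, the regex and the offset limits (max_offset, align) are all hypothetical, and the only parts taken from the above are the "LD RT, NN(r1)" pattern and the "c.ld RT NN" replacement.

```python
import re

# 32-bit pattern from the text: "ld RT, NN(r1)", with r1 as the stack pointer.
STACK_LOAD = re.compile(r"^\s*ld\s+(r\d+)\s*,\s*(\d+)\(r1\)\s*$", re.IGNORECASE)


def compress_stack_loads(lines, max_offset=504, align=8):
    """Naively rewrite stack-relative loads to the compressed form.

    max_offset and align are hypothetical limits standing in for what a
    16-bit encoding's immediate field could hold. Anything that does not
    fit is left as the original 32-bit instruction.
    """
    out = []
    for line in lines:
        m = STACK_LOAD.match(line)
        if m:
            rt, offset = m.group(1), int(m.group(2))
            if offset <= max_offset and offset % align == 0:
                out.append(f"c.ld {rt} {offset}")
                continue
        # No match, or offset out of range: keep the 32-bit form untouched.
        out.append(line)
    return out
```

anything the pass does not recognise simply stays 32-bit, which is the "get some benefits, optimise later" approach: no register allocation changes are needed for it to be correct.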
> OTOH, I realize that a static choice of opcodes for this compressed
> encoding, aiming to use 50%-smaller instructions as often as possible,
> can make for code that compresses down to no less than 50% of the
> original size, probably more like between 60% and 70% for any nontrivial
> program.
pretty much on the nail there. however, there is a *square* law effect
on power reduction, which cascades through L1 to L2/L3, reduces TLB and
MMU thrashing, and more.
approximately: a 25% reduction in code size gives a whopping *40*%
reduction in power consumption.
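a back-of-envelope check of those numbers, as a sketch only. two assumptions that are not established above: every original instruction is a fixed 32 bits and a compressed one is exactly 16, and power scales as a pure power law of code size (exponent 2 for the "square law"). the function names are made up for illustration.

```python
def code_size_ratio(compressed_fraction):
    """Code size relative to the original, if the given fraction of
    fixed 32-bit instructions are each replaced by a 16-bit one."""
    return 1.0 - 0.5 * compressed_fraction


def power_ratio(size_ratio, exponent=2.0):
    """Power relative to the original under an idealised power law.

    exponent=2.0 is the assumed pure square law; real hardware will
    not follow it exactly.
    """
    return size_ratio ** exponent
```

under those assumptions, compressing 60% to 80% of instructions gives a size ratio of 0.7 down to 0.6, matching the 60% to 70% estimate, and code_size_ratio(1.0) gives the 50% floor. power_ratio(0.75) is 0.5625, i.e. a saving of roughly 44%, in the same ballpark as the approximate 40% figure.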
> Data compression algorithms, in turn, can shrink an executable, such as
> the ppc-gcc I used to test the 'objdump -M histogram' patch, down to
> some 30% of the original size. Filesystems with transparent compression
> and MMUs can enable required pages to be decompressed on demand. That
> doesn't help with the instruction cache,
L1 CAMs are one of the bigger consumers of power. that and regfiles.
ironic that moving data uses more power than actual computation.
> but it does help even better
> than opcode compression in reducing permanent or volatile storage
> requirements.
the latency cost of decompressing instructions and data on demand
should be self-evident: efforts to add transparent L1/L2 cache
compression at the hardware level are complex enough to have been the
subject of entire dissertations. we need to be pragmatic.
> marketplace of CPUs. Is the data that pointed us this way readily
> available for my perusal, so that I can convince myself by studying it?
RISC-V RVC and other studies. if you happen to find them please do
link them on the C wiki page.
l.