[Libre-soc-dev] Compressed instructions (was: Re: [libre-soc-dev] Alex Oliva's intro, and RFC on mission)

Sat Nov 21 04:45:22 GMT 2020

On Nov 20, 2020, Luke Kenneth Casson Leighton <lkcl at lkcl.net> wrote:

> jumping straight into practical matters: we're in the middle of
> getting SimpleV redone on top of OpenPOWER, and a first step there is
> Compressed Instructions.

So, if I understand what I read in bug 238, we are looking into
introducing an extension to PowerISA for code compactness, a little like
Thumb vs ARM, with 16-bit instructions rather than 32-bit ones, so that
fragments of code that fit certain constraints, such as using only a
subset of the register file, sufficiently-narrow immediate operands and
offsets, and a limited set of operations, can be represented in such
shorter instructions, so as to save instruction cache, memory, bus
traffic, etc.

I get the idea that the transition between 32-bit instructions and
16-bit instructions is to be dynamic, rather than based on the
instruction encoding.  This raises various concerns for me, from a
toolchain engineer's perspective.

One is how to mark fragments of code so that the tooling can tell
whether to emit or decode 16-bit or 32-bit instructions.  Say, how is
the disassembler supposed to tell whether it's looking at a 32-bit
instruction or a pair of 16-bit instructions?
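
To make the ambiguity concrete, here is a minimal sketch in Python.  It
assumes big-endian instruction words and a mode that is known only from
outside the byte stream; the function name split_units is made up for
illustration and is not part of any existing tool.  The four bytes used
happen to be the standard 32-bit encoding of "li r3,1", but nothing in
them says whether they are one 32-bit word or two 16-bit halves.

```python
import struct

def split_units(raw, width):
    """Split raw bytes into big-endian instruction units of `width` bits.

    `width` must come from somewhere outside the bytes themselves
    (a section flag, a symbol annotation, a tracked mode, ...).
    """
    if width == 32:
        fmt, size = ">I", 4
    elif width == 16:
        fmt, size = ">H", 2
    else:
        raise ValueError("width must be 16 or 32")
    if len(raw) % size:
        raise ValueError("byte count is not a multiple of the unit size")
    return [struct.unpack_from(fmt, raw, off)[0]
            for off in range(0, len(raw), size)]

raw = bytes.fromhex("38600001")

as_32 = split_units(raw, 32)   # one unit:  0x38600001
as_16 = split_units(raw, 16)   # two units: 0x3860, 0x0001
```

Both readings are equally well-formed at this level, which is why a
disassembler needs some out-of-band marker to choose between them.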

Just to give an example of what I'd like to avoid, the SH port has
floating-point instructions that become single- or double-precision ones
depending on a bit dynamically set in a control register.  Just by
looking at the instruction, there's no way to tell whether it's single-
or double-precision.

Are sections to be marked as 32- or 16-bit code, so that transitions
between modes have to also jump from one section to the other?

Are labels to be marked, so that e.g. odd addresses are encoded in the
more compact mode?  Can calls and return insns transition between modes?
Can code at the same address ever be executed in both modes?  E.g., when
it comes to dynamic linkers, could there possibly be two GOT and PLT
entries for the same address, one for each mode, when labels refer to
the same address except for the mode?
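
As an illustration of the odd-address idea, here is a small sketch
modelled on how Thumb interworking uses the low bit of a code address.
Everything about it is an assumption for the sake of discussion: the
names COMPACT_BIT, encode_target and decode_target are hypothetical, and
nothing here reflects a decided design for this extension.

```python
COMPACT_BIT = 1  # hypothetical: low address bit set means "compact mode"

def encode_target(address, compact):
    """Fold the execution mode into the low bit of a code address."""
    if address & COMPACT_BIT:
        raise ValueError("code addresses are assumed to be 2-byte aligned")
    return address | COMPACT_BIT if compact else address

def decode_target(value):
    """Recover (address, compact) from an encoded branch target."""
    return value & ~COMPACT_BIT, bool(value & COMPACT_BIT)

entry = 0x1000
plain_ref = encode_target(entry, compact=False)    # 0x1000
compact_ref = encode_target(entry, compact=True)   # 0x1001

# Same underlying address, but two distinct values as far as a linker
# that keys GOT/PLT entries on the symbol value is concerned.
same_address = decode_target(plain_ref)[0] == decode_target(compact_ref)[0]
distinct_values = plain_ref != compact_ref
```

Under such a scheme the two references above resolve to the same
location yet compare unequal, which is exactly the situation that could
lead to duplicated GOT and PLT entries.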

Another concern is on tooling.  Though a compiler might not have too
much trouble figuring out that a chunk of code fits the constraints
that enable the use of the compact mode, that's not quite as useful or
involved as having the compiler try to make the code fit the
constraints.  Allocating registers for the constrained register profile,
reordering code so as to move unfit operations out of the fragment, or
falling back to alternate operations, scheduling instructions according
to the execution properties of each mode...  These don't seem to fit
very well in the current compilation model.
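
For reference, the easy half of the problem, merely detecting fragments
that already fit, can be sketched in a few lines.  The constraints below
(eight registers, a 5-bit signed immediate, four operations) are invented
placeholders, not the actual proposal, and the Insn type is a toy stand-in
for a compiler's instruction representation.

```python
from typing import NamedTuple, Tuple

# Hypothetical constraints, chosen only to make the sketch concrete.
COMPACT_OPS = {"add", "sub", "ld", "st"}
COMPACT_REGS = range(8)          # r0..r7
IMM_BITS = 5                     # signed immediate width

class Insn(NamedTuple):
    op: str
    regs: Tuple[int, ...]
    imm: int = 0

def fits_compact(insn):
    """True if the instruction satisfies every (assumed) constraint."""
    lo, hi = -(1 << (IMM_BITS - 1)), (1 << (IMM_BITS - 1)) - 1
    return (insn.op in COMPACT_OPS
            and all(r in COMPACT_REGS for r in insn.regs)
            and lo <= insn.imm <= hi)

def compact_runs(insns):
    """Return half-open (start, end) index pairs of maximal fitting runs."""
    runs, start = [], None
    for i, insn in enumerate(insns):
        if fits_compact(insn):
            if start is None:
                start = i
        elif start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(insns)))
    return runs

code = [
    Insn("add", (1, 2, 3)),
    Insn("add", (1, 2), imm=4),
    Insn("mul", (1, 2, 3)),        # operation not in the compact set
    Insn("add", (9, 2, 3)),        # register outside r0..r7
    Insn("sub", (0, 7), imm=-16),
]
runs = compact_runs(code)          # [(0, 2), (4, 5)]
```

Note that this only classifies code that was generated without the
compact mode in mind.  It does nothing about register allocation,
reordering or instruction selection aimed at making longer runs fit,
which is the part that doesn't match the current compilation model.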

The way we've dealt with different execution modes in GCC is to have
entire translation units compiled targeting one mode or another.  In the
early Thumb days, it was even a separate target, so you needed two
separate compilers, one for ARM, one for Thumb.  They were merged into a
single compiler target eventually, but even then, you selected one mode
or the other in the command line.  Later on, you could select individual
functions for one mode or another.  With this, the compiler knew from
early on what it was building for.

Even when it comes to offloading, target allocation decisions are mostly
on a per-function basis.

This is very much unlike having the compiler figure out the transition
points between the different instruction encodings inside functions, and
then "recompile" fragments that are found to be fit for the compact
representation.  This is such a major undertaking, and an expectation
that a compiler will do so much work it hasn't traditionally done, that
it reminds me of the Itanium.

I don't wish to come across as too negative, but it makes me wonder if
the potential savings this feature will enable won't just go to waste
because the tools won't be able to take advantage of them.  I wonder if
it wouldn't make more sense to save this idea for a future development,
rather than put it on the critical path for the very first product.  I
can see,
however, how much of a breakthrough it can be, and how compelling it can
make the processor, if the potential is realized.

> even though it is very early i'd like to look at how we can get
> statistical feedback in order to iterate on the encoding allocation.

I'll give that some thought and get back to you on this.

-- 
Alexandre Oliva, happy hacker  https://FSFLA.org/blogs/lxo/
   Free Software Activist         GNU Toolchain Engineer
        Vim, Vi, Voltei pro Emacs -- GNUlius Caesar