[Libre-soc-bugs] [Bug 238] POWER Compressed Formal Standard writeup

bugzilla-daemon at libre-soc.org
Mon Nov 30 08:21:50 GMT 2020


https://bugs.libre-soc.org/show_bug.cgi?id=238

--- Comment #113 from Alexandre Oliva <oliva at gnu.org> ---
actually, thinking further about the detection of 10-bit insns,
mode-switching or not, led me to an interesting realization

our reasoning seems to be significantly affected by a perception distortion
arising from the notion that one mode has 32-bit insns, while the other has
16-bit insns

well, that's not true.  an instruction appearing while in uncompressed mode
requires us to look at 6 of its bits to tell whether the next instruction is at
the next half-word or at the next word, and at another bit on top of them
(unless we move M as I suggested) to tell how to look at it (i.e., what mode
it's in)

now let's change the perspective a bit.  instead of starting from the premise
that uncompressed insns are 32-bit and compressed ones are 16-bit, let's start
from a premise that is just as invalid, namely, that insns in either mode take
16 bits

in uncompressed mode, we look at the primary opcode to tell whether it really
is a 16-bit (or 10-bit, as we call them) instruction, or whether it extends
over 32, or even 64 bits!
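
as a concrete (if entirely hypothetical) illustration of that check, here's a
minimal Python sketch; the opcode values in COMPRESSED_POS, and treating the
top 6 bits of the halfword as the primary opcode, are placeholder assumptions,
not the actual proposed encoding:

# sketch only: 64-bit insns would need a further check on more opcode bits
COMPRESSED_POS = {0b000101, 0b000110}   # hypothetical reserved primary opcodes

def insn_len_uncompressed(halfword):
    """return the byte length of the insn whose first halfword this is."""
    po = (halfword >> 10) & 0x3f        # extract the 6-bit primary opcode
    return 2 if po in COMPRESSED_POS else 4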

but in compressed mode, we can't have even one bit or opcode to extend with,
to tell that the insn isn't just 16 bits long but rather 32, because...
somehow the reasons that make it acceptable in one mode don't apply to the
other, even if we had to look at the very same 6 bits plus mode?!?

conversely, if looking at all those bits is so bad that it's unacceptable in
compressed mode, maybe it's just as bad in uncompressed mode and we should find
another way to go about it, one that doesn't take up two of the few major
opcodes.

consider, for inspiration, something that won't quite work because of the
64-bit instructions:

- 32-bit words may contain either a single 32-bit insn, or a pair of 16-bit
insns

- there's one 32-bit opcode, primary or extended, that encompasses:

-- 6+ bits to make it recognizable

-- 16 bits for a 16-bit insn

-- up to 10 bits that guide the decoding of *statically* subsequent insns

- there's also one 16-bit opcode that encodes:

-- 4+ bits to make it recognizable

-- up to 12 bits to guide the decoding of statically subsequent insns

these bits are similar in purpose to our current M and N, but they're packed
into an insn like the above, so that the decoder can readily tell, even ahead
of time, where insns are, and how to decode them
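
purely to visualize those two formats, here's a sketch in Python; the field
widths are the ones listed above, while the names, field order and any opcode
values are made-up placeholders:

from dataclasses import dataclass

@dataclass
class PreLength32:          # the 32-bit pre-length-carrying insn
    opcode: int             # 6+ bits that make it recognizable
    embedded: int           # a full 16-bit compressed insn carried inside
    length_bits: int        # up to 10 bits guiding statically subsequent insns

@dataclass
class PreLength16:          # the 16-bit pre-length-carrying insn
    opcode: int             # 4+ bits that make it recognizable
    length_bits: int        # up to 12 bits guiding statically subsequent insns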

two possibilities for encoding these bits readily occur to me:

- 1 bit per word, telling whether the word holds a 32-bit insn or a pair of
16-bit insns.  all 32-bit insns remain aligned.

- 1 bit per upcoming insn, telling whether it's a 16- or 32-bit insn

whether such bits, when present in an insn, queue up after bits that are
already there, or reset the queue and start affecting the next insn, would have
to be worked out.  there are upsides and downsides both ways: one favors
keeping a long queue so that alignments for more insns per cycle can be
precomputed; the other favors locality.  the latter could be combined with a
rule that only the last insn in the queue may contain additional decoding bits.
if the queue runs out, we return to traditional mode; ditto when we take a
branch.

this enables us to use the full 16 bits of compressed insns, even in the insn
that starts pre-encoded-length mode (though that one still takes up 32 bits),
since the M and N bits of two consecutive insns are compressed into a single
bit or two elsewhere.

with 1 bit per word, the most compact encoding starts a sequence with a 32-bit
insn that carries 10 bits of pre-length of its own, plus 12 bits of pre-length
encoded in the 16-bit insn embedded in it.  then it may go on for 44 16-bit
insns before the queue runs out.

the sequence can be extended indefinitely with one pre-length compressed insn
every 24 16-bit insns.  thus the limit compression ratio, compressed :
uncompressed, is (24 * 2) : (23 * 4) = 52.17%.  in this scenario, each 16-bit
compressed insn payload takes up 16.6956 bits in total (24 insns to encode 23),
an overhead of about 4.35%.

contrast this with our attempt 1, which, in the same limit regime, uses 16 bits
per 14-bit payload, an overhead of 14.28%.
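
spelling out that arithmetic in Python (nothing here beyond the numbers
already given in the text):

refill = 24                    # 16-bit insns per 16-bit pre-length insn (12 bits, 1 per word)
payload = refill - 1           # 23 of them carry actual payload
ratio = (refill * 16) / (payload * 32)     # 0.5217 -> 52.17% compressed : uncompressed
bits_per_payload = refill * 16 / payload   # 16.6956 bits per 16-bit payload
overhead = bits_per_payload / 16 - 1       # 0.0435 -> ~4.35%

# attempt 1, in the same limit regime: 16 bits carrying a 14-bit payload
attempt1_overhead = 16 / 14 - 1            # 0.1428 -> 14.28%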

a 32-bit insn that fits a 16-bit encoding is a break-even point.  if there's
any pair of 16-bit insns within the 11 subsequent insns, it is advantageous to
go into pre-length encoding, and it remains advantageous if there are further
such pairs at least every 13 insns

contrast with attempt 1: a mode-switch is profitable only if the insn fits in
10 bits; if it fits only in 16-bit mode, it's break-even, and advantageous only
if there are other insns fitting 16-bit mode every other insn


the 1-bit-per-insn (vs. -per-word) pre-length encoding is not quite as compact,
but it does away with the need to pair up 16-bit insns, even with
pre-length-encoding ones, at the expense of misaligned 32-bit insns.  we can go
for 22 compressed insns before the initial maximum of 22 bits runs out.

the sequence can be extended indefinitely with one pre-length compressed insn
every 12 16-bit insns.  thus a lower bound for the compression ratio is
(12 * 2) : (11 * 4) = 54.54%.  in this scenario, each 16-bit compressed insn
payload takes up 17.4545 bits in total (12 insns to encode 11), an overhead of
9.09%.

a 32-bit insn that fits a 16-bit encoding is a break-even point.  if there's
any 16-bit insn within the 10 subsequent insns, it is advantageous to go into
pre-length encoding, and it remains advantageous if there are further such
insns at least every 12 insns

now, I said early on that these encoding schemes wouldn't work if we need to
support 64-bit insns, but they actually do; they just don't make the alignment
quite as simple, since any insn marked as traditionally encoded might actually
take up 2 words instead of just one.  that defeats part of the purpose of the
pre-length encoding, but maybe not so much as to render it useless

thoughts?
