[Libre-soc-bugs] [Bug 558] gcc SV intrinsics concept

Thu Jan 14 14:22:07 GMT 2021

https://bugs.libre-soc.org/show_bug.cgi?id=558

--- Comment #58 from Alexandre Oliva <oliva at libre-soc.org> ---
I haven't been able to keep up with this in detail (sorry, my attention has
been temporarily diverted), but I'm a little worried about how to represent a
"shuffled" CR register file map, if I get the right idea of what's being
proposed.

The key concepts that GCC deals with for purposes of register allocation are
requirements of instructions (constraints, as in extended asms) and modes
(closely related with types).

CRs and flags in general are dealt with without caring about their internal
representation.  They're abstracted into different CCmodes; on machines in
which different insns can output a different set of compare results, e.g. only
EQ/NE, or LT/EQ/GT, LT/EQ/GT/UN, or even carry, overflow, underflow, exceptions
and whatnot, those are represented as different CCmodes, when applicable, in an
analogous way to integral or floating-point modes, in which wider ones can
carry more information, be it precision/mantissa, be it exponent.

It's all modelled abstractly, as if the condition code register held the result
of the compare rather than whatever bits the underlying hardware uses, and
then, when the conditions get to be used, the mnemonics are selected based on
the kind of compare result we're interested in, and the compiler remains
blissfully unaware of the condition register internal representation.
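
As a rough illustration of that last point, here is a toy sketch in Python
(not GCC code) of how a branch can be picked purely from the kind of compare
result wanted. The table and the function name branch_for are made up for the
example; the mnemonics are the usual Power conditional-branch ones. Note that
nothing in it refers to which bit of the CR field holds which result.

```python
# Toy model, not GCC code: the user of a compare result names the condition
# it wants, and the mnemonic is chosen from that alone.
# BRANCH_MNEMONIC and branch_for are hypothetical names for this sketch.
BRANCH_MNEMONIC = {
    "eq": "beq",
    "ne": "bne",
    "lt": "blt",
    "le": "ble",
    "gt": "bgt",
    "ge": "bge",
}


def branch_for(condition, cr_field, label):
    """Return a branch on `condition`, testing CR field number `cr_field`."""
    return f"{BRANCH_MNEMONIC[condition]} {cr_field},{label}"
```

For instance, branch_for("lt", 7, ".L2") yields the string "blt 7,.L2".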

So you won't see anything in GCC that cares that CRs are 4 bits wide and use
one bit for EQ, one for LT, one for GT, and one for UN, in whatever order that
is.  This solves some potential problems for us, because endianness of those
bits is not an issue.  

There's nothing in the IR that enables reinterpretation of CR bits as an
integral quantity, or vice-versa.  Indeed, CCmodes generally do not pass the
TARGET_MODES_TIEABLE_P predicate with other modes, meaning you cannot
reinterpret a CCmode "quantity" in a register as another mode, as you often can
reinterpret a wide integral mode as a narrow one, and vice-versa, when the
machine, the ABI and the compiler keep them extended under uniform conventions.

Now, the problem with "shuffled" register ordering is that the controls GCC
uses to tell how modes and registers relate are TARGET_HARD_REGNO_MODE_OK,
which tells whether a quantity in a given machine mode can be held in a given
register, and TARGET_HARD_REGNO_NREGS, which tells how many *consecutive*
registers are needed to hold that mode, starting at a given register.
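
To make the shape of those two knobs concrete, here is a minimal Python model
of them. It is only a sketch: the register width, the register count, the mode
table and the helper registers_for are assumptions for the example, and the
real hooks are target code inside GCC, not Python. What it tries to show is
that the only thing the pair of hooks can express is "N registers in a row,
starting here".

```python
# Toy model of the two hooks named above. All values are assumptions made
# for illustration, not taken from any real port.
REG_BITS = 64   # assumed width of one hard register
NUM_REGS = 32   # assumed number of registers in the file

# Hypothetical modes and their sizes in bits.
MODE_BITS = {"DI": 64, "TI": 128, "V4DI": 256}


def hard_regno_nregs(regno, mode):
    """How many consecutive registers `mode` needs, starting at `regno`."""
    return -(-MODE_BITS[mode] // REG_BITS)  # ceiling division


def hard_regno_mode_ok(regno, mode):
    """Whether a value of `mode` fits in the run that starts at `regno`."""
    return regno >= 0 and regno + hard_regno_nregs(regno, mode) <= NUM_REGS


def registers_for(regno, mode):
    """The registers occupied: regno, regno+1, ... with no gaps possible."""
    if not hard_regno_mode_ok(regno, mode):
        raise ValueError(f"{mode} cannot start at register {regno}")
    return list(range(regno, regno + hard_regno_nregs(regno, mode)))
```

In this model registers_for(4, "V4DI") is [4, 5, 6, 7]; there is no way to
ask for, say, registers 4 and 12 as a single value.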

In order for wider-than-register modes to be held in a set of registers, those
registers *have* to be contiguous in GCC's internal notion of the register
file.  It is sometimes the case that the contiguity is not relevant for the
architecture, e.g., if there isn't any opcode that operates on pairs of
registers holding a double-word value, but such requirements often arise when
a pair of consecutive registers holds a double-precision floating-point value,
or a widening multiply necessarily sets a pair of neighboring registers.  When this
happens, the order of registers in the abstract register file in the compiler
has to match the order and the grouping required by the machine, otherwise the
allocation won't get things right.

When it comes to vectors of gprs and fprs, we didn't have the problem I'm
concerned about: the vector modes can just require N contiguous registers, and
since they appear as neighbors in the abstract register file, that works just
fine.  Unlike other wide types, the WORDS_BIG_ENDIAN predicate doesn't affect
the expected significance of partial values split across multiple registers in
vector types, so we're fine in this regard.

However, if there are opcodes that require different groupings or orderings of
CRs, there will be a representation problem.  E.g., if we need CR12 to be right
next to CR4 because of some opcode that takes a pair of CRs by naming CR4 and
affecting CR4 and CR12 as a V2CC quantity, they'd have to be neighbors for this
V2CC allocation to be possible.  But if in other circumstances we use say a
V8CC quantity starting at CR0 to refer to CR0..7's 32 bits, then those 8 CRs
would have to be consecutive in the register file, without room for CR12 after
CR4.
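
The conflict can be checked mechanically with a small Python sketch. It
assumes, purely for illustration, a file of 16 CRs and the hypothetical
opcode from the example above; is_consecutive_run is a made-up helper that
tests whether a group of registers sits at adjacent positions, in order, in a
given abstract register file.

```python
def is_consecutive_run(order, regs):
    """True if `regs` occupy adjacent slots of `order`, in that sequence."""
    start = order.index(regs[0])
    return order[start:start + len(regs)] == list(regs)


# Assumed for illustration: 16 CRs, numbered in architectural order.
natural = [f"CR{i}" for i in range(16)]

# A "shuffled" file in which CR12 is moved to sit right after CR4.
shuffled = [r for r in natural if r != "CR12"]
shuffled.insert(shuffled.index("CR4") + 1, "CR12")

v2cc_pair = ["CR4", "CR12"]               # hypothetical V2CC operand
v8cc_run = [f"CR{i}" for i in range(8)]   # V8CC starting at CR0

results = {
    "natural": (is_consecutive_run(natural, v2cc_pair),
                is_consecutive_run(natural, v8cc_run)),
    "shuffled": (is_consecutive_run(shuffled, v2cc_pair),
                 is_consecutive_run(shuffled, v8cc_run)),
}
```

Here results comes out as {"natural": (False, True), "shuffled": (True,
False)}: each ordering satisfies one of the two groupings and breaks the
other, because CR4 can have only one immediate successor.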

So please be careful with creative register ordering, to avoid creating
configurations that may end up impossible to represent without major surgery in
the compiler.

Also, keep in mind that, even if some configurations might be possible to
represent with the knobs I mentioned above, the rs6000/powerpc port has a huge
legacy of variants, so whatever we come up with sort of has to fit in with
*all* that legacy.  E.g., IIRC 32-bit ppc variants have long used consecutive
32-bit FPRs for (float+float) double-precision-ish values, and consecutive
32-bit GPRs to hold 64-bit values.  There were ABI requirements to that effect,
which required the abstract register file in the compiler, and also the one in
debug information, to use the register ordering implied by the architecture.
If we were to require the introduction of intervening registers, for purposes
of vectorization, between registers that such old arches need as neighbors,
insurmountable conflicts would arise.
