[Libre-soc-dev] Draft Simple-V roadmap for Power ISA (was: [PATCH v3 0/6] ppc/svp64: support SVP64 and its first insns)

Fri Jun 24 12:38:39 BST 2022

On Thu, Jun 23, 2022 at 8:46 PM Dmitry Selyutin <ghostmansd at gmail.com> wrote:

> Hi folks, many thanks for your tips, suggestions and ideas on
> improvements!

it's greatly appreciated, everyone, you as well, Dmitry.

just so everyone knows, the bulk of the binutils work, adding
Draft Cray-style Scalable Vectors to the Power ISA, is, astoundingly,
pretty much done.

there are *NO* actual Vector instructions in SV. we will NOT be
submitting 200-5,000 Vector opcodes as would normally be done
in any other Scalable Vector ISA: ppc64-opc.c contains the
*entirety* of the Vector "contextualisation" of *pre-existing* Scalar
instructions [9].
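
to illustrate what "contextualisation" of a pre-existing Scalar
instruction means, here is a minimal sketch in Python. it is not taken
from the SV specification or from binutils: the function names, the
128-entry register file and the plain element-stepping are all
assumptions, and predication, element-width overrides and
scalar/vector register marking are ignored.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF

def scalar_add(gpr, rt, ra, rb):
    # the pre-existing Scalar operation: one 64-bit add, nothing Vector about it
    gpr[rt] = (gpr[ra] + gpr[rb]) & MASK64

def sv_loop(scalar_op, gpr, vl, rt, ra, rb):
    # hypothetical model of the prefix: repeat the *same* Scalar operation
    # VL times, stepping each register number by one element per iteration
    for i in range(vl):
        scalar_op(gpr, rt + i, ra + i, rb + i)

gpr = [0] * 128                    # assumed register file size
gpr[8:12] = [1, 2, 3, 4]
gpr[16:20] = [10, 20, 30, 40]
sv_loop(scalar_add, gpr, 4, 24, 8, 16)
print(gpr[24:28])
```

the point of the sketch is that no new "vector add" had to be defined:
the loop wraps whatever Scalar operation it is handed.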

context and roadmap:

* Simple-V is named "simple" because it adheres to a strict
  RISC paradigm [extended into the Scalable Vector space]
* there are only 5 actual "management" instructions:
   setvl, svstep, svremap, svshape, svindex (TODO [0]).
   the last three are for hardware-controlled "Structure Packing"
   such as Matrices and other Dimensional shuffling, and
   full triple-loop DCT/FFT (normally only found in VLIW DSPs)
* there is "borrowing" of 25% of the EXT001 64-bit prefix
   space which gives 24 bits to "categorise" every Scalar
   instruction, according to its register profile [1]
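
the "25%" and "24 bits" figures above can be sanity-checked with a
short Python sketch. the arithmetic follows from the text (a 32-bit
prefix word, a 6-bit primary opcode, 2 fixed identifying bits, 24 bits
left over). the exact bit positions used below are an assumption for
illustration only and should be checked against the SVP64 draft, and
the function names are hypothetical.

```python
EXT001 = 0b000001                  # primary opcode 1, top 6 bits of the prefix
ID_MASK = (1 << 24) | (1 << 22)    # two fixed bits (assumed: MSB0 bits 7 and 9)
SHARE = 2**24 / 2**26              # 2 fixed bits out of 26 -> 0.25 of EXT001

def pack_prefix(rm):
    # scatter a 24-bit value around the two fixed identifying bits
    assert 0 <= rm < (1 << 24)
    hi = (rm >> 23) & 1            # assumed position: MSB0 bit 6
    mid = (rm >> 22) & 1           # assumed position: MSB0 bit 8
    lo = rm & 0x3FFFFF             # assumed position: MSB0 bits 10..31
    return (EXT001 << 26) | (hi << 25) | ID_MASK | (mid << 23) | lo

def unpack_prefix(word):
    # return the 24-bit value, or None if this is not such a prefix
    if (word >> 26) != EXT001 or (word & ID_MASK) != ID_MASK:
        return None
    return (((word >> 25) & 1) << 23) | (((word >> 23) & 1) << 22) | (word & 0x3FFFFF)

print(hex(pack_prefix(0)), SHARE)
```

whatever the real layout is, the accounting stays the same: 32 bits
minus 6 for the opcode minus 2 for identification leaves 24.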

as new *Scalar* instructions get added to the Power ISA,
then if it is appropriate to do so [8] they would correspondingly
have to be run through the "register profile analysis" [2] and,
for sanity's sake, the ppc64-opc.[ch] auto-generator re-run [3].

now, we *also* happen to be developing some Scalar instructions.
it's really important to emphasise that these have absolutely nothing
to do with SV, at all.

these Scalar instructions are designed to bring the Scalar Power
ISA up-to-date in many areas outside of its primary focus to date,
which has been the perfectly reasonable and understandable use-case
of IBM's high-end customers. example: i'm currently designing a
bitmanip-mask instruction which covers the entirety of BMI and TBM [4]
*and* RVV's vsbfm suite. none of these were needed for any IBM
workloads / customers so it is perfectly reasonable that they were
never considered.

there's also a pair of biginteger math operations, one of which is
a variant of the Intel "mulx" instruction [5].  the majority
of the list is on the bitmanip page [6]; these will take some
time simply because there are a lot of them (approx. 80-100).

still on the TODO list:

* macro support (including the "8" of element-width=8, sorry Dmitry!) [7]
* svindex for doing vector-looped GPR(RT) = GPR(GPR(RA)) [0]
* submit scalar instructions [6] and corresponding ppc64-opc.[ch] [2][3]
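
the vector-looped GPR(GPR(RA)) item above can be pictured with a
minimal Python sketch. this is only one reading of that one-line
description: the function name, the 128-entry register file and the
per-element stepping are assumptions, not the svindex specification
(which is still TODO, see [0]).

```python
def indexed_move(gpr, vl, rt, ra):
    # for each element, use the *contents* of GPR(RA+i) as a register
    # number, and copy that register's contents into GPR(RT+i)
    for i in range(vl):
        gpr[rt + i] = gpr[gpr[ra + i]]

gpr = [0] * 128                # assumed register file size
gpr[8:11] = [40, 42, 41]       # indices: which registers to fetch from
gpr[40:43] = [7, 8, 9]         # the data being picked up
indexed_move(gpr, 3, 16, 8)
print(gpr[16:19])
```

in effect this is a gather within the register file, with the index
vector itself held in registers.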

that's basically it.  there's no binutils-level subsetting of SVP64 because
the lower SV Compliancy Levels require soft-emulation through illegal
instruction traps.  there are no Vector instructions to add: everything
Scalable-Vectorised is in the 24-bit Prefix.

overall, then, the strict RISC paradigm creates one hell of a lot less
work for everyone, yet brings something mind-melting like 2 million
intrinsics to the Power ISA. which is only manageable by sticking
strictly to RISC principles.

l.

[0] https://bugs.libre-soc.org/show_bug.cgi?id=867

[1] fascinatingly this approach was exactly the one that Peter Hsu
     and his team at MIPS, when they were developing the R8000,
     came up with around 1995.

[2] https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/sv/sv_analysis.py;hb=HEAD
[3] https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/sv/sv_binutils.py;hb=HEAD

[4] https://bugs.libre-soc.org/show_bug.cgi?id=865#c1

[5] https://libre-soc.org/openpower/sv/biginteger/

[6] https://libre-soc.org/openpower/sv/bitmanip/

[7] https://bugs.libre-soc.org/show_bug.cgi?id=849

[8] new Scalar instructions have to make sense in a Vector context
     ("scalar===element") before they can be Prefixed to extend to multiple
     elements. mtmsr doesn't qualify, for example, because there's only
     ever going to be one MSR.  sc makes no sense, but weirdly td/tw
     and tdi/twi do.

[9] we tried breaking the rule, adding Vector opcodes without having
     the corresponding identical Scalar instruction: it went very badly.
     lesson learned.