[Libre-soc-dev] Simple-V svp64 draft spec
Luke Kenneth Casson Leighton
lkcl at lkcl.net
Sat Jan 23 22:09:03 GMT 2021
we've a draft of the SVP64 encoding completed, which introduces not
just vector-predication for OpenPOWER (a feature entirely missing from
VSX and extremely challenging if not impractical to retro-fit), but a
whole stack of modern Vector features last seen in Cray supercomputers
(reintroduced with RVV and SVE2) and a whole stack of innovations not
seen before in any Vector ISA, ever. these include:
* Twin-predication (effectively an ordered sequential multiple VINSERT)
* Predicate-result (turns every single arithmetic operation into a type of cmp)
* Data-dependent fail-on-first (SVE2 and RVV only have LD/ST fail-first)
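to give a rough feel for the three modes listed above, here is a minimal python model. it is a sketch only: the function names, the list-of-bool masks and the "non-zero" test are assumptions made for illustration, not the spec's pseudocode, and details such as whether the failing element is included in the truncated VL are not modelled.

```python
# illustrative model only: names and list-of-bool masks are assumptions
# made for this sketch, not taken from the SVP64 spec.

def twin_pred_mv(dest, src, src_mask, dst_mask):
    """twin-predication: copy active src elements, in order,
    into the active dest slots (source and dest skip independently)."""
    vl = len(src)
    i = j = 0
    while True:
        while i < vl and not src_mask[i]:   # skip masked-out sources
            i += 1
        while j < vl and not dst_mask[j]:   # skip masked-out destinations
            j += 1
        if i >= vl or j >= vl:
            break
        dest[j] = src[i]
        i += 1
        j += 1
    return dest


def predicate_result_add(a, b, mask, test=lambda r: r != 0):
    """predicate-result: each add is also tested, cmp-style; the test
    outcome becomes the new predicate bit, and only passing results
    are kept (None marks an element that was not written)."""
    out, new_mask = [], []
    for x, y, active in zip(a, b, mask):
        r = x + y
        ok = bool(active) and test(r)
        new_mask.append(ok)
        out.append(r if ok else None)
    return out, new_mask


def ffirst_add(a, b, test=lambda r: r != 0):
    """data-dependent fail-first: stop at the first result that fails
    the test and truncate VL to the elements before it."""
    out = []
    for x, y in zip(a, b):
        r = x + y
        if not test(r):
            break
        out.append(r)
    return out, len(out)   # results, and the truncated vector length
```

for example, `twin_pred_mv([0, 0, 0, 0], [10, 20, 30, 40], [1, 0, 1, 1], [0, 1, 1, 0])` places the first two active source elements into destination slots 1 and 2, and `ffirst_add([1, 2, -3, 4], [1, 1, 3, 1])` stops at the third element, where the sum is zero.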
i'll be speaking about this at FOSDEM2021 in a rapid-fire "overview"
talk. if you'd like to read the overview in advance, it's here:
https://libre-soc.org/openpower/sv/overview/
https://fosdem.org/2021/schedule/event/the_libresoc_project_simple_v_vectorisation/
the actual encoding requires some context to understand (the overview, above):
https://libre-soc.org/openpower/sv/svp64/
i've kicked things off with a python program that can "understand"
SV-augmented assembly opcodes and turn them into an EXT001-plus-v3.0B
64-bit encoding:
https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/sv/trans/svp64.py;hb=HEAD
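as a rough sketch of what "EXT001-plus-v3.0B" means at the bit level: a 32-bit prefix whose top 6 bits hold major opcode 1, followed by an unmodified 32-bit v3.0B instruction. the function name below is hypothetical and the remaining 26 prefix bits are treated as an opaque payload (their layout is defined on the svp64 page and is not modelled here). holding the prefix in the upper half of a python integer is purely a choice made for this example.

```python
# hypothetical helper for illustration: this is not the code in svp64.py.

EXT001 = 1  # major opcode 1, used for 64-bit prefixed instructions


def make_prefixed(prefix_payload, suffix):
    """combine a 26-bit prefix payload and a 32-bit v3.0B instruction
    into one 64-bit value (prefix in the upper 32 bits)."""
    if not 0 <= prefix_payload < (1 << 26):
        raise ValueError("prefix payload must fit in 26 bits")
    if not 0 <= suffix < (1 << 32):
        raise ValueError("suffix must be a 32-bit instruction")
    prefix = (EXT001 << 26) | prefix_payload  # opcode in the top 6 bits
    return (prefix << 32) | suffix


# suffix: v3.0B "add r3, r4, r5" (opcode 31, XO 266)
word = make_prefixed(0, 0x7C642A14)
print(f"{word:016x}")
```

the suffix is left untouched, so the low 32 bits of the result are still a valid v3.0B instruction in their own right.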
we will be following a step-by-step implementation procedure which is
documented as-it-happens, here:
https://libre-soc.org/openpower/sv/implementation/
the idea being for that to become a page where other implementors can
obtain information and guidance about how to implement SV in a
step-by-step straightforward fashion.
this is a *massive* amount of work and a huge upgrade to OpenPOWER
which turns it into a modern-day 3D GPU/VPU ISA in ways that VSX
cannot be. we are also adding features such as:
* Swizzle (10 to 30% of GPU Shader applications use vec2/3/4 Swizzle)
* Transcendental opcodes (planned; also absolutely critical for 3D GPUs)
* Galois Field and bit-manipulation opcodes suitable for cryptographic
applications, Audio/Video encode and decode.
* REMAP capability (aka ARM NEON "structure packing"), also suitable
for Matrix Multiply: a single fma or madd instruction can perform up
to a full 4x4 matrix multiply, or Rijndael MixColumns, with one opcode.
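to illustrate the Swizzle and REMAP ideas from the list above, here is a small python model. it is a sketch under assumptions: the function names are made up for this example, the registers are modelled as flat row-major lists, and the index schedule is generated by plain nested loops rather than by however REMAP is actually configured in the spec. the point is only that one scalar multiply-add, repeated with re-mapped element indices, covers a whole matrix multiply.

```python
# illustrative model only: names and the index schedule are assumptions.

def swizzle(vec, pattern):
    """vec2/3/4 swizzle: pick elements by name, e.g. 'wzyx' or 'xxyy'."""
    lanes = {"x": 0, "y": 1, "z": 2, "w": 3}
    return [vec[lanes[ch]] for ch in pattern]


def matmul_schedule(n, m, p):
    """yield (index into A, index into B, index into C) for each step of
    an n-by-m times m-by-p multiply over flat row-major arrays."""
    for i in range(n):
        for j in range(p):
            for k in range(m):
                yield i * m + k, k * p + j, i * p + j


def remapped_fma(a, b, c, schedule):
    """apply the same multiply-add at every step; only the element
    indices change from one step to the next."""
    for ia, ib, ic in schedule:
        c[ic] = a[ia] * b[ib] + c[ic]
    return c


a = [1, 2, 3, 4]          # 2x2 matrix, row-major
b = [5, 6, 7, 8]          # 2x2 matrix, row-major
c = remapped_fma(a, b, [0, 0, 0, 0], matmul_schedule(2, 2, 2))
print(c)
print(swizzle([1.0, 2.0, 3.0, 4.0], "wzyx"))
```

the same `remapped_fma` loop handles a 4x4 multiply by passing `matmul_schedule(4, 4, 4)` and 16-element lists.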
and a stack more. the "top-level" page which ties it all together is here:
https://libre-soc.org/openpower/sv/
the work is critically dependent on review and approval by the OPF
ISA WG. assistance with review, and general questions, are always appreciated.
l.
---
crowd-funded eco-conscious hardware: https://www.crowdsupply.com/eoma68