[Libre-soc-dev] Libre-SOC SVP64 first Cray-style Vector loop successful

Sun Mar 7 13:24:51 GMT 2021

with many thanks and congratulations to Cesar for getting the first basic
Cray-style Vector Loop operational in HDL, on top of OpenPOWER v3.0B and
passing its first unit tests.

https://git.libre-soc.org/?p=soc.git;a=commitdiff;h=97136d71397f420479d601dcb80f0df4abf73d22

the ISACaller Simulator has been functional for three weeks and has since
been used to verify the HDL's functionality through co-simulation:

https://git.libre-soc.org/?p=soc.git;a=commitdiff;h=9078b2935beb4ba89dcd2af91bb5e3a0bcffbe71

the unit tests for the ADD pipeline will build up progressively over time
to give operational confidence (extended later with Formal Correctness
Proofs).  other pipelines will have their own extensive tests as well.

https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/fu/alu/test/svp64_cases.py;hb=HEAD

the unit tests, which are deliberately well-commented, serve as examples to
demonstrate the SVP64 Vectorisation principle.  however the hardware unit
tests themselves rely on the python-based Simulator, ISACaller, which is
co-simulated (above), and so it is not immediately obvious what is occurring.
the following unit tests are more explicit:

https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/decoder/isa/test_caller_svp64.py;hb=HEAD

the file above includes, for example, "sv.add.", which enables Vectorised
Rc=1, producing a *Vector* of CR fields that are explicitly verified in the
test.
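to make the "Cray-style" loop concrete, here is a minimal python sketch of
what "sv.add." does conceptually: the unmodified scalar add. is simply
repeated VL times over consecutive registers, and with Rc=1 each element
writes its own CR field.  this is a simplified model, not ISACaller's code:
the register-file sizes, the contiguous RT+i/RA+i/RB+i numbering, the
starting CR field and the function names are all assumptions for
illustration, and predication, element-width overrides and XER.SO are
ignored.

```python
# simplified, hypothetical model of a Vectorised "add." (Rc=1).
# assumptions: 64-bit elements, contiguous register numbering (RT+i, RA+i,
# RB+i), one CR field per element starting at cr_base, no predication,
# no element-width overrides, and SO always 0 (XER.SO not modelled).

MASK64 = (1 << 64) - 1


def to_signed64(value: int) -> int:
    """interpret a 64-bit unsigned value as two's-complement signed."""
    value &= MASK64
    return value - (1 << 64) if value >> 63 else value


def cr_field_for(result: int, so: int = 0) -> int:
    """build a 4-bit CR field (LT, GT, EQ, SO), MSB first, as Rc=1 would."""
    signed = to_signed64(result)
    lt = 1 if signed < 0 else 0
    gt = 1 if signed > 0 else 0
    eq = 1 if signed == 0 else 0
    return (lt << 3) | (gt << 2) | (eq << 1) | so


def sv_add_dot(gpr, cr, rt, ra, rb, cr_base, vl):
    """Cray-style loop: the scalar add. is repeated VL times."""
    for i in range(vl):
        result = (gpr[ra + i] + gpr[rb + i]) & MASK64
        gpr[rt + i] = result
        # each element gets its own CR field, so no single CR field
        # (compare CR6 for VSX) becomes a point of contention
        cr[cr_base + i] = cr_field_for(result)


# usage: three elements, deliberately giving positive, negative and zero
gpr = [0] * 128
cr = [0] * 64
gpr[8:11] = [1, 2**64 - 5, 7]     # RA vector
gpr[16:19] = [2, 2, 2**64 - 7]    # RB vector
sv_add_dot(gpr, cr, rt=24, ra=8, rb=16, cr_base=0, vl=3)
```

the three results are 3, -3 and 0, so the three CR fields come out as GT,
LT and EQ respectively: one independent CR field per element rather than a
single shared one.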

unlike VSX, which only ever writes CR6, this inherently allows far greater
parallelism (64 computations per clock cycle per issue) without Condition
Register Hazards occurring.

because the ISACaller Simulator's output is rigorously and explicitly
checked, we can be confident that co-simulation need only extract and
compare the entire contents of all regfiles from both ISACaller and the HDL
after each step.
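the lock-step comparison described above can be sketched in python as
follows.  the `SteppableModel` interface, `cosimulate` and the `ToyCounter`
stand-in are all hypothetical: the real ISACaller and HDL test harnesses
have their own APIs, which are not shown here.  the sketch only shows the
principle of stepping both models once and then diffing every regfile.

```python
from typing import Dict, List, Optional, Protocol, Tuple


class SteppableModel(Protocol):
    """hypothetical interface: anything that can execute one instruction
    and expose all of its register files by name."""

    def step(self) -> None: ...

    def regfiles(self) -> Dict[str, List[int]]: ...


# (step number, regfile name, register index, expected value, actual value)
Mismatch = Tuple[int, str, int, int, int]


def cosimulate(reference: SteppableModel, dut: SteppableModel,
               n_steps: int) -> Optional[Mismatch]:
    """run both models in lock-step, comparing every regfile after each
    step.  returns the first divergence found, or None if they agree.
    both models are assumed to expose identically-sized regfiles."""
    for step in range(n_steps):
        reference.step()
        dut.step()
        expected_state = reference.regfiles()
        actual_state = dut.regfiles()
        for name, expected_regs in expected_state.items():
            actual_regs = actual_state[name]
            for index, (expected, actual) in enumerate(
                    zip(expected_regs, actual_regs)):
                if expected != actual:
                    return (step, name, index, expected, actual)
    return None


class ToyCounter:
    """toy stand-in for a simulator: each step adds `inc` to r0, and an
    optional deliberate bug corrupts r1 on step number `bug_at`."""

    def __init__(self, inc: int = 1, bug_at: Optional[int] = None):
        self.gpr = [0] * 4
        self.inc = inc
        self.bug_at = bug_at
        self.steps = 0

    def step(self) -> None:
        self.gpr[0] += self.inc
        if self.steps == self.bug_at:
            self.gpr[1] = 99
        self.steps += 1

    def regfiles(self) -> Dict[str, List[int]]:
        return {"GPR": list(self.gpr)}
```

comparing whole regfiles (rather than hand-picking registers) is what lets
a bug show up even in a register the test author never thought to check.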

an identical co-simulation process is planned for qemu, power-gem5 and
Microwatt when SVP64 is added to them.  qemu co-simulation of OpenPOWER
v3.0B is *already* performed (and has found obscure bugs in qemu!)

our next steps are to add:

* LD/ST and other pipeline unit tests
* Single and Twin Predication
* Polymorphic Element Width overrides (uint8/16/32, FP16, BF16)
* Saturation, Mapreduce, and other advanced modes

medium term:

* Scalar v3.0B Bitmanipulation extensions suitable for cryptographic, Audio,
Video and other uses (Vectorisation of these now comes "for free"!)
* Scalar v3.0B IEEE754 FP Transcendentals (SIN, COS, ATAN2, LOG1P) which
again become inherently and automatically Vectorised
* 3D Texturisation opcodes suitable for Vulkan Khronos Group Compliance

longer term:

* SV-REMAP provides 2D and 3D Matrix register remapping
* Instruction Prefix remapping similar to hardware compression

A particularly fascinating and powerful combination of the above involves
the planned general-purpose Galois Field multiply operation in the
bitmanipulation extension, with REMAP Matrix applied, from which Rijndael
(AES) "MixColumns" functionality emerges *as a single instruction*, without
it needing to be added as an explicit opcode as would normally be done in
any other ISA.
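the idea can be illustrated in python.  `gf_mul` below is a generic GF(2^8)
multiply (carry-less multiply, reduced by the AES polynomial 0x11B), and the
nested loops in `gf_matrix_vector` stand in for the index schedule that a 2D
REMAP would generate.  nothing here is the planned opcode or REMAP encoding,
and all names are hypothetical: the point is only that MixColumns falls out
of a general GF multiply plus a matrix schedule, with no AES-specific logic.

```python
def gf_mul(a: int, b: int, poly: int = 0x11B, bits: int = 8) -> int:
    """general-purpose GF(2^bits) multiply: shift-and-XOR (carry-less)
    multiplication, reducing modulo `poly` whenever the degree overflows."""
    result = 0
    for _ in range(bits):
        if b & 1:
            result ^= a          # addition in GF(2^n) is XOR
        b >>= 1
        a <<= 1
        if a & (1 << bits):      # degree reached `bits`: reduce
            a ^= poly
    return result


# the fixed Rijndael MixColumns coefficient matrix
MIXCOLUMNS_MATRIX = [
    [2, 3, 1, 1],
    [1, 2, 3, 1],
    [1, 1, 2, 3],
    [3, 1, 1, 2],
]


def gf_matrix_vector(matrix, column, poly=0x11B):
    """matrix-vector product over GF(2^8).  the row/column loops model,
    hypothetically, the element ordering a 2D REMAP would schedule."""
    out = []
    for row in matrix:
        acc = 0
        for coeff, value in zip(row, column):
            acc ^= gf_mul(coeff, value, poly)
        out.append(acc)
    return out


def mix_column(column):
    """AES MixColumns on one 4-byte column: just the generic GF product."""
    return gf_matrix_vector(MIXCOLUMNS_MATRIX, column)
```

for example `mix_column([0xdb, 0x13, 0x53, 0x45])` returns
`[0x8e, 0x4d, 0xa1, 0xbc]`, a commonly-used MixColumns test column.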

all of this is achieved without modifying the Scalar OpenPOWER v3.0B ISA in
any way, and without requiring any of the SIMD (VSX) ISA (indeed eliminating
and superseding it): it merely adds a new "Vector Context" inside v3.1-style
EXT01 64-bit prefixes.
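to illustrate that the scalar instruction really is left untouched, here is
a python sketch that splits a 64-bit EXT01-prefixed instruction into its
prefix and suffix, then decodes the suffix as an ordinary v3.0B XO-form
instruction (0x7C642A15 is "add. r3,r4,r5").  the layout of the remaining 26
"Vector Context" bits is not described in this post, so they are treated as
an opaque integer; packing the instruction as `(prefix << 32) | suffix` and
all function names are assumptions for illustration.

```python
EXT01 = 1  # primary opcode used by v3.1 64-bit prefixed instructions


def primary_opcode(word: int) -> int:
    """Power ISA primary opcode: the top 6 bits (bits 0-5 in MSB0 numbering)."""
    return (word >> 26) & 0x3F


def split_prefixed(insn64: int):
    """split a prefixed instruction, modelled here as (prefix << 32) | suffix,
    into its opaque vector-context bits and its unmodified scalar suffix."""
    prefix = (insn64 >> 32) & 0xFFFF_FFFF
    suffix = insn64 & 0xFFFF_FFFF
    if primary_opcode(prefix) != EXT01:
        raise ValueError("not an EXT01-prefixed instruction")
    vector_context = prefix & 0x03FF_FFFF  # 26 bits, layout not modelled here
    return vector_context, suffix


def decode_xo_form(word: int) -> dict:
    """decode a scalar XO-form instruction (e.g. add/add.), LSB-shifted."""
    return {
        "PO": (word >> 26) & 0x3F,
        "RT": (word >> 21) & 0x1F,
        "RA": (word >> 16) & 0x1F,
        "RB": (word >> 11) & 0x1F,
        "OE": (word >> 10) & 0x1,
        "XO": (word >> 1) & 0x1FF,
        "Rc": word & 0x1,
    }
```

the scalar decoder is reused exactly as-is on the suffix; only the prefix
carries the new Vector information.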

we are tracking implementation progress here, including gcc and power-gem5,
and would love to see Microwatt as well, as time permits:

https://libre-soc.org/openpower/sv/implementation

l.

-- 
---
crowd-funded eco-conscious hardware: https://www.crowdsupply.com/eoma68