[Libre-soc-isa] Draft SVP64 adding XLEN to pseudocode spec (defaults to 64)

Fri Sep 10 15:03:42 BST 2021

On August 31, 2021 7:46:18 PM UTC, Luke Kenneth Casson Leighton <lkcl at lkcl.net> wrote:
>with thanks to 3mdeb, we've begun the process of altering the
>pseudocode
>used by Libre-SOC to be able to do element-width over-rides.  example:
>https://libre-soc.org/openpower/isa/fixedarith/
>
>bearing in mind this is actual fully-functional executable pseudocode

this is a follow-up report on the progress so far.  Dimitry from 3mdeb, with help from Jacob and myself, has completed 95% of the Scalar Fixed-point XLEN conversion.  all existing unit tests (several thousand) pass 100% when XLEN=64, indicating that no functional change has occurred.

to make it absolutely clear: *no behavioural change is a hard inviolate requirement*

the changes made - which are in absolutely no way behavioural changes when XLEN=64 - include a clarity rewrite of addg6s (making it better suited to adaptation) and in some cases the inclusion of clarifying constants (e=XLEN-1) particularly on pseudocode lines which repeatedly referenced subsections of registers:

    RS[32:63] <- RA[32:63] + .....

without such constants, this translates to:

    RS[XLEN/2:XLEN-1] <- RA[XLEN/2:XLEN-1]

which starts to interfere with clarity.  some feedback here appreciated.
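a minimal sketch of why the two forms are equivalent at XLEN=64, assuming MSB0 bit numbering (bit 0 is the most significant bit, as in the Power ISA).  the helper names `get_bits` and `low_half` are hypothetical and are not part of the Libre-SOC simulator:

```python
# illustrative sketch only: helper names are hypothetical, not Libre-SOC's API

def get_bits(value, start, end, xlen=64):
    """Return bits start..end (inclusive) of an xlen-bit value, MSB0 numbering."""
    width = end - start + 1
    shift = xlen - 1 - end          # distance of bit 'end' from the LSB
    return (value >> shift) & ((1 << width) - 1)


def low_half(value, xlen=64):
    """Equivalent of RA[XLEN/2:XLEN-1]: the least significant half."""
    return get_bits(value, xlen // 2, xlen - 1, xlen)


ra = 0x0123_4567_89AB_CDEF
# with XLEN=64 the parameterised slice selects the same bits as RA[32:63]
assert low_half(ra, 64) == get_bits(ra, 32, 63, 64) == 0x89AB_CDEF
# with XLEN=32 the same expression selects RA[16:31] of a 32-bit register
assert low_half(0x89AB_CDEF, 32) == 0xCDEF
```

a named constant for XLEN/2 or XLEN-1 in the pseudocode plays the same role as `low_half` here: it keeps the slice readable while leaving the XLEN=64 result unchanged.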

additional changes include replacing hard-coded constants such as 0xffff_ffff_ffff_ffff with [1] * XLEN and so on. in some cases this improves readability.
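a small sketch of the idea, assuming `[1] * XLEN` means "the bit 1 replicated XLEN times".  `repeat_bit` is a hypothetical helper written for this illustration:

```python
def repeat_bit(bit, count):
    """Sketch of the pseudocode's [bit] * count: 'bit' replicated 'count' times."""
    return ((1 << count) - 1) if bit else 0


XLEN = 64
# at XLEN=64 the replicated form is exactly the old hard-coded constant
assert repeat_bit(1, XLEN) == 0xffff_ffff_ffff_ffff
# at a narrower width the same expression shrinks automatically
assert repeat_bit(1, 32) == 0xffff_ffff
assert repeat_bit(0, XLEN) == 0
```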

other more sophisticated changes include replacing byte-level for-loops 0 to 7 with 0 to XLEN/8-1.  these had to have some discussion.
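as a rough illustration of a byte-level loop made width-independent, here is a sketch using a cmpb-style byte compare.  the choice of instruction and the function name `cmpb_like` are assumptions made for this example, and this is not the actual Libre-SOC pseudocode.  an XLEN-bit register holds XLEN/8 bytes, so the loop index runs from 0 to XLEN/8-1 inclusive:

```python
def cmpb_like(rs, rb, xlen=64):
    """For each byte position, produce 0xFF where rs and rb match, else 0x00."""
    result = 0
    for n in range(xlen // 8):      # n = 0 .. XLEN/8-1 (0 .. 7 when XLEN=64)
        shift = 8 * n
        if ((rs >> shift) & 0xFF) == ((rb >> shift) & 0xFF):
            result |= 0xFF << shift
    return result


# XLEN=64: eight bytes compared, identical to a hard-coded "0 to 7" loop
assert cmpb_like(0x1122334455667788, 0x1100334400667700, 64) == 0xFF00FFFF00FFFF00
# XLEN=32: only four bytes are visited
assert cmpb_like(0x55667788, 0x00667700, 32) == 0x00FFFF00
```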

in general we have kept the changes to the minimum so that they are easy to review (and we would welcome and appreciate constructive, open and transparent feedback long before the RFC is submitted)

one thing: there seem to be some misunderstandings which need to be cleared up.  it would help enormously if people engaged directly with us to provide feedback.  this will make the RFC process much smoother when it takes place, which it *will*, through the external non-members ISA WG RFC Process being established through the kind and dedicated efforts of the newly-formed OPF ISA WG.

the XLEN specification changes that we are committed to working on are *NOT* restricted in scope and value to SVP64.

there are two additional benefits to OpenPOWER Foundation Members and for the Power ISA that have *nothing to do with SVP64*.

1) XLEN=32 on Scalar Power ISA.

at present, a bare minimum 64 bit Scalar Fixed Point Compliancy Subset Softcore such as Microwatt or Libre-SOC is a whopping 20,000 LUT4s.  in Commercial Embedded Specialist Industrial applications, where power consumption, executable size and resource utilisation are all absolutely critical, such *unavoidable and Specification-mandated* massive resource utilisation unfortunately relegates the Power ISA to the status of a third class citizen.

this may easily be fixed through the simple matter of allowing Implementors to implement 32-bit-only hardware, allowing at least a 4-fold reduction in resource utilisation for Softcores, and significant reduction in gate count on ASICs.

as the spec is currently worded, 32-bit register files and 32 bit ALUs are *strictly prohibited*, despite clear analysis showing that the entire upper half of all Fixed Point operations remains unused when MSR.64b=0

the stark difference is already well-known in the LibreBMC WG which is operating under severe resource constraints (85k and 100k LUT4 FPGAs which, given the large size of required peripherals, are barely enough): the fact that RISC-V can do a fully-functioning RV64 core in around 4,000 LUT4s (5 times less than Power ISA), and even less for RV32, will not have gone unnoticed.

another Industry example: Western Digital run 32-bit-only cores on all SSD, USB, and HDD products.

why?

* if the firmware runs with far greater memory footprint due to 64 bit pointers consuming much greater stack and heap space, performance is compromised (and the product more expensive)

* if the firmware is *larger* due to constants being 64 bit then, because the product *uses its own storage* for firmware, it is less attractive than competitor products: it has to state "Capacity: 490 GB" instead of "Capacity: 500 GB".

* if programs are larger, then caches have to be larger, and power consumption is higher.  this again compromises their products when compared to competitor equivalents.

power consumption jumped a *MASSIVE 10%* from the 32-bit ARM Cortex A7 to the 64-bit ARM Cortex A53 *purely due to 64 bit being mandatory*.  default recommended caches had to jump *50%* to keep performance on-par with the "out-of-date" 32 bit ARM design.

bottom line here is that whilst 64 bit for Supercomputers and HPC is standard, in the Embedded Industrial world, 64 bit punishes mass volume products so categorically and resolutely that they simply cannot be taken seriously.

2) XLEN=128

128-bit Scalar is already part of RISC-V.  RV128 is a predefined standard, and it makes sense to consider enhancing the Power ISA for the exact same strategic reasons.

both these tasks are made much easier if the specification's pseudocode has been updated to be length-independent.

in addition to the existing XLEN=64 unit tests, over the next few weeks and months we will be developing XLEN=8, XLEN=16, and XLEN=32 unit tests (at least tripling the size of the already-extensive Libre-SOC Power ISA Verification Test Suite).
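a hedged sketch of what running one test body across several widths could look like.  `add_model` is a hypothetical stand-in for an XLEN-parameterised add that wraps at XLEN bits, and none of this is taken from the Libre-SOC test suite:

```python
def add_model(ra, rb, xlen):
    """Hypothetical XLEN-parameterised add: the result wraps at xlen bits."""
    mask = (1 << xlen) - 1
    return (ra + rb) & mask


def check_add_wraparound(xlen):
    """The same checks, expressed in terms of xlen rather than fixed constants."""
    all_ones = (1 << xlen) - 1
    assert add_model(all_ones, 1, xlen) == 0
    assert add_model(all_ones, all_ones, xlen) == all_ones - 1
    assert add_model(1, 1, xlen) == 2


# one test body, run once per element width
for xlen in (8, 16, 32, 64):
    check_add_wraparound(xlen)
```

writing the expected values in terms of `xlen` is what lets the existing XLEN=64 cases be reused at the narrower widths.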

[as an aside, we have begun the process of defining a simple long-planned API to make it possible to co-run and compare, side-by-side, arbitrary Power ISA implementations.  we welcome assistance and contributions to add other Power ISA implementations, whether they be FPGA softcores, ASICs, simulators, emulators, or production products: co-simulation helps everybody]

when XLEN=32 SVP64 unit tests have been completed, these can accompany an XLEN=32 Scalar Power ISA RFC because they will be testing the same thing.

l.