[Libre-soc-dev] svp64 review and "FlexiVec" alternative

Wed Jul 27 02:36:59 BST 2022

Jacob Lifshay wrote:
> On Tue, Jul 26, 2022, 17:11 Jacob Bachmeyer via Libre-soc-dev 
> <libre-soc-dev at lists.libre-soc.org> wrote:
> [...]
>
>       OpenSPARC
>
>
> that's not a particularly high register count...you can still only 
> access 64 registers at any one time, 32 int, 32 fp. All of the 
> additional registers used for the call stack are usually inaccessible. 
> SimpleV currently has >4x as many registers accessible (128 int, 128 
> fp, 128 cr fields) and that will likely increase in the future.

OK, then, so there are probably *no* existing ISAs with enough registers 
for a Simple-V GPU, and expanding the register file like that is 
certainly a deep and foundational change, since it will affect every 
instruction that can access the new registers.  (I believe that you are 
proposing REX-style prefixes in Power ISA to support the larger register 
file?)
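
To put a rough number on why this touches every instruction, here is a 
back-of-envelope sketch.  It is not a description of the actual SVP64 
encoding; the function names are made up for illustration, and the only 
inputs are the register counts quoted above (32 per file today, 128 per 
file in SimpleV) plus the assumption of a three-operand instruction.

```python
def reg_field_bits(num_regs: int) -> int:
    """Bits needed to name one register out of num_regs (hypothetical helper)."""
    # (n - 1).bit_length() is ceil(log2(n)) for n >= 1, with no floating point.
    return (num_regs - 1).bit_length()


def extra_encoding_bits(old_regs: int, new_regs: int, operands: int = 3) -> int:
    """Additional bits an instruction needs if every register operand grows
    from addressing old_regs registers to addressing new_regs registers."""
    return operands * (reg_field_bits(new_regs) - reg_field_bits(old_regs))


if __name__ == "__main__":
    # Assumption: 32 -> 128 registers per file, three register operands.
    print(reg_field_bits(32), reg_field_bits(128))
    print(extra_encoding_bits(32, 128, operands=3))
```

Each register field grows from 5 to 7 bits, so a three-operand 
instruction needs 6 more bits than a fixed 32-bit encoding has to spare, 
which is why some kind of prefix or wider encoding seems unavoidable.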

>     or another high-register-count ISA might be useful, or possibly a
>     dedicated Libre-SOC GPU architecture, with an OpenPOWER (sans
>     vector facilities) control unit in the actual SOC.
>
>
> i think we should specifically have the same ISA for cpu and gpu 
> stuff, it makes possible optimizing 3D graphics much more if it 
> becomes wide-spread, as opposed to current GPUs where the vendors 
> basically forbid you from using their native ISA and insist you must 
> use their compiler to process all your gpu code first.

That is because the GPU ISA is currently an implementation detail (that 
can change from model to model) and the compiler provides the stable 
interface.

> Using the same ISA also reduces communication overhead because you can 
> just treat it as a normal multithreaded program, rather than this 
> thing that you have to go to great effort to queue up work for and use 
> special kernel drivers, etc.

No, the great effort to queue up work and special kernel drivers are due 
to how the GPU is attached to the system, being a peripheral instead of 
a full processor.

Using a different ISA does not necessarily require these overheads, if 
the CPU/GPU combination is implemented as a heterogeneous multiprocessor 
instead of the current GPU-as-peripheral model.  The current model is 
held in place mostly by the existing driver codebases and the utter 
failure to standardize inter-CPU interconnects.  (AMD proposed 
HyperTransport and got a few other vendors to concur; I seem to remember 
that IBM also adopted it.  Intel said "Not Invented Here" to that and 
made their own incompatible knockoff, so we do not get HyperTransport PC 
GPUs.)  The latter is not a problem for a SOC, where the GPU is 
integrated into the main module.

-- Jacob