[Libre-soc-dev] svp64 review and "FlexiVec" alternative

Wed Jul 27 04:02:12 BST 2022

briefly (v. late here): GPUs went "SIMT", which is so badly misunderstood it's unreal.  SIMT is nothing more than taking a standard processor, replicating it, ripping out every fetch-and-issue unit but one, and BROADCASTING each instruction from that one to all the other cores. apart from fetch/issue the cores are absolutely 100% standard, including full independent regfiles (the only standard reg missing is of course the PC, that's all).

the reason for doing this is that having only the one fetch/issue saves on L1 Instruction cache and allows a larger overall percentage of die area to be dedicated to ALUs (30% to FP alone, which is astoundingly high).
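
to make the broadcast idea concrete, here is a toy model in Python. it is only a sketch of the description above, not of any real GPU: the names (Lane, SIMTCore), the 4-field instruction tuples, and the trick of preloading r0 with the lane number are all made up for illustration.

```python
# Toy SIMT model: ONE fetch/issue unit, N lanes, each lane with its own regfile.
# Everything here (class names, instruction format, r0 = lane id) is a
# hypothetical illustration, not taken from real hardware or from Libre-SOC.

class Lane:
    """A 'standard core' with fetch/issue ripped out: own regfile, no PC."""

    def __init__(self, lane_id, nregs=8):
        self.regs = [0] * nregs
        self.regs[0] = lane_id  # assumption: r0 holds the lane id so lanes differ

    def execute(self, op, rd, ra, rb):
        a, b = self.regs[ra], self.regs[rb]
        if op == "add":
            self.regs[rd] = a + b
        elif op == "mul":
            self.regs[rd] = a * b
        else:
            raise ValueError(f"unknown op {op!r}")


class SIMTCore:
    """The single surviving fetch/issue unit: holds the only PC."""

    def __init__(self, nlanes):
        self.pc = 0
        self.lanes = [Lane(i) for i in range(nlanes)]

    def run(self, program):
        while self.pc < len(program):
            insn = program[self.pc]   # one fetch ...
            for lane in self.lanes:   # ... broadcast to every lane
                lane.execute(*insn)
            self.pc += 1


program = [
    ("add", 1, 0, 0),  # r1 = r0 + r0
    ("mul", 2, 1, 1),  # r2 = r1 * r1
]
core = SIMTCore(nlanes=4)
core.run(program)
results = [lane.regs[2] for lane in core.lanes]
print(results)
```

every lane sees the identical instruction stream and differs only in the contents of its own regfile, which is why there is a single PC in SIMTCore and none in Lane.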

this is one of the primary reasons why running standard code on a GPU is a waste of time. that, and no MMU and an anaemic ISA too heavily focussed on specialist compute.

we on the other hand intend to have a *uniform ISA* but different backend capacity.  wide.FAT instead of big.LITTLE.

wide for multi-issue OoO at high clock rate, FAT for massive in-order SIMD at lower clock rate, both still perfectly capable of running the full Power ISA with full RADIX MMU and full SMP, even if needing "assist" (see rationale, below).

also, the FAT cores will each need 8-10x the LD/ST bandwidth of standard CPUs.

for more on this see https://libre-soc.org/openpower/sv/SimpleV_rationale/ which is the next phase of Simple-V.

l.

On July 27, 2022 2:40:21 AM GMT+01:00, Jacob Bachmeyer <jcb62281 at gmail.com> wrote:
>Jacob Lifshay wrote:
>> On Tue, Jul 26, 2022, 17:33 Jacob Lifshay <programmerjake at gmail.com> wrote:
>>
>>     i think we should specifically have the same ISA for cpu and gpu
>>     stuff, it makes possible optimizing 3D graphics much more if it
>>     becomes wide-spread, as opposed to current GPUs where the vendors
>>     basically forbid you from using their native ISA and insist you
>>     must use their compiler to process all your gpu code first. Using
>>     the same ISA also reduces communication overhead because you can
>>     just treat it as a normal multithreaded program, rather than this
>>     thing that you have to go to great effort to queue up work for and
>>     use special kernel drivers, etc.
>>
>>
>> also, who wouldn't want to use their gpu to run normal cpu tasks too,
>> if it was available? compiling llvm on your 32-core gpu and 16-core cpu!
>
>The programming models are fundamentally different, such that the GPU is
>not going to have the expected performance in other workloads.  To use
>your example, I would expect most GPUs to fare poorly running a 
>compiler, such that I would be unsurprised if *one* of those 16 CPU 
>cores outperforms the entire GPU on that task.
>
>
>-- Jacob