[Libre-soc-dev] 3D MESA Driver

Luke Kenneth Casson Leighton lkcl at lkcl.net
Sun Aug 9 14:08:26 BST 2020

On Sunday, August 9, 2020, vivek pandya <vivekvpandya at gmail.com> wrote:

> I am not able to see much perf benefit except following things with this
> idea
> 1) PC increment will be skipped by doing loops in HW.

i missed this the first time.

one key aspect is that the compression ratio is much higher than for a normal
GPU ISA.  64 or 128 bit instructions are considered "normal" and
"acceptable" there, given that one 128 bit VLIW instruction can issue 4x vec4
operations per cycle.

we instead can and will have 16 bit "compressed" scalar instructions (like
RVC) that can be "prefixed" with a 16 bit SimpleV header making them
vectorised.  consequently we can have 32 bit vector instructions where most
GPU ISAs need at least 64.
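as a back-of-envelope sketch of what that compression ratio means for
encoded shader size (the instruction count and widths below are
illustrative assumptions for the arithmetic, not final SimpleV encodings):

```python
# back-of-envelope comparison of encoded shader sizes.  the
# shader length (2048 ops) and instruction widths are
# illustrative assumptions, not final SimpleV encodings.

def shader_size_bytes(n_insns, insn_bits):
    """total encoded size of a shader of n_insns instructions."""
    return n_insns * insn_bits // 8

n = 2048  # a hypothetical shader of 2048 vector operations

sv_32bit = shader_size_bytes(n, 32)   # 16-bit op + 16-bit SV prefix
gpu_64bit = shader_size_bytes(n, 64)  # typical dedicated-GPU encoding

print(sv_32bit, gpu_64bit)  # 8192 16384: half the I-cache footprint
```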

with most shader programs being well under 8k, a dedicated L1 GPU I-Cache
copes perfectly adequately in a dedicated GPU.

we are doing a hybrid ISA: there is no separate GPU I-Cache: the CPU *is*
the GPU therefore the CPU I-Cache *is* the GPU I-Cache.

consequently any unnecessarily long vectorised instructions have a
detrimental *system-wide* effect on performance when context-switches occur
between CPU and GPU workloads.
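a rough sketch of that shared-I-cache pressure argument (the 32 KiB L1 size
and both instruction widths here are assumptions chosen purely for
illustration):

```python
# sketch of the shared-I-cache pressure argument.  the 32 KiB
# L1 size and the instruction widths are assumptions chosen
# purely for illustration.

L1_ICACHE_BYTES = 32 * 1024  # a common L1 I-cache size

def icache_fraction(n_insns, insn_bytes, cache_bytes=L1_ICACHE_BYTES):
    """fraction of the shared L1 I-cache one shader occupies."""
    return (n_insns * insn_bytes) / cache_bytes

n = 2048  # hypothetical vector-instruction count of one shader

print(icache_fraction(n, 4))  # 32-bit encodings: 0.25 of the cache
print(icache_fraction(n, 8))  # 64-bit encodings: 0.5, evicting far
                              # more CPU code on each context-switch
```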

the GPU shader program is effectively "just another userspace binary" at
this point, albeit a userspace binary that is almost exclusively stocked
with Vector opcodes.

the power efficiency savings of this design therefore come from two aspects:

1) we no longer have the IPC, serialisation / marshalling and context
switching overhead associated with dual ISA designs (one CPU ISA, one GPU
ISA).  with that overhead gone, less work is done and thus less power is
required.

2) GPU binary sizes are smaller, reducing I-cache usage, meaning the caches
themselves may be smaller, and consequently require less power.


crowd-funded eco-conscious hardware: https://www.crowdsupply.com/eoma68
