[Libre-soc-dev] 3D MESA Driver

Mon Aug 10 16:56:28 BST 2020

On Mon, Aug 10, 2020 at 4:28 PM Hendrik Boom <hendrik at topoi.pooq.com> wrote:

> I suppose it will still be possible to divide the screen into tiles, and
> have a separate C(G)PU do the graphics in each tile -- for further
> parallelism.

exactly.  just like in Larrabee.  except with hardware-optimised
instructions [Larrabee was a software-exclusive experiment.  made a
great general-purpose "compute" engine, but nowhere near a good enough
- competitive - 3D GPU]

> This would likely be accomplished entirely in software.

exactly.
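
as a rough illustration only (not code from Mesa or from the Libre-SOC
driver), splitting the screen into tiles and handing each one to a
worker might look like the sketch below.  the names `TILE`, `tiles`,
`render_tile` and `render_frame` are all made up for this example, the
"shader" is a placeholder pattern, and the python thread pool only
stands in for separate cores: it does not give real parallelism.

```python
from concurrent.futures import ThreadPoolExecutor

TILE = 16  # hypothetical tile edge, in pixels


def tiles(width, height, tile=TILE):
    """Yield (x0, y0, x1, y1) rectangles covering the screen, clipped at the edges."""
    for y0 in range(0, height, tile):
        for x0 in range(0, width, tile):
            yield x0, y0, min(x0 + tile, width), min(y0 + tile, height)


def render_tile(rect):
    """Stand-in for a per-tile shader: colour each pixel from its coordinates."""
    x0, y0, x1, y1 = rect
    pixels = [[(x ^ y) & 0xFF for x in range(x0, x1)] for y in range(y0, y1)]
    return rect, pixels


def render_frame(width, height, workers=4):
    """Render every tile on a pool of workers, then assemble the full frame."""
    frame = [[0] * width for _ in range(height)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for (x0, y0, x1, y1), pixels in pool.map(render_tile, tiles(width, height)):
            for row, y in zip(pixels, range(y0, y1)):
                frame[y][x0:x1] = row
    return frame
```

each tile only touches its own rectangle, which is why the workers need
no locking between them until the final assembly step.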

now, we *may* need a special area of memory for the tiles.  this would
be a special, small, protected resource, perhaps with Z-Buffer
capability.  we just have to see how it goes.  if the performance
isn't good enough and (following Jeff Bush's analysis techniques) it
turns out to be a high-reward target, then we provide a special tile
area, and associated instructions.
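
to make "Z-Buffer capability" concrete, here is a minimal sketch of
what such a tile area has to do, written as plain software.  the
`Tile` class and its `write` method are hypothetical names for this
example only, and the "nearer wins" depth convention is an assumption:
no hardware design or instruction has been decided.

```python
class Tile:
    """Hypothetical tile area: a colour buffer plus a depth (Z) buffer."""

    def __init__(self, size=16, far=float("inf")):
        self.size = size
        self.colour = [0] * (size * size)
        self.depth = [far] * (size * size)  # every pixel starts at "infinitely far"

    def write(self, x, y, z, colour):
        """Depth-tested write: keep the fragment only if it is nearer than what is stored."""
        i = y * self.size + x
        if z < self.depth[i]:
            self.depth[i] = z
            self.colour[i] = colour
            return True
        return False
```

the compare-and-conditionally-store in `write` is the kind of inner
loop that would be a candidate for a dedicated instruction, if
profiling showed it to be worth it.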

> Would this involve expensive data transfer between CPUs, which we are
> trying to avoid by merging the CPU with the GPU?

a lot of the software *complexity* in "normal" 3D GPU drivers is
because that "driver singular" is not just one binary executable or
library, it is a dog's dinner mess involving:

* userspace application
* proprietary userspace library which contains
* shader compiler and
* communications and marshalling/unmarshalling "shim" library to
kernelspace where
* kernelspace passes packed (shared memory) objects over to a separate
GPU using PCIe or other method and
* GPU unpacks the data and the shader binary and
* executes it whilst the CPU waits for the results and
* CPU receives notification in kernelspace of completion and
* context-switches back to the userspace application which
* continues on its path.

.... anyone think this is sane?  anyone?

normally, the tiling area would be part of the GPU: the CPU would
never, under any circumstances, get access to it - or even know it was
there.  those tiles would be copied directly out to the framebuffer by
a DMA engine (or straight memcpy) done on the *GPU*.

however with a hybrid CPU-GPU it's done using CPU instructions, or CPU
DMA, and CPU memory locking, and, crucially, it's done in *userspace*
- as part of a *userspace* application - not kernelspace.
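
a sketch of that last step, the "straight memcpy" of a finished tile
into the framebuffer, done entirely from the application's side.
`blit_tile` and its parameters are hypothetical, and the layout (one
byte per pixel, row-major, no stride padding) is an assumption made to
keep the example short.

```python
def blit_tile(framebuffer, fb_width, tile, tile_size, x0, y0):
    """Copy a tile_size x tile_size tile into the framebuffer at (x0, y0), row by row."""
    fb = memoryview(framebuffer)   # writable view, no intermediate copies
    src = memoryview(tile)
    for row in range(tile_size):
        dst = (y0 + row) * fb_width + x0
        fb[dst:dst + tile_size] = src[row * tile_size:(row + 1) * tile_size]
```

for example, with `fb = bytearray(8 * 8)` and a 4x4 tile in
`tile = bytes(range(16))`, calling `blit_tile(fb, 8, tile, 4, 4, 2)`
places the tile with its top-left corner at column 4, row 2.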

l.