[libre-riscv-dev] GPU design
Luke Kenneth Casson Leighton
lkcl at lkcl.net
Sun Dec 9 04:35:16 GMT 2018
On Sun, Dec 9, 2018 at 3:38 AM Jacob Lifshay <programmerjake at gmail.com> wrote:
> What do you think of me building a proof of concept register allocator for
> my idea for sharing the int and fp register files? I'd estimate that it
> would take about 2-4 days of work.
sure... however please note below: modifying a hardware design and
expecting the compiler to "sort out the mess" is a sure-fire
guaranteed way to kill a project. i've learned of *another* project
that did this:
others include the Aspex Semiconductors Array-String Processor, which
had one of the worst productivity rates i've ever encountered: DAYS
per line of ASSEMBLY code. i worked for aspex as a FAE for six
> If it works well, we could reuse the architecture in the actual llvm
> backend when we get around to writing that.
that's exactly what concerns me. without the compiler work being
done *as well* we have absolutely no sure-fire guaranteed way to know
if the idea will be successful... or not.
consequently, it's extremely risky. and, more than that, there are
alternative (non-risky) options that are much more "standard".
also, we cannot just assume that llvm will be the only compiler. we
need gcc as well.
> I'm assuming that register allocation is what you think will be most of the
> compiler problems that you're nervous about.
it's a huge number of things:
* the shared workload, between VPU and GPU
* the needs of the GPU for using the integer file (now reduced in
size) for storing pixels
* that scalar RV hasn't done it
* that the compiler will need to generate "if else" blocks and/or
function calls on critical loop setup/teardown to dynamically cope
with runtime register allocation
* several other issues which i suspect will crop up as well
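to make the "if else" bullet concrete, here is a rough sketch of the kind of dispatch the compiler would have to emit around a critical loop. everything in it is an assumption for illustration only: the 32-entry shared file, the names free_registers and sum_pixels, and the figure of 8 registers for the wide path are hypothetical, not taken from any actual proposal.

```python
TOTAL_REGS = 32  # hypothetical size of a merged int/fp register file


def free_registers(int_in_use, fp_in_use):
    """Registers left over once both classes have taken their share."""
    return TOTAL_REGS - int_in_use - fp_in_use


def sum_pixels(pixels, int_in_use, fp_in_use, wide_needs=8):
    """Sum pixel values, choosing a loop variant at run time.

    The choice depends on how many registers the *other* users of the
    shared file have left free, which is not known at compile time.
    """
    if free_registers(int_in_use, fp_in_use) >= wide_needs:
        # wide path: handle wide_needs elements per iteration
        total = 0
        for i in range(0, len(pixels), wide_needs):
            total += sum(pixels[i:i + wide_needs])
        return "wide", total
    else:
        # fallback path: one element per iteration
        total = 0
        for p in pixels:
            total += p
        return "narrow", total
```

both branches have to be generated, tested and kept correct, for every critical loop, in both gcc and llvm.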
honestly, i feel it's one of those nightmare areas that will take
several months to _retrospectively_ have worked out that it wasn't a
and, with the exploration of the CDC 6600 with mitch alsup's help,
register renaming can be done by adding one-, two- or three-entry
register queues at the front of each functional unit.
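a toy model of one reading of that idea: each functional unit has a short queue that captures operand *values* at issue time, so a later write to a source register cannot disturb an operation that is still waiting. the class and method names are hypothetical, and the sketch assumes operands are already available at issue, which real hardware cannot assume.

```python
from collections import deque


class FunctionalUnit:
    """Toy functional unit with a small operand queue in front of it."""

    def __init__(self, op, depth=2):
        self.op = op
        self.depth = depth      # one, two or three entries
        self.queue = deque()

    def issue(self, regs, dest, src1, src2):
        """Capture operand values now; return False if the queue is full."""
        if len(self.queue) >= self.depth:
            return False        # the issue stage would have to stall
        self.queue.append((dest, regs[src1], regs[src2]))
        return True

    def execute_one(self, regs):
        """Run the oldest queued operation and write its result back."""
        dest, a, b = self.queue.popleft()
        regs[dest] = self.op(a, b)


def demo():
    regs = {"r1": 10, "r2": 20, "r3": 0}
    adder = FunctionalUnit(lambda a, b: a + b, depth=2)
    adder.issue(regs, "r3", "r1", "r2")   # r3 = r1 + r2, queued, not yet run
    regs["r1"] = 999                      # later instruction overwrites r1
    adder.execute_one(regs)
    return regs["r3"]                     # uses the captured value of r1
```

here demo() returns 30 rather than 1019: the queued add still sees the old value of r1, which is the effect renaming is normally used to get.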
basically, the need for a merged register file is moot, it's hugely
problematic, it'll take a *long* time to work out *that* it was
please please think about that before committing the time. can you do
the hardware *and* the compiler (both gcc and llvm) in the same 2-4
days? is it possible to reduce the feedback loop latency in *any*