[libre-riscv-dev] GPU design

Fri Dec 7 16:11:58 GMT 2018

On Fri, Dec 7, 2018 at 9:18 AM Luke Kenneth Casson Leighton
<lkcl at lkcl.net> wrote:

> On Mon, Dec 3, 2018 at 11:02 PM Jacob Lifshay <programmerjake at gmail.com> wrote:
> >
> > I created a simple diagram of what I think would work for the ALUs and
> > register file for the GPU design. The diagram doesn't include forwarding or
> > pipeline registers.
> >
> > https://salsa.debian.org/Kazan-team/kazan/blob/e4b516e29469e26146e717e0ef4b552efdac694b/docs/ALU%20lanes.svg
>
>  so, coming back to this diagram, i think if we stratify the
> Functional Units into lanes as well, we may get a multi-issue
> architecture.

i also took a shot at explaining this on comp.arch today, and that
allowed me to identify a problem with the proposed modulo-4 "lanes"
stratification.

when a result is created in one lane, it may need to be passed to the
next lane.  that means that each lane needs to keep a watchful eye on
when any of the other lanes updates its regfile (all 3 of them).

when incoming updates occur, a lane may see up to 3 register writes
(do they need to be queued?), each of which needs to be broadcast
(written) into reservation stations.
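
a minimal sketch of the snooping problem, in python.  the mapping
"register r lives in lane r % 4" and the limit of one result per lane
per cycle are assumptions made for illustration, and the names lane_of
and foreign_writes are hypothetical, not taken from the diagram:

```python
NUM_LANES = 4  # modulo-4 stratification from the proposal


def lane_of(regnum):
    """Lane (and regfile bank) assumed to own a register: regnum mod 4."""
    return regnum % NUM_LANES


def foreign_writes(results, lane):
    """Writes produced by the other lanes that `lane` has to watch for.

    results: list of (regnum, value) pairs completed this cycle,
    assumed to contain at most one result per lane.
    """
    return [(r, v) for (r, v) in results if lane_of(r) != lane]


# one result completes in each of the four lanes in the same cycle
results = [(0, 10), (5, 11), (6, 12), (7, 13)]
print(foreign_writes(results, lane=0))
```

under those assumptions lane 0 has three foreign writes to deal with in
a single cycle, which is where the question of queueing comes from.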

what i'm not sure of is: can data consistency be preserved, even if
there's a delay?  my big concern is that while the data is still being
broadcast from one lane, the head of the ROB arrives at that
instruction (which is the "commit" condition) and it gets committed.
then, unfortunately, the same ROB# gets *reused*, so the delayed
broadcast could end up matching the wrong instruction.

now that i think about it, as long as the length of the queue is below
the size of the Reorder Buffer (preferably well below), and as long as
it's guaranteed to be emptied by the time the ROB cycles through the
whole buffer, it *should* be okay.
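
a rough way to check that argument is a toy simulation.  the model
below is an assumption, not the actual design: one ROB entry is
allocated per cycle in circular order (so a ROB# is reused exactly
rob_size cycles later), and every broadcast spends queue_delay cycles
in the queue before it is delivered.  stale_broadcasts is a
hypothetical name:

```python
from collections import deque


def stale_broadcasts(rob_size, queue_delay, cycles):
    """Count broadcasts delivered after their ROB# has been reallocated.

    Assumed worst case: one allocation per cycle, circular ROB# order,
    and a fixed delay of queue_delay cycles for every broadcast.
    """
    generation = [0] * rob_size  # times each ROB# has been allocated
    in_flight = deque()          # (deliver_at_cycle, rob_tag, generation)
    stale = 0
    for cycle in range(cycles):
        tag = cycle % rob_size
        generation[tag] += 1     # allocate (or reuse) this ROB#
        in_flight.append((cycle + queue_delay, tag, generation[tag]))
        # deliver everything whose delay has elapsed
        while in_flight and in_flight[0][0] <= cycle:
            _, t, gen = in_flight.popleft()
            if generation[t] != gen:
                stale += 1       # ROB# was reused before delivery
    return stale


print(stale_broadcasts(rob_size=8, queue_delay=3, cycles=100))
print(stale_broadcasts(rob_size=8, queue_delay=8, cycles=100))
```

in this model a delay below the ROB size never produces a stale
broadcast, while a delay equal to the ROB size does.  a multi-issue
design that allocates several ROB entries per cycle would reuse ROB#s
sooner, which is one reason to keep the queue well below the ROB size.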

l.