[libre-riscv-dev] building a simple barrel processor

Fri Mar 8 09:03:07 GMT 2019

On Fri, Mar 8, 2019, 00:32 Luke Kenneth Casson Leighton <lkcl at lkcl.net>
wrote:

> On Fri, Mar 8, 2019 at 8:17 AM Jacob Lifshay <programmerjake at gmail.com>
> wrote:
>
> > On Thu, Mar 7, 2019, 23:34 Luke Kenneth Casson Leighton <lkcl at lkcl.net>
> > wrote:
> >
> > > On Fri, Mar 8, 2019 at 6:11 AM Jacob Lifshay <programmerjake at gmail.com>
> > > wrote:
> > >
> > > > Just to clarify, I'm asking if you think it's a good idea to work on
> > > > this since it will take some time.
> > >
> > >  at its heart a barrel processor is a single-core single-issue
> > > timeslicing design, suited to real-time I/O processing.  if we were to
> > > add multiple barrel 4-time-sliced SMP cores, it would result in
> > > multiple proliferations of the massive dual/triple-ported 8k SRAMs.
> > >
> > actually, for barrel processors, because each instruction takes many
> > cycles to execute, you only need a single-ported sram split into banks,
> > since each hart has its own separate bank(s), each hart can read each
> > instruction's arguments one at a time, then write at the end. if there
> > are enough harts per core you can also run the sram at a lower clock
> > rate & maybe lower voltage -- dedicating more pipeline stages to reading
> > and writing the register file.
>
>  as a parallel processor - a dedicated GPU - this would likely be
> really good.  i remember older GPUs running at like only 300-400MHz.
>
>  as a general-purpose processor however it would suck.  the latency
> would be atrocious for anything that required single-process
> performance.
>
>  we'd therefore need to completely change the design strategy to a
> dual (split) CPU + GPU,

to reduce single-thread latency we could add forwarding and skip idle harts
(defined as harts executing wfi). we could also give the low registers more
ports, or maybe add a 4-8 reg-per-hart 3r1w cache.
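
To make "skip idle harts" concrete, here is a minimal software sketch of
round-robin hart selection that passes over any hart that has executed wfi.
It is only an illustration, not a hardware design: the Hart class, the
next_hart and run functions, and the mnemonic-list "programs" are all
hypothetical names made up for this example, and one instruction is issued
per cycle purely to keep the model simple.

```python
from dataclasses import dataclass


@dataclass
class Hart:
    """Hypothetical model of one hardware thread (hart)."""
    hart_id: int
    program: list          # instruction mnemonics, assumed to end with "wfi"
    pc: int = 0
    waiting: bool = False  # set once the hart has executed wfi


def next_hart(harts, last):
    """Return the index of the next non-idle hart after `last`, else None."""
    n = len(harts)
    for offset in range(1, n + 1):
        idx = (last + offset) % n
        if not harts[idx].waiting:
            return idx
    return None  # every hart is idle


def run(harts, max_cycles):
    """Issue one instruction per cycle, round-robin, skipping idle harts."""
    trace = []
    last = -1
    for _ in range(max_cycles):
        idx = next_hart(harts, last)
        if idx is None:
            break
        hart = harts[idx]
        insn = hart.program[hart.pc]
        hart.pc += 1
        # A hart goes idle on wfi (or, as a guard, when its program runs out).
        if insn == "wfi" or hart.pc >= len(hart.program):
            hart.waiting = True
        trace.append((idx, insn))
        last = idx
    return trace


harts = [
    Hart(0, ["add", "add", "wfi"]),
    Hart(1, ["wfi"]),
    Hart(2, ["mul", "mul", "mul", "wfi"]),
]
trace = run(harts, max_cycles=32)
```

In this model, once hart 1 has executed wfi its slot goes to the remaining
harts, so a lone busy hart ends up issuing on consecutive cycles instead of
once every N cycles.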

alternatively, only the first hart in each core could have a fast mode;
linux can handle that thanks to the scheduler's support for ARM's
big.LITTLE (as of 5.0).

> and have the kazan codebase modified to
> include an IPC/RPC mechanism that was capable of packaging up all API
> calls, shipping them over to the GPU and having it execute things
> there.
>
We don't need IPC/RPC. We can still share all the memory, stay inside the
same process, and use all the standard inter-thread synchronization
mechanisms. Sharing memory like that happens on most mobile GPUs anyway.
I'm implementing inter-thread communication regardless, since we don't want
the GPU work to be stuck on a single core.
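
As a rough illustration of the shared-memory approach (not Kazan's actual
code), the sketch below hands work to several threads through an in-process
queue. Only index ranges travel through the queue; the input and output
buffers are shared directly, so nothing is packaged up or copied the way an
IPC/RPC layer would require. The function name run_shared_memory_jobs, the
chunk size, and the "double each element" workload are assumptions invented
for the example.

```python
import queue
import threading


def run_shared_memory_jobs(data, num_workers=4, chunk=4):
    """Process `data` on several threads that share one output buffer."""
    results = [None] * len(data)   # shared buffer, written in place
    jobs = queue.Queue()           # standard inter-thread synchronization

    def worker():
        while True:
            span = jobs.get()
            if span is None:       # sentinel: no more work for this thread
                jobs.task_done()
                return
            start, stop = span
            for i in range(start, stop):
                results[i] = data[i] * 2   # stand-in for real GPU work
            jobs.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()

    # Submit disjoint index ranges; each range is written by one worker only.
    for start in range(0, len(data), chunk):
        jobs.put((start, min(start + chunk, len(data))))
    for _ in threads:
        jobs.put(None)             # one sentinel per worker

    jobs.join()                    # wait until every job has been processed
    for t in threads:
        t.join()
    return results


doubled = run_shared_memory_jobs(list(range(10)))
```

Because the ranges are disjoint, the workers need no lock around the output
buffer; the queue is the only synchronization point.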