[libre-riscv-dev] building a simple barrel processor
programmerjake at gmail.com
Fri Mar 8 09:03:07 GMT 2019
On Fri, Mar 8, 2019, 00:32 Luke Kenneth Casson Leighton <lkcl at lkcl.net>
wrote:
> On Fri, Mar 8, 2019 at 8:17 AM Jacob Lifshay <programmerjake at gmail.com>
> wrote:
> > On Thu, Mar 7, 2019, 23:34 Luke Kenneth Casson Leighton <lkcl at lkcl.net>
> > wrote:
> > > On Fri, Mar 8, 2019 at 6:11 AM Jacob Lifshay <programmerjake at gmail.com>
> > > wrote:
> > >
> > > > Just to clarify, I'm asking if you think it's a good idea to work on
> > > > since it will take some time.
> > >
> > > at its heart a barrel processor is a single-core single-issue
> > > timeslicing design, suited to real-time I/O processing. if we were to
> > > add multiple barrel 4-time-sliced SMP cores, it would result in
> > > multiple proliferations of the massive dual/triple-ported 8k SRAMs.
> > >
> > actually, for barrel processors, because each instruction takes many
> > cycles to execute, you only need a single-ported sram split into banks.
> > since each hart has its own separate bank(s), each hart can read each
> > instruction's arguments one at a time, then write at the end. if there
> > are enough harts
> > per core you can also run the sram at a lower clock rate & maybe lower
> > voltage -- dedicating more pipeline stages to reading and writing the
> > register file.
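
The banked register file described above can be sketched as a small
cycle-level model. This is only an illustration under assumptions that are
not in the thread: one bank per hart, a 4-phase add (read rs1, read rs2,
execute, write rd), and harts staggered by one cycle. The names
BankedRegFile and run_adds are hypothetical.

```python
class BankedRegFile:
    """One single-ported bank per hart: at most one access per bank per cycle."""

    def __init__(self, num_harts, regs_per_hart=32):
        self.banks = [[0] * regs_per_hart for _ in range(num_harts)]
        self.busy = set()  # banks already accessed in the current cycle

    def new_cycle(self):
        self.busy.clear()

    def _claim(self, hart):
        # A single-ported bank cannot serve two accesses in the same cycle.
        if hart in self.busy:
            raise RuntimeError(f"port conflict on bank {hart}")
        self.busy.add(hart)

    def read(self, hart, reg):
        self._claim(hart)
        return self.banks[hart][reg]

    def write(self, hart, reg, value):
        self._claim(hart)
        self.banks[hart][reg] = value


def run_adds(regfile, program):
    """Run one `add rd, rs1, rs2` per hart; program[hart] = (rd, rs1, rs2).

    Hart h starts at cycle h, so in any cycle each hart is in a different
    phase and touches only its own bank, once.
    Returns the number of cycles simulated.
    """
    num_harts = len(program)
    latch = [{} for _ in range(num_harts)]  # per-hart pipeline latches
    total_cycles = num_harts + 3            # last hart starts at num_harts - 1
    for cycle in range(total_cycles):
        regfile.new_cycle()
        for hart, (rd, rs1, rs2) in enumerate(program):
            phase = cycle - hart
            if phase == 0:
                latch[hart]["a"] = regfile.read(hart, rs1)
            elif phase == 1:
                latch[hart]["b"] = regfile.read(hart, rs2)
            elif phase == 2:
                latch[hart]["sum"] = latch[hart]["a"] + latch[hart]["b"]
            elif phase == 3:
                regfile.write(hart, rd, latch[hart]["sum"])
    return total_cycles
```

No call in run_adds ever trips the port-conflict check, which is the point
of the argument: operands are fetched one per cycle, so a single port per
bank is enough.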
> as a parallel processor - a dedicated GPU - this would likely be
> really good. i remember older GPUs running at only about 300-400MHz.
> as a general-purpose processor however it would suck. the latency
> would be atrocious for anything that required single-process
> we'd therefore need to completely change the design strategy to a
> dual (split) CPU + GPU,
add forwarding and skip idle harts (defined as harts executing wfi). could
have the low registers have more ports or maybe a 4-8 reg-per-hart 3r1w
could alternatively have only the first hart in each core have a fast mode;
linux can handle that thanks to ARM's big.LITTLE support in the scheduler
(as of 5.0).
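
A rough sketch of the "skip idle harts" part, assuming a plain round-robin
issue order and a per-hart idle flag that is set while the hart sits in wfi.
The function name next_hart and the flag list are hypothetical, not taken
from any existing design.

```python
def next_hart(current, idle, num_harts):
    """Pick the next hart to issue from, in round-robin order, skipping idle ones.

    `idle[h]` is True while hart h is waiting in wfi.
    Returns None when every hart is idle.
    """
    for offset in range(1, num_harts + 1):
        candidate = (current + offset) % num_harts
        if not idle[candidate]:
            return candidate
    return None
```

When only one hart is awake, next_hart keeps returning that hart, so it
gets every issue slot. That is the case where forwarding matters, since
back-to-back instructions from the same hart can then depend on each other.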
> and have the kazan codebase modified to
> include an IPC/RPC mechanism that was capable of packaging up all API
> calls, shipping them over to the GPU and having it execute things
We don't need IPC/RPC. we can still share all the memory and be inside the
same process and use all the standard inter-thread synchronization
mechanisms. sharing memory like that happens on most mobile gpus anyway.
I'm implementing inter-thread communication anyway since we want the gpu
work to not be stuck on a single core.
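
To illustrate the shared-memory approach in general terms (this is not
kazan code, and every name here is made up for the example): worker threads
in the same process pull index ranges off a standard thread-safe queue and
modify one shared buffer in place, so nothing is serialized or copied.

```python
import queue
import threading


def run_shared_memory_jobs(buffer, chunk, num_workers=2):
    """Double every element of `buffer` in place using worker threads.

    Only (start, end) index pairs travel through the queue; the data itself
    stays in the one buffer that all threads share.
    """
    jobs = queue.Queue()

    def worker():
        while True:
            span = jobs.get()
            if span is None:          # sentinel: no more work
                jobs.task_done()
                return
            start, end = span
            for i in range(start, end):
                buffer[i] *= 2        # ranges are disjoint, so no locking needed
            jobs.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for start in range(0, len(buffer), chunk):
        jobs.put((start, min(start + chunk, len(buffer))))
    for _ in threads:
        jobs.put(None)                # one sentinel per worker
    jobs.join()                       # wait until every job is marked done
    for t in threads:
        t.join()
    return buffer
```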