[libre-riscv-dev] building a simple barrel processor

Sat Mar 30 10:01:19 GMT 2019

no, i remember now: it was several months ago, took me a while to
recall the analysis from memory.

the outcome of the 1R1W analysis was much worse than just having to do
4 lanes.  4 lanes is what you have to do merely to keep a single-issue
non-SIMD non-vector 8-stage FMAC pipeline occupied, because if you
don't have 4 lanes then only one stage of the pipeline at a time can
do reads.

yes, one of the read slots is free (with a 4-lane 1R1W layout): mitch
alsup advocated using this for a LOAD slot [needed for a register
address calculation]

[yes, hypothetically it could be a 3-lane design, but then there would
be no room for interleaving any other operations without stalling the
FMAC pipeline].

however we know that the performance target is 5-6 GFLOPs, therefore
we need 4 cores running at 800MHz, and for each core to do *4* FMACs
at once.

therefore, the only way for each core to achieve that with a 1R1W
register layout is to have a whopping *SIXTEEN* register bank lanes!
(ok, hypothetically 12).
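
a minimal sketch of the lane arithmetic above, assuming each FMAC reads
three operands per cycle and each 1R1W bank (lane) supplies one read
per cycle.  the function name and parameters are hypothetical, chosen
only for illustration:

```python
# Lane-count arithmetic for a 1R1W register file feeding FMAC pipelines.
# Assumption: an FMAC (a*b + c) reads three register operands per cycle,
# and each 1R1W bank can service exactly one read per cycle.
FMAC_READ_OPERANDS = 3


def lanes_needed(fmacs_per_cycle, spare_read_slots=1):
    """Number of 1R1W banks needed to issue `fmacs_per_cycle` FMACs at once.

    spare_read_slots=1 models the 4-lane layout, where the fourth read
    slot is left free (e.g. for a LOAD); spare_read_slots=0 models the
    hypothetical 3-lane layout with no room for anything else.
    """
    reads_per_fmac = FMAC_READ_OPERANDS + spare_read_slots
    return fmacs_per_cycle * reads_per_fmac


if __name__ == "__main__":
    print(lanes_needed(1))                      # single FMAC, 4-lane layout
    print(lanes_needed(4))                      # four FMACs at once
    print(lanes_needed(4, spare_read_slots=0))  # four FMACs, no spare slot
```

with these assumptions the three calls give 4, 16 and 12, matching the
4-lane, sixteen-lane and "hypothetically 12" figures.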

that would mean that general-purpose register access would be modulo
*16* bottlenecked.
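
one way to picture the bottleneck, as a sketch only: assume register
number r lives in bank r % 16 and each bank can service one read per
cycle.  that mapping is an assumption made for illustration (it is not
spelled out here), and `read_cycles` is a hypothetical helper:

```python
from collections import Counter


def read_cycles(regs, lanes=16):
    """Cycles needed to read all of `regs` from a banked 1R1W regfile.

    Assumption: register r is stored in bank (r % lanes) and every bank
    can service one read per cycle, so reads that land in the same bank
    must be serialised.
    """
    per_bank = Counter(r % lanes for r in regs)
    # The busiest bank determines how long the whole set of reads takes.
    return max(per_bank.values(), default=0)


if __name__ == "__main__":
    print(read_cycles([0, 1, 2]))    # three different banks
    print(read_cycles([0, 16, 32]))  # all three land in bank 0
```

under this model r0, r1, r2 can be read in one cycle, but r0, r16, r32
all collide in bank 0 and take three, which is the kind of stall that
general-purpose code cannot easily schedule around.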

*that's* why it's not a good idea [for a general-purpose CPU].  it
would be a fine strategy for an ultra-high-performance Vector
Processor or a high-end GPU-only Processor, because there, the massive
parallelism would allow for a 16-lane memory hierarchy.

l.