[libre-riscv-dev] 1R1W regfiles
Luke Kenneth Casson Leighton
lkcl at lkcl.net
Wed Dec 19 18:41:03 GMT 2018
On Wed, Dec 19, 2018 at 11:17 AM Jacob Lifshay <programmerjake at gmail.com> wrote:
> I think you will need more crossbars than that, though I'll have to check.
> Assuming the scalar performance doesn't matter and we don't need more crossbars, I like it.
> We could improve the scalar performance by switching to 2r1w and vector could then read 256-bits at a time rather than 512-bits.
> 2r1w will also allow us to use fpga's block ram by splitting it into 2 reg files that hold the same data and using 1r1w block ram, allowing the fpga design to be much smaller.
ok that's quite important.
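A minimal behavioural sketch of the 2R1W-from-two-1R1W idea described above, for illustration only. The class and method names (`RegFile1R1W`, `RegFile2R1W`, `read2`) are hypothetical and not from any existing codebase. The model ignores clocking and read-during-write behaviour, which a real block-RAM design would have to define.

```python
class RegFile1R1W:
    """Behavioural model of a memory with one read port and one write port."""

    def __init__(self, num_regs):
        self.regs = [0] * num_regs

    def read(self, addr):
        return self.regs[addr]

    def write(self, addr, value):
        self.regs[addr] = value


class RegFile2R1W:
    """Two 1R1W copies holding identical data, giving two read ports."""

    def __init__(self, num_regs):
        self.copy_a = RegFile1R1W(num_regs)
        self.copy_b = RegFile1R1W(num_regs)

    def write(self, addr, value):
        # The single write port drives both copies so they never diverge.
        self.copy_a.write(addr, value)
        self.copy_b.write(addr, value)

    def read2(self, addr_a, addr_b):
        # Each read port is served by its own copy, so the two reads
        # do not compete for a port.
        return self.copy_a.read(addr_a), self.copy_b.read(addr_b)


rf = RegFile2R1W(32)
rf.write(3, 0xDEAD)
rf.write(7, 0xBEEF)
print(rf.read2(3, 7))
```

The trade-off is that the storage is doubled in exchange for the extra read port.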
> This should probably be on the mailing list.
i'll do a more detailed response tomorrow, summary:
* "3R1W 4 banks" allows 4 32-bit FMACs/clock cycle to be done...
  however we only actually need 2 (if quad-core and if 800MHz)
* "2R1W 4 banks" will be 2 32-bit FMACs/clock and will only need QTY 2
  16x16 8-bit-wide crossbars
* "1R1W" will require *EIGHT* banks in order to generate 2 32-bit
  FMACs/clock, which means a MASSIVE 32x32 8-bit-wide crossbar.
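A rough way to compare the crossbar options in this thread is to count crosspoints. The sketch below assumes a full N-to-N crossbar with N*N byte-wide crosspoints and treats that count as a crude proxy for cost; that cost model and the function names are assumptions, and only the crossbar dimensions come from the thread.

```python
def byte_lanes(num_operands, operand_bits):
    """Number of 8-bit lanes needed to carry the given operands."""
    total_bits = num_operands * operand_bits
    if total_bits % 8 != 0:
        raise ValueError("total width must be a whole number of bytes")
    return total_bits // 8


def crosspoints(lanes, num_crossbars=1):
    """Crosspoint count for num_crossbars full lanes-by-lanes crossbars."""
    return num_crossbars * lanes * lanes


# 2x 64-bit (or 4x 32-bit) operands both come to 16 byte lanes.
print(byte_lanes(2, 64), byte_lanes(4, 32))

# Options as stated in the thread: (crossbar size, quantity).
options = {
    "3R1W, three 16x16": (16, 3),
    "2R1W, two 16x16": (16, 2),
    "1R1W, one 16x16": (16, 1),
    "1R1W 8 banks, one 32x32": (32, 1),
}
for name, (lanes, qty) in options.items():
    print(name, crosspoints(lanes, qty))
```

Under this model a single 32x32 crossbar has as many crosspoints as four 16x16 ones, which is why the eight-bank 1R1W case comes out larger than the 3R1W case.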
> On Wed, Dec 19, 2018, 02:36 Luke Kenneth Casson Leighton <lkcl at lkcl.net> wrote:
>> mitch alsup is recommending an idea of using 1R1W SRAM instead of
>> 3R1W. src operands are read sequentially, however they're read in
>> much larger batches (4x the width), so 4 at a time where 1 would
>> normally be read.
>> the multiplexing that we need, for 2x 64-bit (or 4x 32-bit hi/lo) is a
>> whopping 16-to-16 byte-level crossbar.
>> and for 3R1W we need *three* of them.
>> that's completely insane :)
>> if however we only had 1R1W, only one 16-to-16 byte-level crossbar
>> would be needed.
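For illustration, a byte-level crossbar can be modelled as a per-output select: each output byte lane picks any one input byte lane. This is only a behavioural sketch; the function name `byte_crossbar` and the example select pattern are hypothetical, not taken from the design under discussion.

```python
def byte_crossbar(in_bytes, select):
    """Route bytes through a full crossbar: output lane i takes in_bytes[select[i]]."""
    if len(in_bytes) != len(select):
        raise ValueError("need one select entry per output lane")
    return bytes(in_bytes[s] for s in select)


# 16 byte lanes = 2x 64-bit operands (or 4x 32-bit hi/lo halves).
data = bytes(range(16))

# Example select pattern: swap the 32-bit hi/lo halves within each 64-bit operand.
swap_hi_lo = [4, 5, 6, 7, 0, 1, 2, 3, 12, 13, 14, 15, 8, 9, 10, 11]

print(list(byte_crossbar(data, swap_hi_lo)))
```

In hardware each output lane is a 16-to-1 byte-wide mux, so every extra read port that needs its own crossbar repeats all 16 of those muxes.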