[libre-riscv-dev] [Bug 216] LOAD STORE buffer needed

Sun Apr 19 16:22:14 BST 2020

https://bugs.libre-soc.org/show_bug.cgi?id=216

--- Comment #22 from Luke Kenneth Casson Leighton <lkcl at lkcl.net> ---
https://libre-soc.org/3d_gpu/architecture/6600scoreboard/

i just updated this page with a new section, "L0 Cache/Buffer".  it contains
a diagram i did a couple of weeks ago:

https://libre-soc.org/3d_gpu/architecture/6600scoreboard/600x-mem_l0_to_l1_bridge.png

the actual algorithm is incredibly simple:

* priority picker picks one (and only one) address (per L0 cache/buffer)
* for all rows greater than the one picked, match against all MSBs of
  the address, bits 5 and above.
* for all matching rows, OR together the 16-bit bitmaps representing
  {LSBs 0-3 plus LD/ST-len}

those ORed bitmaps become the "byte read/write-enable" lines on a given
L1 cache line.

that's it - that's all there is to it.
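
as a rough illustration only, the three steps can be written out in plain
python.  this is a behavioural sketch, not the hardware and not code from the
libre-soc tree: the names byte_mask and merge_requests, the (valid, addr, len)
row layout and the "lowest row wins" picker are all assumptions.  it follows
the text literally (compare bits 5 and above, OR the 16-bit bitmaps) and does
not model how bit 4 is handled, which this message does not describe.

```python
# behavioural sketch only; every name here is hypothetical.

def byte_mask(addr, length):
    """16-bit bitmap, one bit per byte, built from address LSBs 0-3 plus
    the LD/ST length.  bits that would fall past bit 15 are dropped
    (assumed to be covered by a separate misaligned request)."""
    offset = addr & 0xF
    return (((1 << length) - 1) << offset) & 0xFFFF


def merge_requests(rows):
    """rows: list of (valid, addr, length) tuples, one per L0 entry.
    returns (picked_row, tag, merged_mask, merged_rows) or None."""
    # step 1: priority picker, assumed here to be "lowest valid row wins"
    picked = next((i for i, (valid, _, _) in enumerate(rows) if valid), None)
    if picked is None:
        return None
    _, paddr, plen = rows[picked]
    tag = paddr >> 5                      # MSBs: bits 5 and above
    mask = byte_mask(paddr, plen)
    merged = [picked]
    # step 2: compare only rows greater than the one picked
    for i in range(picked + 1, len(rows)):
        valid, addr, length = rows[i]
        if valid and (addr >> 5) == tag:
            # step 3: OR the matching row's bitmap into the byte-enables
            mask |= byte_mask(addr, length)
            merged.append(i)
    return picked, tag, mask, merged


rows = [
    (False, 0x0000, 0),   # empty entry
    (True,  0x1000, 4),   # bytes 0-3
    (True,  0x1008, 8),   # bytes 8-15, same MSBs as row 1
    (True,  0x2000, 1),   # different MSBs, left for a later cycle
]
result = merge_requests(rows)
```

here rows 1 and 2 share the same MSBs, so their bitmaps are ORed into one set
of byte-enable lines, and row 3 stays in the buffer.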

the "complex" bit is the N-in/out multiplexing from 16-in on the 8 LD/ST
FunctionUnits (2 ports per FU because 1 is for "normal" requests and the
2nd is for misaligned addresses)
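
to show why a second port is needed at all, here is a small python sketch of
splitting one access into a "normal" part and a "misaligned" part.  it assumes
a 16-byte granularity (inferred from the 16-bit bitmap above); split_access
and its return format are made up for this example and are not taken from the
actual design.

```python
# hypothetical helper: splits one LD/ST into at most two per-port requests.

def split_access(addr, length, line_bytes=16):
    """returns a list of (aligned_base, byte_mask) pairs: one entry if the
    access fits inside a line_bytes-wide window, two if it crosses into
    the next one."""
    offset = addr % line_bytes
    full = ((1 << length) - 1) << offset      # may be wider than 16 bits
    lo_mask = full & ((1 << line_bytes) - 1)  # part for the 1st port
    hi_mask = full >> line_bytes              # overflow, for the 2nd port
    base = addr - offset
    ports = [(base, lo_mask)]
    if hi_mask:
        ports.append((base + line_bytes, hi_mask))
    return ports


aligned = split_access(0x1004, 8)      # fits: one request
misaligned = split_access(0x100E, 4)   # crosses the boundary: two requests
```

with 8 FUs each able to produce two such requests, up to 16 requests can
arrive at once, which is where the 16-in figure comes from.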

however i just realised that if we can accept an increase in size of the L0
cache/buffer from 8 to 16 entries - or limit the number of LD/ST Function
Units to 6 - then we can instead simply have one *dedicated* "entry" for
each and every FU port (2 per FU), and the entire MASSIVE 16-to-4
multiplexer completely disappears.

i'll draw it out.

the caveat: we now have a 16-entry (or 12-entry) L0 Cache/Buffer, which
unfortunately requires up to a 16-way CAM on the *ENTIRE* address from
bits 5 and upwards.

it *might* be possible to inherit (yet again) from the addr_match.py classes,
which already have a full triangular comparison of all-against-all in
bits 4 thru 11.
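
for reference, a "triangular" all-against-all comparison can be sketched in
plain python as below.  this is not the addr_match.py code, just an
illustration of the idea: the function name, the list-based interface and the
lo_bit parameter are assumptions.  each entry is compared only against
lower-numbered entries, so N entries need N*(N-1)/2 comparators (120 for 16
entries) rather than N*N.

```python
# illustrative only: not the addr_match.py implementation.

def triangular_match(addrs, valids, lo_bit=5):
    """for each entry i, returns a bitmask of the lower-numbered entries j
    (j < i) whose address matches entry i in bits lo_bit and above."""
    n = len(addrs)
    tags = [a >> lo_bit for a in addrs]
    match = [0] * n
    for i in range(n):
        for j in range(i):                 # lower triangle only
            if valids[i] and valids[j] and tags[i] == tags[j]:
                match[i] |= 1 << j
    return match


addrs = [0x1000, 0x1010, 0x2000, 0x1004]
valids = [True, True, True, True]
hits = triangular_match(addrs, valids)

# number of comparators needed for a 16-entry buffer
comparators_16 = 16 * 15 // 2
```

in hardware each (i, j) pair would be one equality comparator on the full
upper address bits, which is the cost the 16-way CAM caveat refers to.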
