[libre-riscv-dev] [Bug 216] LOAD STORE buffer needed
bugzilla-daemon at libre-soc.org
Sun Apr 19 16:22:14 BST 2020
https://bugs.libre-soc.org/show_bug.cgi?id=216
--- Comment #22 from Luke Kenneth Casson Leighton <lkcl at lkcl.net> ---
https://libre-soc.org/3d_gpu/architecture/6600scoreboard/
i just updated this page with a new section, "L0 Cache/Buffer". it
contains a diagram i did a couple of weeks ago:
https://libre-soc.org/3d_gpu/architecture/6600scoreboard/600x-mem_l0_to_l1_bridge.png
the actual algorithm is incredibly simple:
* priority picker picks one (and only one) address (per L0 cache/buffer)
* for all rows greater than the one picked, match against all MSBs of
the picked address, bits 5 and above.
* for all matching rows, OR together the 16-bit bitmaps representing
{LSBs 0-3 plus LD/ST-len}
those ORed bitmaps become the "byte read/write-enable" lines on a given
L1 cache line.
that's it - that's all there is to it.
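
as a rough illustration, the three steps above can be modelled in plain
python (not nmigen). everything named here is hypothetical: the row format
(addr, length), the function names, and the assumption that the 16-bit
bitmap is simply "length" consecutive bits starting at address bits 0-3,
with anything spilling past byte 15 left to the misaligned port. bit 4 is
ignored because the text above assigns it to neither the match nor the
bitmap.

```python
def byte_mask(addr, length):
    """16-bit byte-enable bitmap from address LSBs 0-3 plus LD/ST length.

    Assumption: bytes that would spill past byte 15 are dropped here,
    on the basis that the misaligned port carries them separately.
    """
    offset = addr & 0xF
    return (((1 << length) - 1) << offset) & 0xFFFF


def merge_requests(rows):
    """Model one pass over a single L0 cache/buffer.

    rows: list where each entry is None (empty) or an (addr, length) tuple.
    Returns None if no row is valid, otherwise a tuple of
    (picked_index, line_tag, merged_mask, merged_indices).
    """
    # step 1: priority picker, lowest-numbered valid row wins
    picked = next((i for i, r in enumerate(rows) if r is not None), None)
    if picked is None:
        return None

    tag = rows[picked][0] >> 5          # address bits 5 and above
    mask = byte_mask(*rows[picked])
    merged = [picked]

    # step 2: only rows greater than the picked one are compared
    for i in range(picked + 1, len(rows)):
        row = rows[i]
        if row is not None and (row[0] >> 5) == tag:
            # step 3: OR the matching row's bitmap into the enable lines
            mask |= byte_mask(*row)
            merged.append(i)

    return picked, tag, mask, merged


rows = [None, (0x1000, 4), (0x2000, 2), (0x1008, 8), None]
result = merge_requests(rows)
```

here rows 1 and 3 share the same bits 5 and above, so their bitmaps are
ORed into one set of byte-enable lines, while row 2 is left for a later
pass.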
the "complex" bit is the N-in/out multiplexing from the 16 inputs on the
8 LD/ST FunctionUnits (2 ports per FU because 1 is for "normal" requests
and the 2nd is for misaligned addresses).
however i just realised that if we can accept an increase in size of the L0
cache/buffer from 8 to 16 entries - or to limit the number of LD/ST Function
Units to 6 - then we can instead simply have one *dedicated* "entry" for
each and every FU, and the entire MASSIVE 16-to-4 multiplexer completely
disappears.
i'll draw it out.
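
a tiny sketch of the sizing behind the dedicated-entry idea, assuming each
(FU, port) pair maps to a fixed entry. the names and the particular
numbering (fu * 2 + port) are illustrative only, not taken from the actual
design:

```python
PORTS_PER_FU = 2  # port 0: "normal" request, port 1: misaligned part


def num_entries(num_fus):
    """Entries needed when every FU port gets its own dedicated entry."""
    return num_fus * PORTS_PER_FU


def entry_index(fu, port):
    """Fixed entry for a given FU port, so no N-to-M multiplexer is needed."""
    return fu * PORTS_PER_FU + port


entries_8_fus = num_entries(8)   # 16-entry L0 cache/buffer
entries_6_fus = num_entries(6)   # 12-entry L0 cache/buffer
all_indices = [entry_index(fu, p) for fu in range(8) for p in range(PORTS_PER_FU)]
```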
the caveat: we now have a 16-entry (or 12-entry) L0 Cache/Buffer, which
unfortunately requires up to a 16-way CAM on the *ENTIRE* address from
bits 5 and upwards.
it *might* be possible to inherit (yet again) from the addr_match.py
classes, which already have a full triangular comparison of
all-against-all in bits 4 thru 11.
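
for reference, a plain-python model of what a triangular all-against-all
comparison looks like. this is NOT the addr_match.py API: the function
name, arguments and return format are made up for illustration, and the
bit range is a parameter so the same sketch covers both "bits 4 thru 11"
and "bits 5 and upwards".

```python
def triangular_match(addrs, lo_bit, hi_bit=None):
    """Compare every address against every other, once per pair.

    Only the field addr[hi_bit:lo_bit] is compared; hi_bit=None means
    "lo_bit and everything above". Returns a symmetric n x n matrix of
    booleans with the diagonal left False.
    """
    def field(addr):
        addr >>= lo_bit
        if hi_bit is not None:
            addr &= (1 << (hi_bit - lo_bit + 1)) - 1
        return addr

    keys = [field(a) for a in addrs]
    n = len(keys)
    match = [[False] * n for _ in range(n)]
    # triangular: only j > i is compared, the result is mirrored
    for i in range(n):
        for j in range(i + 1, n):
            match[i][j] = match[j][i] = (keys[i] == keys[j])
    return match


def num_comparators(n):
    """Number of pairwise comparisons in the triangle."""
    return n * (n - 1) // 2


addrs = [0x1230, 0x5230, 0x1240]
partial = triangular_match(addrs, 4, 11)   # bits 4 thru 11 only
full = triangular_match(addrs, 5)          # bits 5 and upwards
```

the first two addresses match on bits 4 thru 11 but not on the full
upper address, which is the difference between a partial match and the
full comparison a 16-way CAM would need (120 pairwise comparisons for 16
entries).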