[libre-riscv-dev] [Bug 216] LOAD STORE buffer needed
bugzilla-daemon at libre-soc.org
Sun Apr 19 18:46:26 BST 2020
https://bugs.libre-soc.org/show_bug.cgi?id=216
--- Comment #23 from Luke Kenneth Casson Leighton <lkcl at lkcl.net> ---
https://libre-soc.org/3d_gpu/mem_l0_to_l1_bridge_v2.png
ok done. it works like this:
* each FU (0-7) produces 2 LD/ST addresses which are broken up like this:
addr1[0..3] addr1[4] addr1[5..11] addr1[12..48]
addr2[0..3] addr2[4] addr2[5..11] addr2[12..48]
where the relationship between addr1 and addr2 is:
addr1[5..11] + 1 == addr2[5..11]
* addr1[0..3], in combination with the LD/ST len (1/2/3/4), is turned into
a bytemask, 24 bits in length. this bytemask is broken down into
two parts:
bytemask1, bytemask2 = map_addr(addr, LDST_len) [0..15], [16..23]
i.e. anything that does not fit fully into bytemask1 is a "misaligned"
LD/ST and the remainder overflows into bytemask2.
* if addr1[4..11] == 0b11111111 (all 8 bits set) and bytemask2 is non-zero,
this indicates a "misaligned major page fault".
this is a situation that we are *not* going to deal with (and it has
been catered for in the 3.0B spec)
* all 16 FU LD/ST re-encodings of the (addr, LDST_len) are lined up in a
table. this table breaks down, alternating between:
* FU # aligned or FU # misaligned
* addr1[5:11] or addr2[5:11]
* addr[12:48] for *BOTH*
* bytemask1[0:15] or bytemask2[0:15]
* data1[0:15] or data2[0:15]
note that addr[4] is *not* included in this because it is used to select
whether L1 cache bank #0 or #1 is to be used.
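a rough sketch of the address split and bytemask generation described above, in plain python rather than nmigen. the helper names split_addr and crosses_page are made up for illustration, and it assumes ldst_len is a byte count of at most 8 (which is what a 24-bit mask over a 16-byte window implies); the real LD/ST len encoding (listed above as 1/2/3/4) may well differ.

```python
def split_addr(addr):
    """Split an address into the fields used above (bit 0 is the LSB)."""
    return {
        "offset": addr & 0xF,          # addr[0..3]: byte offset in a 16-byte window
        "bank": (addr >> 4) & 0x1,     # addr[4]: selects L1 cache bank #0 or #1
        "line": (addr >> 5) & 0x7F,    # addr[5..11]
        "upper": addr >> 12,           # addr[12..48]: everything above the 4k page offset
    }


def map_addr(addr, ldst_len):
    """Return (bytemask1, bytemask2) for an access of ldst_len bytes.

    ASSUMPTION: ldst_len is a plain byte count, 1 to 8.
    """
    offset = addr & 0xF
    mask24 = ((1 << ldst_len) - 1) << offset   # up to 24 bits wide
    bytemask1 = mask24 & 0xFFFF                # bits [0..15]: the aligned part
    bytemask2 = (mask24 >> 16) & 0xFF          # bits [16..23]: misaligned overflow
    return bytemask1, bytemask2


def crosses_page(addr, ldst_len):
    """True when the access spills past the end of a 4k page.

    That is: addr[4..11] is all ones (8 bits) and bytemask2 is non-zero.
    """
    _, bytemask2 = map_addr(addr, ldst_len)
    return ((addr >> 4) & 0xFF) == 0xFF and bytemask2 != 0
```

for example, a 4-byte access at offset 14 puts two bytes in bytemask1 and two in bytemask2, so it counts as misaligned.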
the algorithm for merging of LD/STs into *one single L1 cache line* is:
1). With a PriorityPicker find the index (row) of the first valid LD/ST request
2). For all entries after that row, compare Addr[5:11] and Addr[12:48].
3). If "match" on both, OR the byte-mask for that row onto the output.
that's it. that's really all there is to it.
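the three steps can be sketched in software like this. it is only a behavioural model, not the hardware: the row layout (a dict with valid, line, upper and bytemask keys) and the name merge_requests are assumptions for illustration, and the generator expression stands in for the PriorityPicker.

```python
def merge_requests(rows):
    """Merge all rows that hit the same cache line as the first valid row.

    rows: list of dicts with keys
      valid    - bool, row holds a live LD/ST request
      line     - addr[5:11] field
      upper    - addr[12:48] field
      bytemask - 16-bit byte mask for this row
    Returns None if no row is valid.
    """
    # step 1: priority-pick the index of the first valid request
    first = next((i for i, r in enumerate(rows) if r["valid"]), None)
    if first is None:
        return None

    base = rows[first]
    merged = base["bytemask"]
    picked = [first]

    # step 2: compare every later row against the picked row
    for i in range(first + 1, len(rows)):
        r = rows[i]
        if r["valid"] and r["line"] == base["line"] and r["upper"] == base["upper"]:
            # step 3: on a match, OR the row's byte-mask onto the output
            merged |= r["bytemask"]
            picked.append(i)

    return {"line": base["line"], "upper": base["upper"],
            "bytemask": merged, "rows": picked}
```

rows that do not match simply stay in the table and would be picked up on a later pass.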
one thing that's important to note: there are only actually *eight* comparisons
of addr[12:48] needed (not 16), because the addr[12:48] is *identical* for
every *pair* of rows.
that however is still *seven* potential 36-bit CAM hits (seven lots of 36-bit
XOR gates). which is a hell of a lot.
if we could somehow use the L1 "tag" in place of Addr[12:48], that would save a
huge amount of power. unfortunately, every way i can think of that would get
the tag *into* L0 is either equally power-consuming, or results in multi-cycle
latency.
if we could reliably use a hash instead, i would suggest it. unfortunately,
the consequences of a collision are too detrimental to risk it.
the "sensible" option that does not have too detrimental an effect on
performance is: reduce the number of LD/ST FUs to 6. that would result in only
12 rows.
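as a back-of-envelope check on the numbers above: the figures for 8 FUs (16 rows, seven 36-bit compares) come from the text, while the "one fewer wide compare than there are FUs" rule used for other FU counts is an extrapolation, and compare_cost is a made-up helper name.

```python
ADDR_HI_BITS = 36  # width of the addr[12:48] comparison, per the text above


def compare_cost(num_fus):
    """Return (rows, wide_compares, xor_gates) for a given LD/ST FU count.

    ASSUMPTION: each FU contributes a pair of rows sharing one addr[12:48],
    and the picked row's pair needs no compare, leaving num_fus - 1.
    """
    rows = 2 * num_fus
    wide_compares = num_fus - 1
    xor_gates = wide_compares * ADDR_HI_BITS
    return rows, wide_compares, xor_gates


print(compare_cost(8))  # (16, 7, 252)
print(compare_cost(6))  # (12, 5, 180)
```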