[Libre-soc-dev] [OpenPOWER-HDL-Cores] microwatt dcache potential bug (overlap r0 and r1)

Luke Kenneth Casson Leighton lkcl at lkcl.net
Fri Jan 14 15:39:28 GMT 2022

ok got it.

* a "std" instruction is executed after enabling the MMU by linux
  (MSR=0xb000000000000033) when PRTBL = 0xe00000c
  and PIDR=1
* the "std" fails with a STORE_MISS and triggers an MMU RADIX walk
* all goes well: the RADIX lookup succeeds by firing off a sequence
  of 8 WB read operations.  the sequence is performed in the order
  3 4 5 6 7 0 1 2 (which can be seen in r1.forward_row1)
  and the 8 STBs and 8 acks are all fine.
* in the middle of one of those (ack 4) the address matches with
  what the MMU requested, mmu_req is acknowledged, and MMU
  then asks dcache to engage a PTE.
* HOWEVER... due to the early response allowing setting "IDLE",
  whilst that PTE entry is engaged, dcache is still processing those
  remaining ACKs (5 6 7 0 1 and 2)
* r1.full was set to ZERO when "IDLE" was set
* dcache now having a valid PTE, LoadStore re-tries the "std" instruction
* dcache finds that both r0.full and r1.full are zero
  [but remember, there are still ACKs being processed from the MMU
   RADIX walk, where ack4 was the one that produced the valid leaf-node
   for the PTE entry!]
* dcache is currently *still processing r1.state=RELOAD_WAIT_ACK*,
  r1.full is *ZERO*, and yet it gets asked to process the incoming
  "std" operation.

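the sequence in the list above can be replayed with a toy model. this is
only a sketch in plain python, not code from dcache.py or dcache.vhdl:
the class ToyR1, the function names and the parameter retry_after_acks
are all hypothetical, and the only things taken from the trace are the
ack order, the early response on ack 4, and the "both full flags are
zero" acceptance condition.

```python
from dataclasses import dataclass, field

ROW_ORDER = [3, 4, 5, 6, 7, 0, 1, 2]  # ack order reported in the trace
EARLY_ROW = 4                         # ack that satisfied the MMU request


@dataclass
class ToyR1:
    """Hypothetical stand-in for the r1 stage: only the fields discussed."""
    state: str = "IDLE"
    full: bool = False
    acks_seen: list = field(default_factory=list)


def dcache_accepts(r0_full, r1_full):
    # acceptance condition as described above: both slots look empty
    return not r0_full and not r1_full


def run_reload(r1, retry_after_acks):
    """Replay the 8-ack reload and return (state, full) seen by the retry.

    retry_after_acks is an assumption: the number of further acks that
    arrive after the early response before the retried "std" shows up.
    """
    r1.state, r1.full = "RELOAD_WAIT_ACK", True
    seen_by_retry = None
    countdown = None
    for row in ROW_ORDER:
        r1.acks_seen.append(row)
        if row == EARLY_ROW:
            r1.full = False            # early response releases the slot
            countdown = retry_after_acks
        elif countdown is not None:
            countdown -= 1
        if countdown == 0 and seen_by_retry is None:
            seen_by_retry = (r1.state, r1.full)
    r1.state = "IDLE"                  # reload only finishes here
    if seen_by_retry is None:          # retry arrived after the last ack
        seen_by_retry = (r1.state, r1.full)
    return seen_by_retry
```

with run_reload(ToyR1(), 3) the retry sees ("RELOAD_WAIT_ACK", False), so
dcache_accepts(False, False) lets it in whilst the reload is still going.
with retry_after_acks=7 the retry only arrives once the state is "IDLE".
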
now, whether this is a bug in the porting of dcache.vhdl to dcache.py
or whether it's a bug in the original i honestly couldn't say.

if this _is_ a bug in the original microwatt vhdl and it has not been
encountered, then it is possibly being "avoided" because the time from
finding the PTE entry that the "std" needs to re-trying the "std" is
sufficiently long (or the time taken to complete the remaining ACKs
so short) that the overlap is never triggered.

certainly in the traces i'm looking at, the entry with the leaf
valid goes back from dcache to the MMU, which takes 1 cycle;
the MMU requests a PTE entry to be added, which takes 2
cycles; and the "std" operation is retried about 4 cycles later, whilst
i am definitely seeing r1.forward_row1 still advancing and
still finishing up the load of the cache row, which happened
to have that valid leaf entry right close to the beginning of
the row.
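
as a rough back-of-envelope check of that timing, the comparison can be
written out. this is a sketch under assumptions: the three latencies are
simply added together, the 6 remaining acks are rows 5 6 7 0 1 2, and
cycles_per_ack is a hypothetical parameter because the ack spacing is
not stated here.

```python
RETRY_LATENCY = 1 + 2 + 4  # dcache->MMU, PTE add, retry (approx, from the trace)
ACKS_REMAINING = 6         # rows 5 6 7 0 1 2 still outstanding after ack 4


def retry_overlaps_reload(acks_remaining, cycles_per_ack, retry_latency):
    """True if the retried request arrives before the last remaining ack."""
    reload_tail = acks_remaining * cycles_per_ack  # cycles left in the reload
    return retry_latency < reload_tail
```

under these assumptions, one ack per cycle gives no overlap (7 cycles
against a 6-cycle tail), whereas two cycles per ack does (7 against 12).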

ultimately though it is the ability to respond early (setting
r1.full to zero) whilst still allowing r1.state to be non-IDLE
that is the problem [in dcache.py]
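
one way to look for this condition in a trace is a small checker. this
is a minimal sketch, assuming the trace has been exported as a list of
per-cycle dicts; the key names (req_valid, r0_full, r1_full, r1_state)
are hypothetical and would need mapping onto the real signal names.

```python
def find_overlaps(samples):
    """Return indices of cycles where a request is accepted while r1 is busy.

    "accepted" follows the condition described in this post: a request is
    presented and both r0.full and r1.full read as zero.
    """
    bad = []
    for i, s in enumerate(samples):
        accepted = s["req_valid"] and not s["r0_full"] and not s["r1_full"]
        if accepted and s["r1_state"] != "IDLE":
            bad.append(i)  # r1.full is clear but r1.state is non-IDLE
    return bad
```

any index returned marks a cycle matching the situation described above:
r1.full is zero, r1.state is not IDLE, and a new operation is let in.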

