[Libre-soc-dev] [OpenPOWER-HDL-Cores] microwatt dcache potential bug (overlap r0 and r1)
Paul Mackerras
paulus at ozlabs.org
Fri Jan 14 21:46:41 GMT 2022
On Fri, Jan 14, 2022 at 03:39:28PM +0000, Luke Kenneth Casson Leighton wrote:
> ok got it.
>
> * a "std" instruction is executed after enabling the MMU by linux
> (MSR=0xb000000000000033) when PRTBL = 0xe00000c
> and PIDR=1
> * the "std" fails with a STORE_MISS and triggers an MMU RADIX walk
> * all goes well: the RADIX lookup succeeds by firing off a sequence
> of 8 WB read operations. the sequence is done 3 4 5 6 7 0 1 2
> (which can be seen in r1.forward_row1)
> and the 8 STBs and 8 acks are all fine.
> * in the middle of one of those (ack 4) the address matches with
> what the MMU requested, mmu_req is acknowledged, and MMU
> then asks dcache to engage a PTE.
> * HOWEVER... due to the early response allowing setting "IDLE",
> whilst that PTE entry is engaged, dcache is still processing those
> remaining ACKs (5 6 7 0 1 and 2)
> * r1.full was set to ZERO when "IDLE" was set
> * dcache now having a valid PTE, LoadStore re-tries the "std" instruction
> * dcache finds that both r0.full and r1.full are zero
> [but remember, there are still ACKs being processed from the MMU
> RADIX walk, where ack4 was the one that produced the valid leaf-node
> for the PTE entry!]
> * dcache is currently *still processing r1.state=RELOAD_WAIT_ACK*,
> r1.full is *ZERO*, and yet it gets asked to process the incoming
> "std" operation.
In this situation, the incoming store request should get put into
r1.req and then processed once the state machine gets back to IDLE
state. In the VHDL, the variable 'req' represents a multiplexer that
selects between r1.req (the stored request) and a request constructed
from req_op, ra, r0.req, etc. This enables the state machine to
handle either a request that has just arrived, or one that came in
previously while the state machine was busy. It's like a 1-entry
FIFO (but with 0 latency in the empty case).
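A minimal Python sketch of that 1-entry FIFO idea follows. It is not the actual VHDL and not dcache.py. The names OneEntrySlot, Request, select-by-full and step are hypothetical, and it assumes the upstream stage is stalled while the slot is full, so that no second request can arrive on top of a stored one.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Request:
    """Hypothetical stand-in for a dcache request (op, address)."""
    op: str
    addr: int


class OneEntrySlot:
    """Models r1.req / r1.full plus the 'req' multiplexer as a 1-entry FIFO."""

    def __init__(self) -> None:
        self.full = False                        # plays the role of r1.full
        self.stored: Optional[Request] = None    # plays the role of r1.req

    def step(self, incoming: Optional[Request], idle: bool) -> Optional[Request]:
        """Advance one clock; return the request the state machine starts, if any."""
        if self.full and incoming is not None:
            # Assumption: upstream is stalled while a request is stored.
            raise RuntimeError("upstream must stall while the slot is full")
        # The multiplexer: the stored request wins, otherwise the new arrival.
        req = self.stored if self.full else incoming
        if req is None:
            return None
        if idle:
            # State machine is free: hand the request over and empty the slot.
            self.full, self.stored = False, None
            return req
        # State machine is busy (e.g. RELOAD_WAIT_ACK): hold the request.
        self.full, self.stored = True, req
        return None
```

With this model, a store arriving while the state machine is idle is started in the same cycle (the 0-latency case), while a store arriving during RELOAD_WAIT_ACK with full = 0 is latched and started on the first cycle that the state machine is idle again.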
> now, whether this is a bug in the porting of dcache.vhdl to dcache.py
> or whether it's a bug in the original i honestly couldn't say.
I haven't looked at your dcache.py translation. Is it accessible
somewhere? (not that I'm much good at python...)
> if this _is_ a bug in the original microwatt vhdl and it has not been
> encountered, then it is possibly being "avoided" because the time taken
> from the PTE entry that the "std" needs to the time that the "std" is
> re-tried is sufficiently long (or the time taken to complete the
> remaining ACKs so short) that it is not triggered.
The VHDL certainly is designed to be able to handle a store coming in
while r1.full = 0 and the state machine is in RELOAD_WAIT_ACK state.
> certainly in the traces i'm looking at, the entry with the leaf
> valid goes back from dcache to the MMU, that takes 1 cycle,
> the MMU requests a PTE entry to be added, that takes 2
> cycles, the "std" operation is retried by about 4 cycles whilst
> i am definitely seeing r1.forward_row1 still advancing and
> still finishing up the load of the cache row which happened
> to have that valid leaf entry right close to the beginning of
> the row.
>
> ultimately though it is the ability to respond early (setting
> r1.full to zero) whilst still allowing r1.state to be non-IDLE
> that is the problem [in dcache.py]
Paul.