[Libre-soc-dev] [OpenPOWER-HDL-Cores] microwatt dcache potential bug (overlap r0 and r1)

Paul Mackerras paulus at ozlabs.org
Fri Jan 14 21:46:41 GMT 2022

On Fri, Jan 14, 2022 at 03:39:28PM +0000, Luke Kenneth Casson Leighton wrote:
> ok got it.
> * a "std" instruction is executed after enabling the MMU by linux
>   (MSR=0xb000000000000033) when PRTBL = 0xe00000c
>   and PIDR=1
> * the "std" fails with a STORE_MISS and triggers an MMU RADIX walk
> * all goes well: the RADIX lookup succeeds by firing off a sequence
>    of 8 WB read operations.  the sequence is done 3 4 5 6 7 0 1 2
>    (which can be seen in r1.forward_row1)
>    and the 8 STBs and 8 acks are all fine.
> * in the middle of one of those (ack 4) the address matches with
>   what the MMU requested, mmu_req is acknowledged, and MMU
>   then asks dcache to engage a PTE.
> * HOWEVER... due to the early response allowing setting "IDLE",
>   whilst that PTE entry is engaged, dcache is still processing those
>   remaining ACKs (5 6 7 0 1 and 2)
> * r1.full was set to ZERO when "IDLE" was set
> * dcache now having a valid PTE, LoadStore re-tries the "std" instruction
> * dcache finds that both r0.full and r1.full are zero
>   [but remember, there are still ACKs being processed from the MMU
>    RADIX walk, where ack4 was the one that produced the valid leaf-node
>    for the PTE entry!]
> * dcache is currently *still processing r1.state=RELOAD_WAIT_ACK*,
>   r1.full is *ZERO*, and yet it gets asked to process the incoming
>   "std" operation.

In this situation, the incoming store request should get put into
r1.req and then processed once the state machine gets back to IDLE
state.  In the VHDL, the variable 'req' represents a multiplexer that
selects between r1.req (the stored request) and a request constructed
from req_op, ra, r0.req, etc.  This enables the state machine to
handle either a request that has just arrived, or one that came in
previously while the state machine was busy.  It's like a 1-entry
FIFO (but with 0 latency in the empty case).
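
To make the idea concrete, here is a rough behavioural sketch in Python.
It is not taken from dcache.vhdl or dcache.py: the class name, the method
name and the 'busy' flag are invented for illustration, and it assumes the
upstream stage is stalled while a request is being held, so that a second
request never arrives while the buffer is full.

```python
class OneEntryBypassFifo:
    """Illustrative model of a 1-entry request buffer with a bypass path."""

    def __init__(self):
        self.full = False  # plays the role of r1.full (assumed mapping)
        self.req = None    # plays the role of r1.req (assumed mapping)

    def cycle(self, incoming, busy):
        """Advance one clock cycle.

        incoming: request arriving this cycle, or None
        busy:     True while the state machine is not in IDLE
        Returns the request the state machine starts on, or None.
        """
        # Assumption: upstream is stalled while a request is held.
        assert not (self.full and incoming is not None)

        # The 'req' multiplexer: the stored request if there is one,
        # otherwise the request that has just arrived.
        req = self.req if self.full else incoming

        if busy:
            # Cannot start now: hold the request until we are idle again.
            if req is not None:
                self.req, self.full = req, True
            return None

        # Idle: start on the selected request and empty the buffer.
        self.req, self.full = None, False
        return req


fifo = OneEntryBypassFifo()
first = fifo.cycle("ld", busy=False)    # empty and idle: passes straight through
held = fifo.cycle("std", busy=True)     # arrives while busy: stored, not started
waiting = fifo.cycle(None, busy=True)   # still busy: request stays stored
later = fifo.cycle(None, busy=False)    # back to idle: stored request is started
```

The point of the bypass is that a request arriving while the buffer is
empty and the state machine is idle is started in the same cycle, whereas
one arriving while the state machine is busy is kept rather than lost.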

> now, whether this is a bug in the porting of dcache.vhdl to dcache.py
> or whether it's a bug in the original i honestly couldn't say.

I haven't looked at your dcache.py translation.  Is it accessible
somewhere? (not that I'm much good at Python...)

> if this _is_ a bug in the original microwatt vhdl and it has not been
> encountered, then it is possibly being "avoided" because the time from
> the PTE entry that the "std" needs being installed to the "std" being
> re-tried is sufficiently long (or the time taken to complete the
> remaining ACKs so short) that it is not triggered.

The VHDL certainly is designed to be able to handle a store coming in
while r1.full = 0 and the state machine is in RELOAD_WAIT_ACK state.

> certainly in the traces i'm looking at, the entry with the leaf
> valid goes back from dcache to the MMU, that takes 1 cycle,
> the MMU requests a PTE entry to be added, that takes 2
> cycles, the "std" operation is retried by about 4 cycles whilst
> i am definitely seeing r1.forward_row1 still advancing and
> still finishing up the load of the cache row which happened
> to have that valid leaf entry right close to the beginning of
> the row.
> ultimately though it is the ability to respond early (setting
> r1.full to zero) whilst still allowing r1.state to be non-IDLE
> that is the problem [in dcache.py]

