[Libre-soc-dev] [OpenPOWER-HDL-Cores] load/store conditional

Jacob Lifshay programmerjake at gmail.com
Tue May 25 19:10:48 BST 2021


On Mon, May 24, 2021, 18:44 Benjamin Herrenschmidt <benh at kernel.crashing.org>
wrote:

> On Mon, 2021-05-24 at 16:30 +0100, Luke Kenneth Casson Leighton wrote:
> > refs:
> >
> > * https://en.wikipedia.org/wiki/Load-link/store-conditional
> > * v3.0B section 4.6.2 p868
> > *
> > https://github.com/riscv/riscv-isa-manual/blob/master/src/a.tex#L320
> >
> > paul, hi,
> >
> > the discussion on wednesday covered a lot of ground, i didn't manage
> > to successfully communicate my point about LR AX.  i thought it best
> > to follow up because after reviewing lwarx etc the specification
> > ambiguity i expected might be there looks like it is.
> >
> > what appears to be missing is how many instructions are permitted
> > between a LR and an SC. without this information it imposes a
> > significantly higher hardware implementation cost and complexity than
> > might at first appear.
>
> There is no limit.
>

I think the issue is not how many instructions can be put between LR and
SC -- both OpenPower and RISC-V place no limit on that -- but rather: how
many, and which kinds of, instructions can be put between an LR and SC
while still retaining an architectural forward-progress guarantee. Just
snooping and deleting the reservation is not sufficient if you want a
forward-progress guarantee, since it is entirely possible for the
following loop to live-lock:
loop:
lwarx r5,0,r3
stwcx. r4,0,r3
bne loop
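
(As an aside, the loop above is effectively an atomic exchange: the value
loaded by lwarx into r5 is discarded and r4 is stored unconditionally. For
readers more comfortable at the C level, a rough equivalent -- just an
illustrative sketch, not part of the argument -- is the GCC/Clang builtin
__atomic_exchange_n, which on Power expands to essentially the same
lwarx/stwcx./bne retry loop:

#include <stdint.h>

/* Atomically store new_val into *p and return the previous value,
 * retrying until the store-conditional succeeds.  Illustrative only. */
static inline uint32_t atomic_swap32(uint32_t *p, uint32_t new_val)
{
    return __atomic_exchange_n(p, new_val, __ATOMIC_RELAXED);
}
)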

The following assumes lwarx is implemented by obtaining the cache block in
the Exclusive or Modified state of the MESI cache-coherency protocol.
Assume two CPU threads T1 and T2 are both executing the loop:
T1: lwarx
cache block is moved to T1 in the Exclusive state.
T2: lwarx
cache block is moved to T2 in the Exclusive state, breaking T1's
reservation.
T1: stwcx. -- fails
T1: bne loop -- branches
T1: lwarx
cache block is moved to T1 in the Exclusive state, breaking T2's
reservation.
T2: stwcx. -- fails
T2: bne loop -- branches
T2: lwarx
cache block is moved to T2 in the Exclusive state, breaking T1's
reservation.
... and so on, indefinitely: each lwarx steals the cache block and breaks
the other thread's reservation before that thread reaches its stwcx., so
neither store-conditional ever succeeds.
live-lock
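
To make the failure mode concrete, here is a minimal C model of the
"snoop and delete the reservation" behaviour described above. The names
(struct reservation, do_lwarx, do_stwcx) are invented purely for
illustration and don't correspond to any real implementation:

#include <stdbool.h>
#include <stdint.h>

/* One reservation per hardware thread: a valid flag plus the reserved
 * (block-aligned) address. */
struct reservation {
    bool     valid;
    uint64_t addr;
};

/* lwarx: take the block exclusively and set this thread's reservation.
 * Acquiring the block snoops the other thread and, if it holds a
 * reservation on the same block, deletes it. */
static void do_lwarx(struct reservation *self, struct reservation *other,
                     uint64_t addr)
{
    if (other->valid && other->addr == addr)
        other->valid = false;
    self->valid = true;
    self->addr  = addr;
}

/* stwcx.: succeeds only if this thread's reservation is still valid;
 * the reservation is consumed either way. */
static bool do_stwcx(struct reservation *self, uint64_t addr)
{
    bool ok = self->valid && self->addr == addr;
    self->valid = false;
    return ok;
}

With only this snoop-and-delete rule there is nothing to stop the two
threads alternating do_lwarx() calls forever, each deleting the other's
reservation before its do_stwcx() runs -- exactly the interleaving
traced above.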

RISC-V does have an architectural forward-progress guarantee for the
equivalent loop, provided it is a "constrained LR/SC loop": at most 16
instructions between the LR and the SC, using only base integer
instructions, with no other loads, stores, backward jumps or taken
backward branches, and no system instructions. That suggests an
implementation prevents live-lock by refusing to hand the cache block
associated with a reservation to other CPUs for a bounded number of
cycles (enough to execute at least 16 simple integer instructions),
giving the reserving CPU enough time to reach the store-conditional and
store successfully.
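
As a rough sketch of that mitigation (entirely hypothetical -- the
constants and names below are invented for illustration, not taken from
any real core), the reserving cache could simply defer snoop requests
for the reserved block for a bounded number of cycles after the lwarx,
long enough for a short loop to reach its stwcx.:

#include <stdbool.h>
#include <stdint.h>

#define RESERVATION_HOLD_CYCLES 32   /* illustrative: ~16 simple insns */

struct lr_state {
    bool     valid;   /* reservation currently held                     */
    uint64_t addr;    /* reserved (block-aligned) address                */
    uint32_t hold;    /* cycles left before snoops may steal the block   */
};

/* On lwarx: start a new reservation with a hold-off window. */
static void lr_acquire(struct lr_state *s, uint64_t addr)
{
    s->valid = true;
    s->addr  = addr;
    s->hold  = RESERVATION_HOLD_CYCLES;
}

/* Called once per cycle. */
static void lr_tick(struct lr_state *s)
{
    if (s->hold > 0)
        s->hold--;
}

/* Called when another CPU requests the block.  While the hold-off
 * window is open the request is deferred (retried later), so the
 * reserving CPU gets time to reach its stwcx.; once the window expires
 * the block is handed over and the reservation is broken as usual. */
static bool snoop_may_take_block(struct lr_state *s, uint64_t addr)
{
    if (s->valid && s->addr == addr) {
        if (s->hold > 0)
            return false;          /* defer: keep the block for now */
        s->valid = false;          /* window expired: break reservation */
    }
    return true;
}

Bounding the hold-off window matters: once it expires the block is handed
over unconditionally, so a stalled or misbehaving thread cannot hold the
line forever.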

Jacob

