[Libre-soc-bugs] [Bug 236] Atomics Standard writeup needed
bugzilla-daemon at libre-soc.org
Tue Jul 26 05:44:10 BST 2022
https://bugs.libre-soc.org/show_bug.cgi?id=236
--- Comment #44 from Jacob Lifshay <programmerjake at gmail.com> ---
(In reply to Luke Kenneth Casson Leighton from comment #43)
> excellllent, ok.
>
> so IBM decided to use "cache barriers"
you mean "memory barriers" (a.k.a. memory fences).
> which needs to be determined if
> that is directly equivalent to lr/sc's aq/rl flags.
They are a similar kind of memory fence, but PowerISA's memory fences are not
1:1 identical to RISC-V's aq/rl flags -- the aq/rl flags are basically C++11's
memory orderings adapted to work on a CPU (by treating most non-atomic
loads/stores as memory_order_relaxed atomic loads/stores).
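To make that correspondence concrete, here's a minimal C++11 sketch (my own
illustration, not normative text from either spec): an atomic RMW with
acquire/release ordering behaves roughly like a RISC-V AMO with the aq/rl
bits set, while ordinary accesses act like relaxed atomics.

#include <atomic>

std::atomic<int> lock_word{0};
int protected_data = 0;

void enter_and_leave() {
    // acquire keeps later accesses from moving above the RMW,
    // roughly what the aq bit does on a RISC-V AMO
    while (lock_word.exchange(1, std::memory_order_acquire) != 0)
        ; // spin until we take the lock
    protected_data += 1; // ordinary access, ~ memory_order_relaxed
    // release keeps earlier accesses from moving below the store,
    // roughly what the rl bit does
    lock_word.store(0, std::memory_order_release);
}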
>
> we also need to know if multiple atomic operations can
> be multi-issue in-flight (i seem to recall POWER9 is 8-way multi-issue?)
>
> also we need to know what the granularity of internal single-locking
> is, by that i mean that if there are multiple requests to the same
> {insert thing} then it is 100% guaranteed that, like intel, only
> one will ever be serviced.
The spec defines it (the reservation granule) to be >= 16B and <= the
minimum supported page size.
According to:
https://wiki.raptorcs.com/w/images/8/89/POWER9_um_OpenPOWER_v20GA_09APR2018_pub.pdf
POWER9's reservation granule is 128B.
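One practical consequence, as a sketch (assuming the 128B figure holds;
other implementations may pick anything in the spec's 16B-to-page-size
range): independent atomics should sit in separate granules, otherwise a
larx/stcx. loop on one can keep killing reservations on the other.

#include <atomic>
#include <cstddef>
#include <cstdint>

// assumed granule size, taken from the POWER9 user manual linked above
constexpr std::size_t kReservationGranule = 128;

// force each counter into its own reservation granule so a stcx. on one
// cannot spuriously fail because of traffic to the other
struct alignas(kReservationGranule) PaddedCounter {
    std::atomic<std::uint64_t> value{0};
};

PaddedCounter counters[2]; // 128B apart, never sharing a granule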
>
> i suspect, from reading the Power ISA Spec, that {thing} is a Cache
> Block.
It's not necessarily a cache block...but most reasonable implementations use one.
>
> however that needs to be explicitly determined by deliberately hammering
> a POWER9 core with requests at different addresses, varying the address
> differences and seeing if the throughput drops to single-contention.
>
> at exactly the same address is no good, we can assume that will definitely
> cause contention.
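A rough sketch of that hammering experiment (thread count, iteration count
and strides are all my own assumptions, and this is nowhere near a
calibrated benchmark): have N threads fetch_add at addresses a configurable
stride apart, and watch for the stride below which throughput collapses --
that boundary is the effective locking granule.

#include <atomic>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <thread>
#include <vector>

int main() {
    constexpr int kThreads = 4;
    constexpr std::uint64_t kIters = 10'000'000;
    // one page of 8-byte atomic slots, page-aligned
    alignas(4096) static std::atomic<std::uint64_t> slots[4096 / 8];

    for (std::size_t stride = 8; stride <= 512; stride *= 2) {
        auto start = std::chrono::steady_clock::now();
        std::vector<std::thread> threads;
        for (int t = 0; t < kThreads; ++t) {
            threads.emplace_back([t, stride] {
                // each thread hits its own slot, "stride" bytes apart
                auto &slot = slots[t * stride / sizeof(slots[0])];
                for (std::uint64_t i = 0; i < kIters; ++i)
                    slot.fetch_add(1, std::memory_order_relaxed);
            });
        }
        for (auto &th : threads)
            th.join();
        double secs = std::chrono::duration<double>(
                          std::chrono::steady_clock::now() - start)
                          .count();
        std::printf("stride %4zu B: %.1f Mops/s\n", stride,
                    kThreads * kIters / secs / 1e6);
    }
    return 0;
}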
>
> the other important fact to know is, how does the forward-progress guarantee
> work, i.e. how do these "cache barriers" work and i suspect they are similar
> to IBM's "Transactions".
I'd expect most of them to work by just stopping further instructions from
executing and restarting the instruction fetch process...afaict that's what
the spec says has to happen for sync and lwsync -- lwsync would be cheaper
by not requiring as much store-buffer flushing, I guess.
Note that none of that instruction-fetch restarting is needed for any of
the C++11 memory fences...they only care about load/store/atomic execution
order.
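For contrast, here's what a C++11 fence actually promises (a sketch; the
lwsync comments reflect the usual compiler mapping on Power as far as I
know, not something the C++ standard mandates): it only constrains the
order of surrounding loads/stores/atomics, with no notion of refetching
instructions.

#include <atomic>

std::atomic<int> ready{0};
int payload = 0;

void writer() {
    payload = 1;
    // release fence: earlier stores can't move below the next store
    // (compilers typically lower this to lwsync on Power)
    std::atomic_thread_fence(std::memory_order_release);
    ready.store(1, std::memory_order_relaxed);
}

int reader() {
    while (ready.load(std::memory_order_relaxed) == 0)
        ; // spin
    // acquire fence: later loads can't move above the previous load
    // (again typically lwsync on Power)
    std::atomic_thread_fence(std::memory_order_acquire);
    return payload; // ordered by the fence pair, guaranteed to see 1
}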
> there is probably an internal counter/tag which goes
> up by one on each lwsync.
>
> other architectures are not exactly of no interest but please really
> there are only 2-3 days left before this bug report gets closed, so focus
> on POWER9
Why would we close it now...isn't there still around a month before we have to
get all RFPs in to NLnet? (giving them a month to meet their deadline)