[Libre-soc-bugs] [Bug 236] Atomics Standard writeup needed

bugzilla-daemon at libre-soc.org bugzilla-daemon at libre-soc.org
Fri Jul 8 07:05:24 BST 2022


https://bugs.libre-soc.org/show_bug.cgi?id=236

Jacob Lifshay <programmerjake at gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|CONFIRMED                   |IN_PROGRESS

--- Comment #11 from Jacob Lifshay <programmerjake at gmail.com> ---
After doing some research, it turns out that PowerISA actually already has a
lot of the atomic operations I was going to propose; they just aren't really
implemented in gcc or clang. What it still lacks is better fences, combined
operation/fence instructions, and operations on 8/16-bit values, and the
existing operations come with unnecessary restrictions.

PowerISA v3.1 Book II section 4.5: Atomic Memory Operations

it has only 32-bit and 64-bit atomic operations.

the operations it has that I was going to propose:
fetch_add
fetch_xor
fetch_or
fetch_and
fetch_umax
fetch_smax
fetch_umin
fetch_smin
exchange

as well as a few I wasn't going to propose (they seem less useful to me):
compare-and-swap-not-equal
fetch-and-increment-bounded
fetch-and-increment-equal
fetch-and-decrement-bounded
store-twin
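
For reference, the first list above maps directly onto what Rust's standard
library already exposes for 32-bit atomics. The block below is only a sketch of
the language-level semantics (each call returns the value seen before the
update); it says nothing about which instructions a compiler emits for Power.
The helper names demo_unsigned and demo_signed are made up for illustration.

```rust
use std::sync::atomic::{AtomicI32, AtomicU32, Ordering};

/// Applies each unsigned fetch-style operation once, in order, and returns
/// the value every operation observed *before* it modified `cell`.
/// (Hypothetical helper, for illustration only.)
pub fn demo_unsigned(cell: &AtomicU32) -> [u32; 7] {
    let o = Ordering::SeqCst;
    [
        cell.fetch_add(5, o),      // fetch_add
        cell.fetch_xor(0b0110, o), // fetch_xor
        cell.fetch_or(0b1000, o),  // fetch_or
        cell.fetch_and(0b1100, o), // fetch_and
        cell.fetch_max(20, o),     // fetch_umax (unsigned max)
        cell.fetch_min(3, o),      // fetch_umin (unsigned min)
        cell.swap(42, o),          // exchange
    ]
}

/// Signed max/min use the same method names on the signed atomic type.
/// (Hypothetical helper, for illustration only.)
pub fn demo_signed(cell: &AtomicI32) -> [i32; 2] {
    let o = Ordering::SeqCst;
    [
        cell.fetch_max(2, o),  // fetch_smax (signed max)
        cell.fetch_min(-9, o), // fetch_smin (signed min)
    ]
}
```

Starting from AtomicU32::new(1), demo_unsigned returns [1, 6, 0, 8, 8, 20, 3]
and leaves 42 in the cell; starting from AtomicI32::new(-7), demo_signed
returns [-7, 2] and leaves -9.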

The spec also basically says that the atomic memory operations are only
intended for when you want to do atomic operations on memory, but don't want
that memory to be loaded into your L1 cache.

imho that restriction is specifically *not* wanted, because there are plenty of
cases where atomic operations should happen in your L1 cache.

I'd guess that restriction is part of why gcc and clang don't use those atomic
operations as the default implementation of atomic operations (when the
appropriate ISA feature is enabled).

imho the cpu should be able to (but not required to) predict whether to send an
atomic operation to L2-cache/L3-cache/etc./memory or to execute it directly in
the L1 cache. The prediction could be based on how often that cache block is
accessed from different cpus, e.g. by keeping a small saturating counter and a
last-accessing-cpu field per cache block. The counter would count how many
times in a row the same cpu accessed the block: once that count exceeds some
limit, the operation is executed in that cpu's L1 cache; while the limit hasn't
been reached, or when a different cpu accesses the block (which resets the
count), the operation is done in the L2/L3/etc.-cache instead.
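
A rough software model of that predictor, just to make the idea concrete. This
is a sketch under assumptions, not a hardware design: the counter width
(STREAK_MAX), the threshold (LIMIT), and all type and function names are made
up for illustration, and the streak is assumed to restart at 1 whenever a
different cpu touches the block.

```rust
/// Where the (hypothetical) predictor chooses to execute an atomic operation.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Target {
    /// Execute in the requesting cpu's L1 cache.
    L1,
    /// Execute further out: L2/L3/etc. cache or memory.
    Shared,
}

/// Assumed counter ceiling: a 2-bit saturating counter.
const STREAK_MAX: u8 = 3;
/// Assumed threshold: more than this many same-cpu accesses in a row => L1.
const LIMIT: u8 = 2;

/// Per-cache-block predictor state: a last-accessing-cpu field plus a small
/// saturating counter of consecutive accesses by that cpu.
pub struct BlockPredictor {
    last_cpu: Option<u8>,
    streak: u8,
}

impl BlockPredictor {
    pub fn new() -> Self {
        Self { last_cpu: None, streak: 0 }
    }

    /// Records an atomic access by `cpu` and predicts where to execute it.
    pub fn access(&mut self, cpu: u8) -> Target {
        if self.last_cpu == Some(cpu) {
            // Same cpu again: bump the counter, saturating at STREAK_MAX.
            self.streak = self.streak.saturating_add(1).min(STREAK_MAX);
        } else {
            // A different cpu (or the first access): restart the streak.
            self.last_cpu = Some(cpu);
            self.streak = 1;
        }
        if self.streak > LIMIT {
            Target::L1
        } else {
            Target::Shared
        }
    }
}
```

With these assumed constants, three accesses in a row from cpu 0 move the
operation into L1 on the third access; a single access from cpu 1 then resets
the streak and the next operations go back to the shared cache level.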
