[Libre-soc-bugs] [Bug 236] Atomics Standard writeup needed

Sun Jun 26 08:32:39 BST 2022

https://bugs.libre-soc.org/show_bug.cgi?id=236

--- Comment #4 from Jacob Lifshay <programmerjake at gmail.com> ---
(In reply to Luke Kenneth Casson Leighton from comment #2)
> are these basically already covered by Power ISA load-store with
> reservations?

yes, if you don't care about efficiency -- you'll just get large/slow functions
whenever you use atomics (for a single atomic op: iirc more than 5 instructions,
with about 4 of them in a retry loop).
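
a minimal C11 sketch of what that retry loop looks like at the source level. the function names are hypothetical, and the compare-exchange loop only stands in for the general load / compute / conditional-store / branch shape of a reservation sequence; it is not the exact code any compiler emits:

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical helper: fetch-add written as an explicit retry loop.
 * Each iteration is: read old value, compute new value, try to store
 * it conditionally, branch back on failure. That is the same shape as
 * a load-with-reservation / store-conditional sequence. */
uint32_t fetch_add_retry_loop(_Atomic uint32_t *p, uint32_t v)
{
    uint32_t old = atomic_load_explicit(p, memory_order_relaxed);
    while (!atomic_compare_exchange_weak_explicit(
               p, &old, old + v,
               memory_order_seq_cst, memory_order_relaxed)) {
        /* on failure `old` was refreshed with the current value; retry */
    }
    return old; /* value before the add */
}

/* The same operation as one library call. On an ISA with a dedicated
 * fetch-add instruction this can become a single instruction; without
 * one, the compiler has to expand it into a loop like the one above. */
uint32_t fetch_add_builtin(_Atomic uint32_t *p, uint32_t v)
{
    return atomic_fetch_add(p, v);
}
```

both functions return the value held before the add, so they are interchangeable from the caller's side; the only difference is how much code ends up in the loop.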

I'll assume efficiency is something we care about.

some atomics on powerpc64le, x86_64, and amdgpu:
https://gcc.godbolt.org/z/9a6EKjhjh

note how every atomic on powerpc64le is a giant pile of instructions in a loop
-- having the decoder need to macro-op fuse a loop of 4 (or more) instructions
is absurd imho. by contrast, x86 has a single instruction for add and for
exchange (and for more operations if you don't need the return value), and
amdgpu has dedicated instructions for all the operations I tried (clang crashes
for 8-bit atomics). riscv (not in that godbolt link) also supports a bunch of
operations.

we need short instructions for at least atomic-fetch-add and atomic-exchange
since they're quite common in cpu code. for gpu code it would be nice to
support the full list of atomic ops supported by vulkan/opencl:
https://www.khronos.org/registry/SPIR-V/specs/unified1/SPIRV.html#_atomic_instructions

atomics supported by vulkan/opencl:
load float/int (already supported by power)
store float/int (already supported by power)
exchange float/int
compare exchange float/int (sufficiently supported by power)
fetch_increment int (covered by fetch_add int)
fetch_decrement int (covered by fetch_add int)
fetch_add int
fetch_sub int (covered by fetch_add int)
fetch_min[u] int
fetch_max[u] int
fetch_and int
fetch_or int
fetch_xor int
flag_test_and_set (covered by exchange int)
flag_clear (covered by store int)
fetch_min float
fetch_max float
fetch_add float
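
the "covered by" entries in the list above can be sketched in C11 as follows. the function names are hypothetical and a 32-bit unsigned cell is assumed for illustration; this only shows why those ops need no instruction of their own:

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* fetch_sub covered by fetch_add: add the two's-complement negation.
 * Unsigned arithmetic wraps, so (0 - v) is well defined. */
uint32_t fetch_sub_via_add(_Atomic uint32_t *p, uint32_t v)
{
    return atomic_fetch_add(p, (uint32_t)0 - v);
}

/* fetch_increment / fetch_decrement covered by fetch_add of +1 / -1 */
uint32_t fetch_increment(_Atomic uint32_t *p)
{
    return atomic_fetch_add(p, 1u);
}

uint32_t fetch_decrement(_Atomic uint32_t *p)
{
    return atomic_fetch_add(p, UINT32_MAX); /* UINT32_MAX == -1 mod 2^32 */
}

/* flag_test_and_set covered by exchange: swap in 1 and report whether
 * the flag was already set. */
bool flag_test_and_set(_Atomic uint32_t *flag)
{
    return atomic_exchange(flag, 1u) != 0;
}

/* flag_clear covered by a plain atomic store of 0 */
void flag_clear(_Atomic uint32_t *flag)
{
    atomic_store(flag, 0u);
}
```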

int/float fetch_min/max are particularly important for gpu code since they can
be used for depth buffer ops.
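
as a rough illustration of the depth buffer case: C11 has no atomic_fetch_min, so the sketch below emulates it with a compare-exchange retry loop, which is exactly the kind of code a dedicated instruction would replace. the function names, the 32-bit unsigned depth format, and the "smaller is closer" convention are all assumptions for the example:

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical emulation of unsigned atomic fetch_min.
 * Returns the value observed before the (possible) update. */
uint32_t fetch_min_u32(_Atomic uint32_t *p, uint32_t v)
{
    uint32_t old = atomic_load_explicit(p, memory_order_relaxed);
    /* Only attempt a store while v would actually lower the value.
     * A failed compare-exchange refreshes `old`, and the loop
     * condition then re-checks v < old against the new value. */
    while (v < old &&
           !atomic_compare_exchange_weak_explicit(
               p, &old, v,
               memory_order_relaxed, memory_order_relaxed)) {
    }
    return old;
}

/* Hypothetical depth test, assuming smaller depth means closer.
 * The fragment passes if it is strictly closer than what was stored,
 * and in that case fetch_min_u32 has already written its depth. */
bool depth_test_and_update(_Atomic uint32_t *depth_texel, uint32_t frag_depth)
{
    return frag_depth < fetch_min_u32(depth_texel, frag_depth);
}
```

this only covers the depth value itself; a real renderer also has to coordinate the matching color write, which is out of scope here.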

we will want 8/16/32/64-bit int and 16/32/64-bit float support.
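
for the float side, a sketch of how a 32-bit float fetch_add can be built today out of an integer compare-exchange on the bit pattern. the function name is hypothetical, and storing the float in an `_Atomic uint32_t` cell is an assumption made to keep the example portable; a native instruction would replace the whole loop:

```c
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical float fetch_add on a cell that holds the IEEE-754 bit
 * pattern of a float. Returns the value before the add. */
float fetch_add_f32(_Atomic uint32_t *p, float v)
{
    uint32_t old_bits = atomic_load_explicit(p, memory_order_relaxed);
    for (;;) {
        float old_val, new_val;
        uint32_t new_bits;

        /* memcpy is the portable way to reinterpret bits in C */
        memcpy(&old_val, &old_bits, sizeof old_val);
        new_val = old_val + v;
        memcpy(&new_bits, &new_val, sizeof new_bits);

        /* On failure old_bits is refreshed and the sum is recomputed */
        if (atomic_compare_exchange_weak_explicit(
                p, &old_bits, new_bits,
                memory_order_seq_cst, memory_order_relaxed))
            return old_val;
    }
}
```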

we also need 128-bit atomics support. they're relatively uncommon, but they are
used in some critical data structures and are much faster than falling back to
a global mutex. power's existing instructions are sufficient for that -- we just
need to implement them: lq, stq, lqarx, stqcx.

> 
> or the OpenCAPI atomic memory operations?

we need actual instructions to express what we want, otherwise all the fancy
hardware support is useless...

that pdf doesn't say which atomics opencapi actually supports.
