-
-
Save RaasAhsan/8e3554a41e07068536425ca0de46c9e8 to your computer and use it in GitHub Desktop.
import java.util.concurrent.atomic.*; | |
import java.util.concurrent.*; | |
public class Main { | |
private static ExecutorService executor = Executors.newFixedThreadPool(2); | |
private static int iterations = 10000000; | |
public static class Runner { | |
// writes to canceled happen before a CAS on suspended | |
// reads on canceled happen after a CAS on suspended | |
private boolean canceled = false; | |
// an optimistic lock. false == locked, true == unlocked | |
private AtomicBoolean suspended = new AtomicBoolean(false); | |
private volatile boolean result = false; | |
private CountDownLatch latch = new CountDownLatch(1); | |
public void start() { | |
// start two tasks that synchronize on suspended and canceled | |
// this is a minimized version of a synchronization mechanism in cats effect | |
Future<?> f1 = executor.submit(() -> { | |
try { | |
latch.await(); | |
} catch (Exception ex) { | |
ex.printStackTrace(); | |
} | |
// assumption: this task already has the lock | |
// release the lock | |
suspended.compareAndSet(false, true); | |
if (canceled) { | |
// double-check, the other thread may have set canceled but failed the CAS check, | |
// so we'll try to reacquire the lock | |
if (suspended.compareAndSet(true, false)) { | |
result = true; | |
} | |
} | |
}); | |
Future<?> f2 = executor.submit(() -> { | |
try { | |
latch.await(); | |
} catch (Exception ex) { | |
ex.printStackTrace(); | |
} | |
canceled = true; | |
// attempt to acquire the lock to set result. | |
// regardless of whether the CAS succeeds or not, the write to canceled should be published | |
if (suspended.compareAndSet(true, false)) { | |
result = true; | |
} | |
}); | |
// signal threads to proceed | |
latch.countDown(); | |
try { | |
// wait for tasks to complete | |
f1.get(); | |
f2.get(); | |
} catch (Exception ex) { | |
ex.printStackTrace(); | |
} | |
// after both tasks have completed, result should be true | |
if (result != true) { | |
System.out.println("STUCK"); | |
System.exit(0); | |
} | |
} | |
} | |
public static void main(String[] args) { | |
for (int i = 0; i < iterations; i++) { | |
Runner runner = new Runner(); | |
runner.start(); | |
} | |
System.exit(0); | |
} | |
} |
Thanks @simonis I'll take a deeper look later today, but the memory effects of atomics were precisely an assumption that we weren't sure was sound, and there seems to be a severe lack of elaboration on this point. I see many of your documentation links point to Java 11, but since we're mostly on Scala running Java 8, we've been referring to those docs.
In the atomics package documentation, it reads that "compareAndSet and all other read-and-update operations such as getAndIncrement have the memory effects of both reading and writing volatile variables." It's ambiguous whether the compareAndSet
needs to succeed here for those memory effects to take place, but since it was left unspecified, we interpreted it to mean that that was always the case. It also lumps compareAndSet
together with operations like getAndIncrement
(which are CAS loops internally and so will eventually succeed), which suggested to us that the memory effects for every individual call should be consistent.
The Java 9 docs seem to make these details a bit more clear, so I'm more convinced now that our code is improperly synchronized
+1 Volker (@simonis). You did an awesome job with your explanation. So, I won't do that in my reply here. But since I had already run these on my ThunderX2 system, I am attaching the below information for completeness. :)
Arm's supports LSE extensions since v8.1.
@RaasAhsan: I tried your minimized example as follows on a ThunderX2 system with and without LSE:
monica@c50-36-Ubun:~/projects/bmks/atomicbmk$ ../../jdks/jdk-16/bin/java -XX:-UseLSE Main
STUCK
the generated assembly here (without LSE) will look like this:
; - java.util.concurrent.atomic.AtomicBoolean::compareAndSet@22 (line 101)
0x0000ffff8a83dbd8: add x0, x1, #0xc
0x0000ffff8a83dbdc: ldaxr w8, [x0]
0x0000ffff8a83dbe0: cmp w8, w6
0x0000ffff8a83dbe4: b.ne 0x0000ffff8a83dbf0 // b.any
0x0000ffff8a83dbe8: stlxr w8, w7, [x0]
0x0000ffff8a83dbec: cbnz w8, 0x0000ffff8a83dbdc
0x0000ffff8a83dbf0: cset x8, ne // ne = any
0x0000ffff8a83dbf4: dmb
ish``
And then you can try enabling LSE as shown here:
monica@c50-36-Ubun:~/projects/bmks/atomicbmk$ ../../jdks/jdk-16/bin/java -XX:+UseLSE Main
monica@c50-36-Ubun:~/projects/bmks/atomicbmk$
<note the lack of STUCK :)>
And the generated assembly again:
; - java.util.concurrent.atomic.AtomicBoolean::compareAndSet@22 (line 101)
0x0000ffff86847aa4: add x0, x1, #0xc
0x0000ffff86847aa8: mov x8, x6
0x0000ffff86847aac: casal w8, w7, [x0]
0x0000ffff86847ab0: cmp w8, w6
0x0000ffff86847ab4: cset x8, ne // ne = any
0x0000ffff86847ab8: dmb ish
Thank you all for the deep attention to this! You are all fantastic and this is incredibly helpful in so many ways.
canceled = true; if (suspended.compareAndSet(true, false))
is a Store-Load pattern. Nonvolatile Store is not ordered wrt. succeeding Volatile Load.
Should canceled
not be volatile to fix this?
If canceled were volatile then this would work quite trivially. 😃 It would also be much much slower.
If canceled were volatile then this would work quite trivially. 😃 It would also be much much slower.
Volatile has a price.
But without it, the JVM doesn't need to prevent reorderings as already explained above.
I don't think this program demonstrates any "memory barrier violation" on ARM. Printing "STUCK" is a valid execution path according to the current Java Memory Model and API specification.
The JMM was revised by JSR-133 starting with JDK 1.5. JSR 133 considerably strengthened the semantics of
volatile
fields. The following section from the JSR-133 FAQ is relevant for this discussion:The following code example from the same source illustrates what this means:
I.e. when
v == true
inreader()
the JMM guarantees thatx
will have the value42
. This wasn't guaranteed before the JMM update by JSR-133.Now lets have a look at the API specification of
AtomicBoolean.compareAndSet(boolean expectedValue, boolean newValue)
:The if is important here. Let's see what are the memory effects specified by VarHandle.compareAndSet(java.lang.Object...):
Checking the specification of VarHandle.getVolatile() and VarHandle.setVolatile we find that they "return the value of a variable, with memory semantics of reading" or "set the value of a variable to the newValue, with memory semantics of setting" as if "the variable was declared volatile".
This means that
AtomicBoolean.compareAndSet()
only behaves as a write to a volatile field if the preceding volatile read of that fieled returned a value equal toexpectedValue
. In other words, if the volatile read of the respective field isn't equal toexpectedValue
,AtomicBoolean.compareAndSet()
will behave like a simple volatile read and won't exhibit the effects which are guaranteed for "volatile writes" by the JMM.In the contect of your example, this means that in contrast to your comment, the following code sequence in
Future
f2
:will not publish the write to
canceled
to other threads in cases wheresuspended.compareAndSet()
fails because in such casescompareAndSet()
degenerates to a simple volatile read which according to the JMM only has acquire but not release semantics.It is therefore very well possible that
Future
f1
, running in another thread, will successfully execute thesuspended.compareAndSet(false, true)
operation but still not see the updated value ofcanceled
:Implementation-wise I could see your test program printing "STUCK" when running on Graviton 1 but not on Graviton 2. The reason for this difference is that Graviton 1 doesn't support the Large System Extensions (LSE) introduced in ARMv8.1. Without LSE, a call to
AtomicBoolean.compareAndSet()
translates into the following machine code:It uses a pair of LDAXR (Load-Acquire Exclusive)/STLXR (Store-Release Exclusive) instructions to implement the CAS. With this implementation, if the loaded value fails to equal to the expected value the store together with its release semantics is skipped (which is in total consensus with
AtomicBoolean.compareAndSet()
's specification linked before).On newer Graviton 2 CPU's which implement ARMv8.2 and support LSE, the following code will be generated instead:
Here the LDAXR/STLXR pair is replaced by the new CASAL instruction which is faster and which has acquire/release semantics
no matter if the CAS succeeds or not. That's the reason why your test seems to work on Graviton 2. But it can be easily made printing "STUCK" on Graviton 2 as well if we forcibly disable the usage of LSE instructions with the-XX:-UseLSE
option.On a side node, I can report that I've seen your test exting with "STUCK" on Power 8 as well. Power 8 also uses a pair of load/store exclusive instructions for implementing CAS and also has a weak memory model.
Update 2020-11-19: it seems that what I wrote before about the CASAL instruction is not totally true.
The "ARM Architecture Reference Manual ARMv8, for ARMv8-A architecture profile" notes the following about memory barriers in chapter B2.3.10 on page B2-139:
Later, in section C3.2.12 about Atomic instructions on page C3-222 it notes the following about Compare and Swap:
I don't know why your example succeeds on Graviton 2 with CASAL. It might be that the actual chip implementation is stricter than the specification and still releases even if the store part of a CASAL wasn't executed. But it might be just as well that the "STUCK" output became just much more unlikely.
Notice that on x86_64 if you are prefixing a CMPXCHG with LOCK, you are guaranteed to get the release semantics of the write in any case: