Skip to content

Instantly share code, notes, and snippets.

@RaasAhsan
Last active June 16, 2023 06:37
Show Gist options
  • Save RaasAhsan/8e3554a41e07068536425ca0de46c9e8 to your computer and use it in GitHub Desktop.
Save RaasAhsan/8e3554a41e07068536425ca0de46c9e8 to your computer and use it in GitHub Desktop.
minimized ARM memory barrier violation
import java.util.concurrent.atomic.*;
import java.util.concurrent.*;
public class Main {
private static ExecutorService executor = Executors.newFixedThreadPool(2);
private static int iterations = 10000000;
public static class Runner {
// writes to canceled happen before a CAS on suspended
// reads on canceled happen after a CAS on suspended
private boolean canceled = false;
// an optimistic lock. false == locked, true == unlocked
private AtomicBoolean suspended = new AtomicBoolean(false);
private volatile boolean result = false;
private CountDownLatch latch = new CountDownLatch(1);
public void start() {
// start two tasks that synchronize on suspended and canceled
// this is a minimized version of a synchronization mechanism in cats effect
Future<?> f1 = executor.submit(() -> {
try {
latch.await();
} catch (Exception ex) {
ex.printStackTrace();
}
// assumption: this task already has the lock
// release the lock
suspended.compareAndSet(false, true);
if (canceled) {
// double-check, the other thread may have set canceled but failed the CAS check,
// so we'll try to reacquire the lock
if (suspended.compareAndSet(true, false)) {
result = true;
}
}
});
Future<?> f2 = executor.submit(() -> {
try {
latch.await();
} catch (Exception ex) {
ex.printStackTrace();
}
canceled = true;
// attempt to acquire the lock to set result.
// regardless of whether the CAS succeeds or not, the write to canceled should be published
if (suspended.compareAndSet(true, false)) {
result = true;
}
});
// signal threads to proceed
latch.countDown();
try {
// wait for tasks to complete
f1.get();
f2.get();
} catch (Exception ex) {
ex.printStackTrace();
}
// after both tasks have completed, result should be true
if (result != true) {
System.out.println("STUCK");
System.exit(0);
}
}
}
public static void main(String[] args) {
for (int i = 0; i < iterations; i++) {
Runner runner = new Runner();
runner.start();
}
System.exit(0);
}
}
@RaasAhsan
Copy link
Author

Thanks @simonis I'll take a deeper look later today, but the memory effects of atomics were precisely an assumption that we weren't sure was sound, and there seems to be a severe lack of elaboration on this point. I see many of your documentation links point to Java 11, but since we're mostly on Scala running Java 8, we've been referring to those docs.

In the atomics package documentation, it reads that "compareAndSet and all other read-and-update operations such as getAndIncrement have the memory effects of both reading and writing volatile variables." It's ambiguous whether the compareAndSet needs to succeed here for those memory effects to take place, but since it was left unspecified, we interpreted it to mean that that was always the case. It also lumps compareAndSet together with operations like getAndIncrement (which are CAS loops internally and so will eventually succeed), which suggested to us that the memory effects for every individual call should be consistent.

The Java 9 docs seem to make these details a bit more clear, so I'm more convinced now that our code is improperly synchronized

@mo-beck
Copy link

mo-beck commented Nov 11, 2020

+1 Volker (@simonis). You did an awesome job with your explanation. So, I won't do that in my reply here. But since I had already run these on my ThunderX2 system, I am attaching the below information for completeness. :)

Arm's supports LSE extensions since v8.1.
@RaasAhsan: I tried your minimized example as follows on a ThunderX2 system with and without LSE:

monica@c50-36-Ubun:~/projects/bmks/atomicbmk$ ../../jdks/jdk-16/bin/java -XX:-UseLSE Main
STUCK

the generated assembly here (without LSE) will look like this:
; - java.util.concurrent.atomic.AtomicBoolean::compareAndSet@22 (line 101)
0x0000ffff8a83dbd8: add x0, x1, #0xc
0x0000ffff8a83dbdc: ldaxr w8, [x0]
0x0000ffff8a83dbe0: cmp w8, w6
0x0000ffff8a83dbe4: b.ne 0x0000ffff8a83dbf0 // b.any
0x0000ffff8a83dbe8: stlxr w8, w7, [x0]
0x0000ffff8a83dbec: cbnz w8, 0x0000ffff8a83dbdc
0x0000ffff8a83dbf0: cset x8, ne // ne = any
0x0000ffff8a83dbf4: dmb ish``

And then you can try enabling LSE as shown here:
monica@c50-36-Ubun:~/projects/bmks/atomicbmk$ ../../jdks/jdk-16/bin/java -XX:+UseLSE Main
monica@c50-36-Ubun:~/projects/bmks/atomicbmk$
<note the lack of STUCK :)>

And the generated assembly again:

; - java.util.concurrent.atomic.AtomicBoolean::compareAndSet@22 (line 101)
0x0000ffff86847aa4: add x0, x1, #0xc
0x0000ffff86847aa8: mov x8, x6
0x0000ffff86847aac: casal w8, w7, [x0]
0x0000ffff86847ab0: cmp w8, w6
0x0000ffff86847ab4: cset x8, ne // ne = any
0x0000ffff86847ab8: dmb ish

@djspiewak
Copy link

Thank you all for the deep attention to this! You are all fantastic and this is incredibly helpful in so many ways.

@TheRealMDoerr
Copy link

canceled = true; if (suspended.compareAndSet(true, false))
is a Store-Load pattern. Nonvolatile Store is not ordered wrt. succeeding Volatile Load.
Should canceled not be volatile to fix this?

@djspiewak
Copy link

If canceled were volatile then this would work quite trivially. 😃 It would also be much much slower.

@TheRealMDoerr
Copy link

TheRealMDoerr commented Nov 12, 2020

If canceled were volatile then this would work quite trivially. 😃 It would also be much much slower.

Volatile has a price.
But without it, the JVM doesn't need to prevent reorderings as already explained above.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment