Skip to content

Instantly share code, notes, and snippets.

@mattip
Created November 1, 2024 08:36
Show Gist options
  • Save mattip/7eb1c97169b00d844dff5a2f30a27d0a to your computer and use it in GitHub Desktop.
Save mattip/7eb1c97169b00d844dff5a2f30a27d0a to your computer and use it in GitHub Desktop.
Explore 3.11 bzip test hang in multiple threads
The problematic place in the code is in the compression, when the test does compression in many threads:
```
def testThreading(self):
# Issue #7205: Using a BZ2File from several threads shouldn't deadlock.
data = b"1" * 2**20
nthreads = 2
with BZ2File(self.filename, 'wb') as f:
def comp():
for i in range(5):
# f.write(data) # original test does this, change for debugging
print("start compression", tid)
f._compressor.compress(data)
print("done compression", tid)
threads = [threading.Thread(target=comp) for i in range(nthreads)]
with threading_helper.start_threads(threads):
pass
```
Here is the relevant code in the compressor:
```
def compress(self, data):
assert data is not None
try:
self.lock()
... # do the compression
finally:
self.unlock()
```
Here is the lock/unlock (with debug logging):
```
def lock(self):
ident = get_ident()
print("lock 288", ident)
if not self._lock.acquire(False):
print("lock 290 me:", ident, "holder:", rgil.gil_get_holder())
rgil.release()
print("lock 292 me:", ident, "holder:", rgil.gil_get_holder())
self._lock.acquire(True)
print("lock 294 me:", ident, "holder:", rgil.gil_get_holder())
rgil.acquire()
print("lock 296", ident)
def unlock(self):
ident = get_ident()
print("unlock 299", ident)
self._lock.release()
print("unlock 301", ident)
```
And here is what py3.11 prints before hanging
```
(lock 288, 140290695381504)
(lock 296, 140290695381504)
(unlock 299, 140290695381504)
(unlock 301, 140290695381504)
(lock 288, 140290695381504)
(lock 296, 140290695381504)
(unlock 299, 140290695381504)
(unlock 301, 140290695381504)
start compression 140290676360768
(lock 288, 140290676360768)
(lock 296, 140290676360768)
start compression 140290665338432
(lock 288, 140290665338432)
(lock 290 me:, 140290665338432, holder:, 140290665338432)
(lock 292 me:, 140290665338432, holder:, 0)
(unlock 299, 140290676360768)
(unlock 301, 140290676360768)
done compression 140290676360768
(lock 294 me:, 140290665338432, holder:, 140290665338432) <- whoops, calling rgil.acquire() when it already holds it!
start compression 140290676360768
(lock 288, 140290676360768)
(lock 290 me:, 140290676360768, holder:, 140290676360768)
(lock 292 me:, 140290676360768, holder:, 0)
^Z
```
The part that "should not happen" is the line where a thread calls rgil.acquire() when it already holds it
@mattip
Copy link
Author

mattip commented Nov 1, 2024

Maybe the print statements are messing with the GIL? Trying another version using fprintf at the start and end of the C code of RPyGilAcquire and RPyGilRelease` prints this:

tarting test, main thread 139742332699136
Acquire start ident 139742332699136 holder 0
Acquire done ident 139742332699136 holder 139742332699136
Release start ident 139742332699136 holder 139742332699136
Release done ident 139742332699136 holder 0
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
Acquire start ident 139742332699136 holder 0
Acquire done ident 139742332699136 holder 139742332699136
Release start ident 139742332699136 holder 139742332699136
Release done ident 139742332699136 holder 0
Acquire start ident 139742332699136 holder 0
Acquire done ident 139742332699136 holder 139742332699136
Release start ident 139742332699136 holder 139742332699136
Release done ident 139742332699136 holder 0
Acquire start ident 139742332699136 holder 0
Acquire done ident 139742332699136 holder 139742332699136
Release start ident 139742332699136 holder 139742332699136
Release done ident 139742332699136 holder 0
Acquire start ident 139742332699136 holder 0
Acquire done ident 139742332699136 holder 139742332699136
Release start ident 139742332699136 holder 139742332699136
Release done ident 139742332699136 holder 0
Acquire start ident 139742332699136 holder 0
Acquire done ident 139742332699136 holder 139742332699136
Release start ident 139742332699136 holder 139742332699136
Release done ident 139742332699136 holder 0
Acquire start ident 139742332699136 holder 0
Acquire done ident 139742332699136 holder 139742332699136
Release start ident 139742332699136 holder 139742332699136
Release done ident 139742332699136 holder 0
Acquire start ident 139742332699136 holder 0
Acquire done ident 139742332699136 holder 139742332699136
Release start ident 139742332699136 holder 139742332699136
Release done ident 139742332699136 holder 0
Acquire start ident 139742332699136 holder 0
Acquire done ident 139742332699136 holder 139742332699136
Release start ident 139742332699136 holder 139742332699136
Release done ident 139742332699136 holder 0
Acquire start ident 139742332699136 holder 0
Acquire done ident 139742332699136 holder 139742332699136
Release start ident 139742332699136 holder 139742332699136
Release done ident 139742332699136 holder 0
Acquire start ident 139742332699136 holder 0
Acquire done ident 139742332699136 holder 139742332699136
Release start ident 139742332699136 holder 139742332699136
Release done ident 139742332699136 holder 0
Acquire start ident 139742332699136 holder 0
Acquire done ident 139742332699136 holder 139742332699136
Release start ident 139742332699136 holder 139742332699136
Release done ident 139742332699136 holder 0
Acquire start ident 139742332699136 holder 0
Acquire done ident 139742332699136 holder 139742332699136
Release start ident 139742332699136 holder 139742332699136
Release done ident 139742332699136 holder 0
Acquire start ident 139742332699136 holder 0
Acquire done ident 139742332699136 holder 139742332699136
Release start ident 139742332699136 holder 139742332699136
Release done ident 139742332699136 holder 0
Acquire start ident 139742332699136 holder 0
Acquire done ident 139742332699136 holder 139742332699136
Release start ident 139742332699136 holder 139742332699136
Release done ident 139742332699136 holder 0
Acquire start ident 139742332699136 holder 0
Acquire done ident 139742332699136 holder 139742332699136
Release start ident 139742332699136 holder 139742332699136
Release done ident 139742332699136 holder 0
Acquire start ident 139742332699136 holder 0
Acquire done ident 139742332699136 holder 139742332699136
Release start ident 139742332699136 holder 139742332699136
Release done ident 139742332699136 holder 0
Acquire start ident 139742332699136 holder 0
Acquire done ident 139742332699136 holder 139742332699136
Acquire start ident 139742304814656 holder 139742332699136
Release start ident 139742332699136 holder 139742332699136
Release done ident 139742332699136 holder 0
Acquire done ident 139742304814656 holder 139742304814656
Release start ident 139742304814656 holder 139742304814656
Release done ident 139742304814656 holder 0
Acquire start ident 139742304814656 holder 0
Acquire done ident 139742304814656 holder 139742304814656
Release start ident 139742304814656 holder 139742304814656
Release done ident 139742304814656 holder 0
Acquire start ident 139742304814656 holder 0
Acquire done ident 139742304814656 holder 139742304814656
Release start ident 139742304814656 holder 139742304814656
Release done ident 139742304814656 holder 0
Acquire start ident 139742304814656 holder 0
Acquire done ident 139742304814656 holder 139742304814656
Release start ident 139742304814656 holder 139742304814656
Release done ident 139742304814656 holder 0
Acquire start ident 139742304814656 holder 0
Acquire done ident 139742304814656 holder 139742304814656
Acquire start ident 139742332699136 holder 139742304814656
Release start ident 139742304814656 holder 139742304814656
Release done ident 139742304814656 holder 0
start compression 139742304814656
Acquire start ident 139742304814656 holder 0
Acquire done ident 139742304814656 holder 139742304814656
Release start ident 139742304814656 holder 139742304814656
Release done ident 139742304814656 holder 0
Acquire done ident 139742332699136 holder 139742332699136
Release start ident 139742332699136 holder 139742332699136
Release done ident 139742332699136 holder 0
Acquire start ident 139742332699136 holder 0
Acquire done ident 139742332699136 holder 139742332699136
Release start ident 139742332699136 holder 139742332699136
Release done ident 139742332699136 holder 0
Acquire start ident 139742332699136 holder 0
Acquire done ident 139742332699136 holder 139742332699136
Release start ident 139742332699136 holder 139742332699136
Release done ident 139742332699136 holder 0
Acquire start ident 139742332699136 holder 0
Acquire done ident 139742332699136 holder 139742332699136
Release start ident 139742332699136 holder 139742332699136
Release done ident 139742332699136 holder 0
Acquire start ident 139742332699136 holder 0
Acquire done ident 139742332699136 holder 139742332699136
Release start ident 139742332699136 holder 139742332699136
Release done ident 139742332699136 holder 0
Acquire start ident 139742332699136 holder 0
Acquire done ident 139742332699136 holder 139742332699136
Release start ident 139742332699136 holder 139742332699136
Release done ident 139742332699136 holder 0
Acquire start ident 139742332699136 holder 0
Acquire done ident 139742332699136 holder 139742332699136
Release start ident 139742332699136 holder 139742332699136
Release done ident 139742332699136 holder 0
Acquire start ident 139742293792320 holder 139742332699136
Acquire done ident 139742293792320 holder 139742293792320
Release start ident 139742293792320 holder 139742293792320
Release done ident 139742293792320 holder 0
Acquire start ident 139742293792320 holder 0
Acquire done ident 139742293792320 holder 139742293792320
Release start ident 139742293792320 holder 139742293792320
Release done ident 139742293792320 holder 0
Acquire start ident 139742293792320 holder 0
Acquire done ident 139742293792320 holder 139742293792320
Release start ident 139742293792320 holder 139742293792320
Release done ident 139742293792320 holder 0
Acquire start ident 139742293792320 holder 0
Acquire done ident 139742293792320 holder 139742293792320
Release start ident 139742293792320 holder 139742293792320
Release done ident 139742293792320 holder 0
Acquire start ident 139742293792320 holder 0
Acquire done ident 139742293792320 holder 139742293792320
Acquire start ident 139742332699136 holder 139742293792320
Release start ident 139742293792320 holder 139742293792320
Release done ident 139742293792320 holder 0
start compression 139742293792320
Acquire start ident 139742293792320 holder 0
Acquire done ident 139742293792320 holder 139742293792320
Release start ident 139742293792320 holder 139742293792320
Release done ident 139742293792320 holder 0
Release start ident 139742293792320 holder 0
Release done ident 139742293792320 holder 0
Acquire done ident 139742332699136 holder 139742332699136
Release start ident 139742332699136 holder 139742332699136
Release done ident 139742332699136 holder 0
Acquire start ident 139742332699136 holder 0
Acquire done ident 139742332699136 holder 139742332699136
Release start ident 139742332699136 holder 139742332699136
Release done ident 139742332699136 holder 0
Acquire start ident 139742304814656 holder 0
Acquire done ident 139742304814656 holder 139742304814656
Acquire start ident 139742293792320 holder 139742304814656
Release start ident 139742304814656 holder 139742304814656
Release done ident 139742304814656 holder 0
done compression 139742304814656
Acquire start ident 139742304814656 holder 0
Acquire done ident 139742304814656 holder 139742304814656
Release start ident 139742304814656 holder 139742304814656
Release done ident 139742304814656 holder 0
start compression 139742304814656
Acquire start ident 139742304814656 holder 0
Acquire done ident 139742304814656 holder 139742304814656
Release start ident 139742304814656 holder 139742304814656
Release done ident 139742304814656 holder 0
Release start ident 139742304814656 holder 0
Release done ident 139742304814656 holder 0
Acquire done ident 139742293792320 holder 139742293792320
Acquire start ident 139742293792320 holder 139742293792320

The last line is the only one with Acquire start and the thread ident is the same as the GIL holder.

@mattip
Copy link
Author

mattip commented Nov 1, 2024

There are three identifiable threads involved: main is 139742332699136, one compression thread is 139742304814656, the other is 139742293792320. As far as I can tell, the GC thread or signaling thread is not involved. Working backwards, the second to the last line is at the end of a GilAcquire. Where is its pair? About 16 lines up, where the holder is the other compression thread.

Not all the acquire/release cycles are due to the compression code. Who else is releaseing/holding the GIL?

@mattip
Copy link
Author

mattip commented Nov 1, 2024

Is something else acting in the name of the thread?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment