Created
November 1, 2024 08:36
-
-
Save mattip/7eb1c97169b00d844dff5a2f30a27d0a to your computer and use it in GitHub Desktop.
Explore 3.11 bzip test hang in multiple threads
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
The problematic place in the code is in the compression, when the test does compression in many threads: | |
``` | |
def testThreading(self): | |
# Issue #7205: Using a BZ2File from several threads shouldn't deadlock. | |
data = b"1" * 2**20 | |
nthreads = 2 | |
with BZ2File(self.filename, 'wb') as f: | |
def comp(): | |
for i in range(5): | |
# f.write(data) # original test does this, change for debugging | |
print("start compression", tid) | |
f._compressor.compress(data) | |
print("done compression", tid) | |
threads = [threading.Thread(target=comp) for i in range(nthreads)] | |
with threading_helper.start_threads(threads): | |
pass | |
``` | |
Here is the relevant code in the compressor: | |
``` | |
def compress(self, data): | |
assert data is not None | |
try: | |
self.lock() | |
... # do the compression | |
finally: | |
self.unlock() | |
``` | |
Here is the lock/unlock (with debug logging): | |
``` | |
def lock(self): | |
ident = get_ident() | |
print("lock 288", ident) | |
if not self._lock.acquire(False): | |
print("lock 290 me:", ident, "holder:", rgil.gil_get_holder()) | |
rgil.release() | |
print("lock 292 me:", ident, "holder:", rgil.gil_get_holder()) | |
self._lock.acquire(True) | |
print("lock 294 me:", ident, "holder:", rgil.gil_get_holder()) | |
rgil.acquire() | |
print("lock 296", ident) | |
def unlock(self): | |
ident = get_ident() | |
print("unlock 299", ident) | |
self._lock.release() | |
print("unlock 301", ident) | |
``` | |
And here is what py3.11 prints before hanging | |
``` | |
(lock 288, 140290695381504) | |
(lock 296, 140290695381504) | |
(unlock 299, 140290695381504) | |
(unlock 301, 140290695381504) | |
(lock 288, 140290695381504) | |
(lock 296, 140290695381504) | |
(unlock 299, 140290695381504) | |
(unlock 301, 140290695381504) | |
start compression 140290676360768 | |
(lock 288, 140290676360768) | |
(lock 296, 140290676360768) | |
start compression 140290665338432 | |
(lock 288, 140290665338432) | |
(lock 290 me:, 140290665338432, holder:, 140290665338432) | |
(lock 292 me:, 140290665338432, holder:, 0) | |
(unlock 299, 140290676360768) | |
(unlock 301, 140290676360768) | |
done compression 140290676360768 | |
(lock 294 me:, 140290665338432, holder:, 140290665338432) <- whoops, calling rgil.acquire() when it already holds it! | |
start compression 140290676360768 | |
(lock 288, 140290676360768) | |
(lock 290 me:, 140290676360768, holder:, 140290676360768) | |
(lock 292 me:, 140290676360768, holder:, 0) | |
^Z | |
``` | |
The part that "should not happen" is the line where a thread calls rgil.acquire() when it already holds it |
Is something else acting in the name of the thread?
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
There are three identifiable threads involved: main is 139742332699136, one compression thread is 139742304814656, the other is 139742293792320. As far as I can tell, the GC thread or signaling thread is not involved. Working backwards, the second to the last line is at the end of a GilAcquire. Where is its pair? About 16 lines up, where the holder is the other compression thread.
Not all the acquire/release cycles are due to the compression code. Who else is releaseing/holding the GIL?