I recently came across the need to spawn multiple threads, each of which needs to write to the same file. Since the file will see contention from multiple threads, we need to guarantee thread safety.
NOTE: The following examples work with Python 3.x. To execute them using Python 2.7, replace `threading.get_ident()` with `thread.get_ident()`. As a result, you would also need to `import thread` (`threading` is still required for `Thread` and `Lock`).
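If you want a single script that runs under both versions, a guarded import is one option. This is a minimal sketch, assuming `get_ident` is the only name that differs between the two:

```python
# Compatibility sketch: prefer the Python 3 location of get_ident(),
# and fall back to the Python 2 `thread` module.
try:
    from threading import get_ident  # Python 3.x
except ImportError:
    from thread import get_ident  # Python 2.7
```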
- The following example takes a very long time to run. It creates 200 threads, each of which busy-waits until a global lock is available for acquisition.
```python
# threading_lock.py
import threading

global_lock = threading.Lock()

def write_to_file():
    # Spin until the lock looks free. This constant polling is what
    # makes the program slow and keeps the CPU pegged.
    while global_lock.locked():
        continue

    global_lock.acquire()
    with open("thread_writes", "a+") as file:
        file.write(str(threading.get_ident()))
        file.write("\n")
    global_lock.release()

# Create 200 threads, invoke write_to_file() through each of them,
# and wait for all of them to complete.
threads = []
for i in range(1, 201):
    t = threading.Thread(target=write_to_file)
    threads.append(t)
    t.start()

for thread in threads:
    thread.join()
```
As mentioned earlier, the above program takes an unacceptable 125 seconds:

```
python threading_lock.py  125.56s user 0.34s system 103% cpu 2:01.57 total
```
(Addendum: @agiletelescope points out that with the following minor change (plus an `import time` at the top), we can drastically cut the cost of the busy-wait:

```python
...
    while global_lock.locked():
        time.sleep(0.01)
        continue
...
```

)
- A simple modification to this is to store the information the threads want to write in an in-memory data structure, such as a Python `list`, and to write the contents of the `list` to a file once all threads have `join`-ed.
```python
# threading_lock_2.py
import threading

# Global lock
global_lock = threading.Lock()
file_contents = []

def write_to_file():
    # The same busy-wait as before, but the critical section now only
    # appends to an in-memory list instead of touching the file.
    while global_lock.locked():
        continue

    global_lock.acquire()
    file_contents.append(threading.get_ident())
    global_lock.release()

# Create 200 threads, invoke write_to_file() through each of them,
# and wait for all of them to complete.
threads = []
for i in range(1, 201):
    t = threading.Thread(target=write_to_file)
    threads.append(t)
    t.start()

for thread in threads:
    thread.join()

# Write everything out in one go after the threads are done.
with open("thread_writes", "a+") as file:
    file.write('\n'.join([str(content) for content in file_contents]))
```
The above program takes a significantly shorter, almost negligible, time:

```
python threading_lock_2.py  0.04s user 0.00s system 76% cpu 0.052 total
```

With thread count = 2000:

```
python threading_lock_2.py  0.10s user 0.06s system 77% cpu 0.206 total
```

With thread count = 20000:

```
python threading_lock_2.py  0.10s user 0.06s system 77% cpu 0.206 total
```
Hey @rahulrajaram, I think the entire problem with your 1st script is that you are continuously polling the global lock, so your threads never go to sleep while the lock is unavailable, and hence the 100% CPU strain. You can avoid this simply by using the lock as a context manager, which automatically takes care of putting your threads to sleep and waking them up.
@agiletelescope this should give even better performance than the 0.01s sleep that you added, which might be completely unnecessary.
Also, you are continuously polling the lock in the 2nd script you shared as well. It takes less time than your 1st script because list operations are fast compared to the file operations in the critical section, but the lock-contention problem persists, which is why you still see high CPU strain.
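For reference, here is a minimal sketch of the context-manager approach described above. `with global_lock:` blocks until the lock is free (the waiting thread sleeps rather than spins) and releases the lock automatically on exit, so both the polling loop and the explicit acquire()/release() calls disappear. The filename is hypothetical:

```python
# threading_lock_ctx.py: hypothetical name for this sketch of the
# context-manager variant suggested in the comments above.
import threading

global_lock = threading.Lock()
file_contents = []

def write_to_file():
    # `with` blocks until the lock is acquired (sleeping, not spinning)
    # and releases it on exit, even if an exception is raised.
    with global_lock:
        file_contents.append(threading.get_ident())

threads = [threading.Thread(target=write_to_file) for _ in range(200)]
for t in threads:
    t.start()
for t in threads:
    t.join()

with open("thread_writes", "a+") as file:
    file.write('\n'.join(str(content) for content in file_contents))
```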