I recently came across the need to spawn multiple threads, each of which needs to write to the same file. Since the file will experience contention from multiple threads, we need to guarantee thread-safety.
NOTE: The following examples work with Python 3.x. To run them under Python 2.7, replace threading.get_ident() with thread.get_ident(), which requires an additional import thread.
- The following example will take a very long time: it creates 200 threads, each of which busy-waits until a global lock is available for acquisition.
# threading_lock.py
import threading

global_lock = threading.Lock()

def write_to_file():
    # Busy-wait until the lock looks free, then acquire it
    while global_lock.locked():
        continue
    global_lock.acquire()
    with open("thread_writes", "a+") as file:
        file.write(str(threading.get_ident()))
        file.write("\n")
    global_lock.release()
# Create 200 threads, invoke write_to_file() in each of them,
# and wait for all of them to finish
threads = []
for i in range(200):
    t = threading.Thread(target=write_to_file)
    threads.append(t)
    t.start()
for thread in threads:
    thread.join()
As mentioned earlier, the above program takes an unacceptable 125s:
python threading_lock.py 125.56s user 0.34s system 103% cpu 2:01.57 total
(Addendum: @agiletelescope points out that with the following minor change, sleeping between polls instead of spinning flat out, we can reduce the cost of lock contention drastically (note that this also requires an import time):

...
while global_lock.locked():
    time.sleep(0.01)
)
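A side note, offered as a sketch rather than as the article's method: Lock.acquire() already blocks until the lock is free, so neither the busy-wait nor the sleep loop is strictly necessary. Using the lock as a context manager keeps the critical section minimal (the filename thread_writes_ctx here is purely illustrative):

```python
# Sketch: let acquire() block instead of polling locked().
# "thread_writes_ctx" is an illustrative filename, not from the article.
import threading

global_lock = threading.Lock()

def write_to_file():
    # "with lock" acquires on entry and releases on exit,
    # even if the body raises an exception.
    with global_lock:
        with open("thread_writes_ctx", "a+") as file:
            file.write(str(threading.get_ident()))
            file.write("\n")

threads = [threading.Thread(target=write_to_file) for _ in range(200)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

A blocking acquire() parks the waiting thread in the OS instead of burning CPU in a Python-level loop, which is likely where most of the 125s above goes.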
- A simple modification to this is to store the information the threads want to write in an in-memory data structure, such as a Python list, and to write the contents of the list to a file once all threads have join-ed.
# threading_lock_2.py
import threading

# Global lock
global_lock = threading.Lock()
file_contents = []

def write_to_file():
    # Busy-wait, then append to the shared list under the lock
    while global_lock.locked():
        continue
    global_lock.acquire()
    file_contents.append(threading.get_ident())
    global_lock.release()
# Create 200 threads, invoke write_to_file() in each of them,
# and wait for all of them to finish
threads = []
for i in range(200):
    t = threading.Thread(target=write_to_file)
    threads.append(t)
    t.start()
for thread in threads:
    thread.join()

with open("thread_writes", "a+") as file:
    file.write('\n'.join([str(content) for content in file_contents]))
The above program takes a significantly shorter, and almost negligible, time:
python threading_lock_2.py 0.04s user 0.00s system 76% cpu 0.052 total
With thread count = 2000:
python threading_lock_2.py 0.10s user 0.06s system 77% cpu 0.206 total
With thread count = 20000:
python threading_lock_2.py 0.10s user 0.06s system 77% cpu 0.206 total
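Another common pattern, not used in the article but worth sketching: hand all writes to a single dedicated writer thread through a queue.Queue, so workers never touch the file or a lock directly. The filename thread_writes_queue and the None sentinel below are assumptions for illustration:

```python
# Sketch: single-writer pattern via queue.Queue (thread-safe by design).
import queue
import threading

q = queue.Queue()
SENTINEL = None  # hypothetical marker telling the writer to stop

def worker():
    # Workers only enqueue; they never open the file
    q.put(str(threading.get_ident()))

def writer(path):
    # The only thread that touches the file
    with open(path, "a+") as f:
        while True:
            item = q.get()
            if item is SENTINEL:
                break
            f.write(item + "\n")

w = threading.Thread(target=writer, args=("thread_writes_queue",))
w.start()

workers = [threading.Thread(target=worker) for _ in range(200)]
for t in workers:
    t.start()
for t in workers:
    t.join()

q.put(SENTINEL)  # all workers done; tell the writer to exit
w.join()
```

Like the in-memory list approach, this serializes the actual I/O, but it also streams output as it arrives rather than buffering everything until the end.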