I recently came across the need to spawn multiple threads, each of which needs to write to the same file. Since the file will experience contention from multiple threads, we need to guarantee thread-safety.
NOTE: The following examples work with Python 3.x. To run them under Python 2.7, replace threading.get_ident() with thread.get_ident(), which requires an additional import thread.
- The following example will take a very long time: it creates 200 threads, each of which busy-waits until a global lock is available for acquisition.
# threading_lock.py
import threading

global_lock = threading.Lock()

def write_to_file():
    # Busy-wait until the lock looks free, then acquire it
    while global_lock.locked():
        continue
    global_lock.acquire()
    with open("thread_writes", "a+") as file:
        file.write(str(threading.get_ident()))
        file.write("\n")
    global_lock.release()
# Create 200 threads, invoke write_to_file() in each of them,
# and wait for all of them to finish
threads = []
for i in range(200):
    t = threading.Thread(target=write_to_file)
    threads.append(t)
    t.start()
for thread in threads:
    thread.join()
As mentioned earlier, the above program takes an unacceptable 125s:
python threading_lock.py 125.56s user 0.34s system 103% cpu 2:01.57 total
(Addendum: @agiletelescope points out that with the following minor change, sleeping between polls instead of spinning flat out, we can reduce the cost of lock contention drastically (note that this also requires an import time):

...
while global_lock.locked():
    time.sleep(0.01)
)
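A side note, offered as a sketch rather than as the article's method: Lock.acquire() already blocks until the lock is free, so neither the busy-wait nor the sleep loop is strictly necessary. Using the lock as a context manager keeps the critical section minimal (the filename thread_writes_ctx here is purely illustrative):

```python
# Sketch: let acquire() block instead of polling locked().
# "thread_writes_ctx" is an illustrative filename, not from the article.
import threading

global_lock = threading.Lock()

def write_to_file():
    # "with lock" acquires on entry and releases on exit,
    # even if the body raises an exception.
    with global_lock:
        with open("thread_writes_ctx", "a+") as file:
            file.write(str(threading.get_ident()))
            file.write("\n")

threads = [threading.Thread(target=write_to_file) for _ in range(200)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

A blocking acquire() parks the waiting thread in the OS instead of burning CPU in a Python-level loop, which is likely where most of the 125s above goes.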
- A simple modification to this is to store the information the threads want to write in an in-memory data structure, such as a Python list, and to write the contents of the list to a file once all threads have join-ed.
# threading_lock_2.py
import threading

# Global lock
global_lock = threading.Lock()
file_contents = []

def write_to_file():
    # Busy-wait, then append to the shared list under the lock
    while global_lock.locked():
        continue
    global_lock.acquire()
    file_contents.append(threading.get_ident())
    global_lock.release()
# Create 200 threads, invoke write_to_file() in each of them,
# and wait for all of them to finish
threads = []
for i in range(200):
    t = threading.Thread(target=write_to_file)
    threads.append(t)
    t.start()
for thread in threads:
    thread.join()

with open("thread_writes", "a+") as file:
    file.write('\n'.join([str(content) for content in file_contents]))
The above program takes a significantly shorter, and almost negligible, time:
python threading_lock_2.py 0.04s user 0.00s system 76% cpu 0.052 total
With thread count = 2000:
python threading_lock_2.py 0.10s user 0.06s system 77% cpu 0.206 total
With thread count = 20000:
python threading_lock_2.py 0.10s user 0.06s system 77% cpu 0.206 total
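Another common pattern, not used in the article but worth sketching: hand all writes to a single dedicated writer thread through a queue.Queue, so workers never touch the file or a lock directly. The filename thread_writes_queue and the None sentinel below are assumptions for illustration:

```python
# Sketch: single-writer pattern via queue.Queue (thread-safe by design).
import queue
import threading

q = queue.Queue()
SENTINEL = None  # hypothetical marker telling the writer to stop

def worker():
    # Workers only enqueue; they never open the file
    q.put(str(threading.get_ident()))

def writer(path):
    # The only thread that touches the file
    with open(path, "a+") as f:
        while True:
            item = q.get()
            if item is SENTINEL:
                break
            f.write(item + "\n")

w = threading.Thread(target=writer, args=("thread_writes_queue",))
w.start()

workers = [threading.Thread(target=worker) for _ in range(200)]
for t in workers:
    t.start()
for t in workers:
    t.join()

q.put(SENTINEL)  # all workers done; tell the writer to exit
w.join()
```

Like the in-memory list approach, this serializes the actual I/O, but it also streams output as it arrives rather than buffering everything until the end.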