@darwing1210
Last active October 29, 2024 15:12
Script to download files asynchronously, using Python asyncio
import os
import asyncio

import aiohttp  # pip install aiohttp
import aiofile  # pip install aiofile

REPORTS_FOLDER = "reports"
FILES_PATH = os.path.join(REPORTS_FOLDER, "files")


def download_files_from_report(urls):
    os.makedirs(FILES_PATH, exist_ok=True)
    # Allow at most 5 downloads in flight at the same time.
    sema = asyncio.BoundedSemaphore(5)

    async def fetch_file(session, url):
        # Use the last path segment of the URL as the local file name.
        fname = url.split("/")[-1]
        async with sema:
            async with session.get(url) as resp:
                assert resp.status == 200
                data = await resp.read()

        async with aiofile.async_open(
            os.path.join(FILES_PATH, fname), "wb"
        ) as outfile:
            await outfile.write(data)

    async def main():
        # A single session is shared by all requests.
        async with aiohttp.ClientSession() as session:
            tasks = [fetch_file(session, url) for url in urls]
            await asyncio.gather(*tasks)

    loop = asyncio.get_event_loop()
    loop.run_until_complete(main())
    loop.close()
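
Note on the last three lines of the script: recent Python releases discourage calling asyncio.get_event_loop() outside a running loop, and asyncio.run() is the documented entry point. The sketch below shows the same bounded-concurrency pattern using only the standard library, with the semaphore created inside the running loop. The names run_bounded and fake_download are hypothetical stand-ins for illustration, not part of the gist or of aiohttp.

```python
import asyncio


async def run_bounded(jobs, limit=5):
    """Run zero-argument coroutine factories with at most `limit` in flight."""
    # Created inside the running loop, so it is bound to the loop that uses it.
    sema = asyncio.Semaphore(limit)

    async def guarded(job):
        async with sema:
            return await job()

    # gather() returns results in the same order as the input jobs.
    return await asyncio.gather(*(guarded(job) for job in jobs))


async def fake_download(i):
    # Hypothetical stand-in for session.get(...) followed by resp.read().
    await asyncio.sleep(0.01)
    return i * 2


# asyncio.run() creates a fresh event loop, runs the coroutine, and closes the loop.
results = asyncio.run(run_bounded([lambda i=i: fake_download(i) for i in range(10)]))
```

Under this approach the three loop-handling lines would collapse to a single asyncio.run(main()) call, provided the semaphore is created inside main() rather than before the loop starts.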
@H4dr1en

H4dr1en commented Jan 1, 2021

Use aiofile (not aiofiles) for better performance. aiofiles uses multithreading behind the scenes, while aiofile uses the platform-native API on Linux/macOS and is therefore truly asynchronous.

@HanslettTheDev

Brilliant code, thanks!

@SmyczekF

Great piece of code!

@gabrielfreitash

You should refactor to reuse the session instead of creating a new one for each request.

@darwing1210
Author

Applied the suggestions.

@callumprentice

Very elegant solution but after experimenting with it for a while, I have started getting errors while downloading files (never the same file and if you try enough times, the downloads for all files succeed):

aiohttp.client_exceptions.ClientPayloadError: Response payload is not completed

Is the solution to change the code to catch that exception and try N times before giving up or is the root cause known?

@darwing1210
Author

Very elegant solution but after experimenting with it for a while, I have started getting errors while downloading files (never the same file and if you try enough times, the downloads for all files succeed):

aiohttp.client_exceptions.ClientPayloadError: Response payload is not completed

Is the solution to change the code to catch that exception and try N times before giving up or is the root cause known?

Please check the aiohttp documentation on ClientPayloadError. And yes, you can use aiohttp-retry to handle the failure cases.
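
For anyone who would rather not add a dependency, a hand-rolled retry is another option. This is only a sketch under assumptions: retry_async and its parameters are hypothetical names, not part of aiohttp or aiohttp-retry, and the backoff values are arbitrary.

```python
import asyncio


async def retry_async(make_call, retry_on, attempts=3, base_delay=0.5):
    """Await make_call() up to `attempts` times, retrying on `retry_on`.

    make_call: zero-argument callable returning a fresh awaitable each time.
    retry_on:  exception class (or tuple of classes) that triggers a retry.
    """
    for attempt in range(1, attempts + 1):
        try:
            return await make_call()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            await asyncio.sleep(base_delay * 2 ** (attempt - 1))
```

In the gist, each download could be wrapped inside main() as retry_async(lambda: fetch_file(session, url), aiohttp.ClientPayloadError). Whether retrying is the right fix depends on why the server cuts the response short.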

@Sarique-Deepsolv

Does this not work for video files? I am getting a 403 error.