@rhine3
Last active November 5, 2023 11:09
Downloading files from Xeno-Canto

This script is no longer supported.

Over the years since I posted this script, it has become more and more common to scrape audio files off of Xeno-Canto.org. This has resulted in an overwhelming amount of traffic to their servers.

Please do not scrape Xeno-Canto without contacting the organizers first to ask for permission and for more information. They will be able to advise you on the best time of day to download data from their servers, or any alternative download options that are available.

@Richterskala101

Heya, sorry for spamming, but thought I'd share how it worked out for me.

The previous comment was misleading...
When I just delete the 'https:' prefix in the append() call, like so:
url_list = []
for file in record_df['file'].tolist():
    url_list.append('{}'.format(file))
with open('xc-noca-urls.txt', 'w+') as f:
    for item in url_list:
        f.write("{}\n".format(item))

the URL list text file works out great. That being said, I am completely inexperienced with Python, so there are definitely more elegant ways.
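
For anyone who wants the list-writing step to work whether or not the values in record_df['file'] already carry a scheme, here is a minimal sketch. It is not part of the original script: the helper names normalize_url and write_url_list are hypothetical, the example.org URLs are placeholders, and it assumes each entry is either a full http(s) URL, a protocol-relative one starting with "//", or a bare host/path. It only writes a text file and makes no requests.

```python
def normalize_url(raw, default_scheme="https"):
    """Return the URL with a scheme, adding default_scheme only if it is missing.

    Hypothetical helper; assumes the input is a full http(s) URL,
    a protocol-relative URL ("//host/path"), or a bare "host/path".
    """
    url = raw.strip()
    if url.startswith(("http://", "https://")):
        return url  # already complete, so do not prepend anything
    if url.startswith("//"):
        return f"{default_scheme}:{url}"
    return f"{default_scheme}://{url}"


def write_url_list(urls, path):
    """Write one normalized URL per line, skipping empty entries."""
    cleaned = [normalize_url(u) for u in urls if u and u.strip()]
    with open(path, "w") as f:
        for url in cleaned:
            f.write(f"{url}\n")
    return cleaned


if __name__ == "__main__":
    # Placeholder values standing in for record_df['file'].tolist()
    sample = [
        "https://example.org/1/download",
        "//example.org/2/download",
        "example.org/3/download",
    ]
    for line in write_url_list(sample, "urls-example.txt"):
        print(line)
```

Checking for an existing scheme avoids the doubled "https:" prefix that made the earlier version of the list unusable, while still handling entries that lack one.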

Another thing I stumbled upon was that the downloaded recordings were saved under arbitrary names.
Replacing the wget option "--trust-server-names" with "--content-disposition" preserved the original XC filenames.

Maybe that's helpful for someone...

@rhine3
Author

rhine3 commented Oct 9, 2023

Hi Dominik! To be honest, I should probably make this script private. It's from a while ago. Since I created it, it has become much more common to scrape Xeno-Canto and it is overwhelming their servers. So, the Xeno-Canto folks ask that you refrain from scraping it to the extent possible.
