Over the years since I posted this script, it has become more and more common to scrape audio files off of Xeno-Canto.org. This has resulted in an overwhelming amount of traffic to their servers.
Please do not scrape Xeno-Canto without contacting the organizers first to ask for permission and for more information. They will be able to advise you on the best time of day to download data from their servers, or any alternative download options that are available.
Hi,
Great script and idea!
I had two issues. First, in:
# Make wget input file
url_list = []
for file in record_df['file'].tolist():
    url_list.append('https:{}'.format(file))
with open('xc-noca-urls.txt', 'w+') as f:
    for item in url_list:
        f.write("{}\n".format(item))
I needed to change 'file' in "for file in record_df['file'].tolist():" to 'URL'. Otherwise, the script would have prepended an additional, unwanted "https:" to each entry.
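For what it's worth, a small guard like the one below might avoid the doubled prefix whichever column is used. It is only a sketch: it assumes a value is either protocol-relative (starting with "//") or already a complete URL. The helper names to_full_url and write_url_list are my own, not part of the original script, and the example.org addresses are made up.

```python
def to_full_url(value, scheme="https"):
    """Return an absolute URL, adding a scheme only when it is missing."""
    value = value.strip()
    # Protocol-relative values such as "//host/path" need a scheme in front.
    if value.startswith("//"):
        return "{}:{}".format(scheme, value)
    # Anything else (for example "https://host/path") is left untouched.
    return value


def write_url_list(values, path):
    """Write one normalized URL per line, as input for wget -i."""
    with open(path, "w") as f:
        for value in values:
            f.write("{}\n".format(to_full_url(value)))


# Hypothetical values, for illustration only.
print(to_full_url("//example.org/123/download"))
print(to_full_url("https://example.org/123/download"))
```

With this in place, the loop could call write_url_list(record_df['file'].tolist(), 'xc-noca-urls.txt') without caring whether the column already includes "https:".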
Second, the download seems to work for me, but the files appear to be corrupted. They have no filename extension, and when I add .wav or .mp3, the files cannot be opened with Raven.
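In case it helps narrow this down, here is a rough way to check what a downloaded file actually contains by looking at its first bytes. It is a sketch under assumptions: it only recognizes WAV (RIFF/WAVE header), MP3 (ID3 tag or frame sync bits), and anything that starts with "<" as probable HTML. The function names sniff_audio_kind and kind_of_file are hypothetical.

```python
def sniff_audio_kind(head):
    """Guess a file type from its leading bytes (a heuristic, not a full parser)."""
    # WAV files start with "RIFF", a 4-byte size, then "WAVE".
    if head[:4] == b"RIFF" and head[8:12] == b"WAVE":
        return "wav"
    # Many MP3 files start with an ID3v2 tag.
    if head[:3] == b"ID3":
        return "mp3"
    # Untagged MP3 data starts with a frame sync: 11 set bits.
    if len(head) >= 2 and head[0] == 0xFF and (head[1] & 0xE0) == 0xE0:
        return "mp3"
    # A leading "<" suggests a web page was saved instead of audio.
    if head.lstrip()[:1] == b"<":
        return "html"
    return "unknown"


def kind_of_file(path):
    """Read the first bytes of a file on disk and classify them."""
    with open(path, "rb") as f:
        return sniff_audio_kind(f.read(64))
```

If kind_of_file reports "html" for a download, the URL in the list probably pointed at a web page rather than at the audio file itself.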
I would be happy if you could point me in the right direction.
Thanks, Dominik