Created
April 18, 2014 17:22
Python script to download images from a CSV of image URLs
#!/usr/bin/env python

# assuming a csv file with a name in column 0 and the image url in column 1

import urllib

filename = "images"

# open file to read
with open("{0}.csv".format(filename), 'r') as csvfile:
    # iterate on all lines
    i = 0
    for line in csvfile:
        splitted_line = line.split(',')
        # check if we have an image URL
        if splitted_line[1] != '' and splitted_line[1] != "\n":
            urllib.urlretrieve(splitted_line[1], "img_" + str(i) + ".png")
            print "Image saved for {0}".format(splitted_line[0])
            i += 1
        else:
            print "No result for {0}".format(splitted_line[0])
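The script splits each line on commas by hand, which misreads names that themselves contain a comma and fails on rows with only one column. As a hedged alternative, here is a minimal Python 3 sketch that reads the same two-column layout with the standard csv module. The function name read_name_url_pairs and the sample data are made up for illustration and are not part of the gist.

```python
import csv
import io


def read_name_url_pairs(csvfile):
    """Yield (name, url) tuples, skipping rows without a URL in column 1."""
    for row in csv.reader(csvfile):
        # Skip blank lines and rows whose second column is missing or empty.
        if len(row) < 2 or not row[1].strip():
            continue
        yield row[0].strip(), row[1].strip()


# Hypothetical sample data standing in for images.csv.
sample = io.StringIO(
    'cat,https://example.com/cat.png\n'
    '"Doe, Jane",https://example.com/jane.jpg\n'
    'no-url,\n'
)
pairs = list(read_name_url_pairs(sample))
print(pairs)
```

With a real file, the function would be called on the object returned by open("images.csv", newline="").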
This doesn't work, I just get an error. Why isn't there something I can just double-click, open the CSV file with, and have it poop out images?
How do I download these images into a subfolder of the main directory?
You can create a subfolder and put its name in front of the file name, like this:
if splitted_line[1] != '' and splitted_line[1] != "\n":
    urllib.urlretrieve(splitted_line[1], "download/" + splitted_line[0] + ".jpg")
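The snippet above only works if the download folder already exists. As a rough Python 3 sketch of the same idea, the helper below creates the folder on demand and builds the target path with os.path.join. The name target_path and its extension parameter are assumptions for illustration, not something from the gist.

```python
import os
import tempfile


def target_path(folder, name, extension=".jpg"):
    """Create the folder if needed and return the path for one image inside it."""
    os.makedirs(folder, exist_ok=True)  # no error if the folder already exists
    return os.path.join(folder, name + extension)


# Demonstrated in a temporary directory so nothing lands in the working directory.
base = tempfile.mkdtemp()
path = target_path(os.path.join(base, "download"), "cat")
print(path)
```

In the download loop, the result would be passed as the second argument to urlretrieve, for example target_path("download", splitted_line[0]).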
This version has been updated to work with Python 3, saves the files into an "images" subfolder, and sends a User-Agent header to help avoid 403 Forbidden errors. It reads from a file named images.csv.
#!/usr/bin/env python3
# assuming a csv file with a name in column 0 and the image url in column 1
import ntpath
import os
import urllib.request

# send a browser-like User-Agent with every request to help avoid Forbidden errors
opener = urllib.request.build_opener()
opener.addheaders = [('User-Agent','Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1941.0 Safari/537.36')]
urllib.request.install_opener(opener)

def path_leaf(path):
    # return the last component of the path, even if it ends with a slash
    head, tail = ntpath.split(path)
    return tail or ntpath.basename(head)

filename = "images"
# make sure the output folder exists before saving into it
os.makedirs("images", exist_ok=True)

# open file to read
with open("{0}.csv".format(filename), 'r') as csvfile:
    # iterate on all lines
    i = 0
    for line in csvfile:
        # drop the line ending so it does not end up in the URL or file name
        splitted_line = line.rstrip("\r\n").split(',')
        # check if we have an image URL
        if len(splitted_line) > 1 and splitted_line[1] != '':
            img_filename = path_leaf(splitted_line[1])
            urllib.request.urlretrieve(splitted_line[1], "images/" + img_filename)
            print("Image saved for {0}".format(splitted_line[0]))
            print("Filename: " + img_filename)
            i += 1
        else:
            print("No result for {0}".format(splitted_line[0]))
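Two details of the script above may be worth a closer look: install_opener changes the User-Agent for every later urllib request in the process, and taking the file name straight from the URL keeps any query string. The following is a minimal sketch, not the author's method, that sets the header per request and derives the name from the URL path only. USER_AGENT, filename_from_url and build_request are hypothetical names, and the short User-Agent string is a placeholder.

```python
import posixpath
import urllib.request
from urllib.parse import unquote, urlsplit

USER_AGENT = "Mozilla/5.0"  # placeholder value, an assumption for this sketch


def filename_from_url(url):
    """Return the last path segment of a URL, ignoring query string and fragment."""
    path = urlsplit(url.strip()).path
    return posixpath.basename(unquote(path).rstrip("/"))


def build_request(url):
    """Build a request carrying the User-Agent header without touching global state."""
    return urllib.request.Request(url.strip(), headers={"User-Agent": USER_AGENT})


# Example URL with a trailing newline, as it would come from the last CSV column.
url = "https://example.com/photos/cat%20one.png?size=large\n"
request = build_request(url)
print(filename_from_url(url))
```

To download, the request would be passed to urllib.request.urlopen and the response body written to a file opened in binary mode.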
Thank you so much, David!
I made the following updates for Python 3 (3.8), as mentioned by noe2019 above.
The line 17 update from noe2019 had a typo; it should be urllib.request.urlretrieve(splitted_line[1], splitted_line[0] + ".jpg")
Lines 18 and 21 also need the Python 3 print syntax, i.e. print(x)
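To make the changes to lines 17, 18 and 21 concrete, here is a small Python 3 sketch of the download step with those updates applied. Wrapping it in a function called save_image with a swappable retrieve argument is an assumption made here so the example can run without network access; the original script does this inline.

```python
import urllib.request


def save_image(name, url, retrieve=urllib.request.urlretrieve):
    """Download url to '<name>.jpg' and report it; retrieve can be swapped for a dry run."""
    target = name + ".jpg"
    retrieve(url, target)                      # line 17: urllib.request.urlretrieve
    print("Image saved for {0}".format(name))  # line 18: print is a function in Python 3
    return target


# Dry run with a stand-in for urlretrieve so that nothing is downloaded.
calls = []
result = save_image(
    "cat",
    "https://example.com/cat.png",
    retrieve=lambda url, target: calls.append((url, target)),
)
```

Line 21 changes the same way: print("No result for {0}".format(splitted_line[0])).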
I got the following error: urllib.error.URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate in certificate chain (_ssl.c:1123)>
I found a fix on Stack Overflow, which explains and solves the error as follows:
"If you have installed Python 3.6 on OSX and are getting the "SSL: CERTIFICATE_VERIFY_FAILED" error when trying to connect to an https:// site, it's probably because Python 3.6 on OSX has no certificates at all, and can't validate any SSL connections. This is a change for 3.6 on OSX, and requires a post-install step, which installs the certifi package of certificates. This is documented in the ReadMe, which you should find at /Applications/Python\ 3.8/ReadMe.rtf
The ReadMe will have you run this post-install script, which just installs certifi: /Applications/Python\ 3.8/Install\ Certificates.command"
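If running the post-install script is not an option, one possible alternative is to point urllib at the certifi bundle from inside the script. This is only a sketch under the assumption that certifi has been installed with pip; without it, the code falls back to the system's default certificates, which is exactly the case that fails on an affected install.

```python
import ssl
import urllib.request

try:
    import certifi  # third-party package, assumed to be installed via pip
    cafile = certifi.where()  # path to certifi's CA bundle
except ImportError:
    cafile = None  # fall back to the system's default certificates

# Build a verifying TLS context and install an opener that uses it.
context = ssl.create_default_context(cafile=cafile)
opener = urllib.request.build_opener(urllib.request.HTTPSHandler(context=context))
urllib.request.install_opener(opener)
```

Note that install_opener replaces any opener installed earlier, so a User-Agent header set through opener.addheaders would have to be added to this opener as well.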