#!/usr/bin/env python

# assuming a csv file with a name in column 0 and the image url in column 1

import urllib

filename = "images"

# open file to read
with open("{0}.csv".format(filename), 'r') as csvfile:
    # iterate on all lines
    i = 0
    for line in csvfile:
        splitted_line = line.split(',')
        # check if we have an image URL
        if splitted_line[1] != '' and splitted_line[1] != "\n":
            urllib.urlretrieve(splitted_line[1], "img_" + str(i) + ".png")
            print "Image saved for {0}".format(splitted_line[0])
            i += 1
        else:
            print "No result for {0}".format(splitted_line[0])
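The script splits each line on commas, which breaks if a name itself contains a comma inside quotes. As a minimal Python 3 sketch (not part of the original gist), the standard csv module could handle the parsing instead. The helper name read_name_url_rows and the sample data are hypothetical.

```python
import csv
import io


def read_name_url_rows(csvfile):
    """Yield (name, url) pairs from an open CSV file; url is '' when missing."""
    for row in csv.reader(csvfile):
        if not row:
            continue  # skip completely empty lines
        name = row[0].strip()
        url = row[1].strip() if len(row) > 1 else ""
        yield name, url


# Hypothetical sample data standing in for images.csv
sample = io.StringIO('"Doe, Jane",http://example.com/a.png\nNo Image,\n')
for name, url in read_name_url_rows(sample):
    print(name, "->", url or "(no URL)")
```

With a real file, the same helper could be called as read_name_url_rows(open("images.csv", newline="")).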
Thanks, very useful.
Change line 17 so that filenames get three-digit numbers (000-009, 010-099, 100 and so on); this is important for sorting the files by name.
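One way to get zero-padded numbers, sketched here as an assumption about what the change could look like, is to format the counter with a fixed width. The helper name numbered_filename is hypothetical.

```python
def numbered_filename(index, width=3, extension=".png"):
    """Build a name like img_007.png so files sort correctly by name."""
    return "img_{0:0{1}d}{2}".format(index, width, extension)


print(numbered_filename(7))
print(numbered_filename(42))
```

In the script, the second argument to urlretrieve on line 17 would then become numbered_filename(i).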
Thank you for the help.
Quite useful 👍
Thanks David!
This was very useful. It just needs an update for Python 3.x users:
import urllib.request in line 5 and urllib.request.retieve(splitted_line[1], "img_" + str(i) + ".png")) in line 17.
Thank you so much, David!
I made the following updates for Python3 (3.8) as mentioned by noe2019 above.
Line 17 update from noe2019 had a typo and it should be urllib.request.urlretrieve(splitted_line[1], splitted_line[0] + ".jpg")
Lines 18 and 21 also need to use the Python 3 print function, i.e. print(x).
I got the following error: urllib.error.URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate in certificate chain (_ssl.c:1123)>
I found a fix on Stack Overflow, which explains and solves the error as follows:
"If you have installed Python 3.6 on OSX and are getting the "SSL: CERTIFICATE_VERIFY_FAILED" error when trying to connect to an https:// site, it's probably because Python 3.6 on OSX has no certificates at all, and can't validate any SSL connections. This is a change for 3.6 on OSX, and requires a post-install step, which installs the certifi package of certificates. This is documented in the ReadMe, which you should find at /Applications/Python\ 3.8/ReadMe.rtf
The ReadMe will have you run this post-install script, which just installs certifi: /Applications/Python\ 3.8/Install\ Certificates.command"
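If the post-install script is not available, a possible alternative is to point urllib at a certificate bundle explicitly. The sketch below assumes the third-party certifi package may be installed with pip and falls back to the system's default certificate store when it is not. It is not from the quoted answer.

```python
import ssl
import urllib.request

try:
    import certifi  # third-party package; assumed to be installed via pip
    ca_file = certifi.where()
except ImportError:
    ca_file = None  # fall back to the system's default certificate store

# Build a verifying TLS context and make it the default for urllib.request
context = ssl.create_default_context(cafile=ca_file)
https_handler = urllib.request.HTTPSHandler(context=context)
urllib.request.install_opener(urllib.request.build_opener(https_handler))
```

After this runs, later calls to urllib.request.urlretrieve go through the installed opener and verify certificates against that bundle.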
This doesn't work; I just get an error. Why isn't there something I can just double-click, open the CSV file with, and have it poop out images?
how do I download these images in a subfolder in the main directory?
You can create a subfolder and put the directory name before the file name, like this:
if splitted_line[1] != '' and splitted_line[1] != "\n":
    urllib.urlretrieve(splitted_line[1], "download/" + splitted_line[0] + ".jpg")
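The download fails if the subfolder does not exist yet. A small Python 3 sketch that creates it first and builds the path in a portable way could look like this. The helper name target_path is hypothetical, and the folder name is taken from the comment above.

```python
import os


def target_path(name, folder="download", extension=".jpg"):
    """Create the folder if needed and return the path for one image file."""
    os.makedirs(folder, exist_ok=True)
    return os.path.join(folder, name + extension)


print(target_path("cat"))
```

The result can be passed as the second argument to urlretrieve in place of the string concatenation.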
This version has been updated to work with Python 3, saves the files in an "images" subfolder, and sends a User-Agent header to help avoid Forbidden errors. It reads from a file named images.csv.
#!/usr/bin/env python3
# assuming a csv file with a name in column 0 and the image url in column 1
import ntpath
import os
import urllib.request

# send a browser-like User-Agent to help avoid 403 Forbidden responses
opener = urllib.request.build_opener()
opener.addheaders = [('User-Agent','Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1941.0 Safari/537.36')]
urllib.request.install_opener(opener)

def path_leaf(path):
    # return the last path component, even if the path ends with a slash
    head, tail = ntpath.split(path)
    return tail or ntpath.basename(head)

filename = "images"
# make sure the output subfolder exists
os.makedirs("images", exist_ok=True)
# open file to read
with open("{0}.csv".format(filename), 'r') as csvfile:
    # iterate on all lines
    for line in csvfile:
        # drop the line ending so it does not end up in the URL or file name
        splitted_line = line.rstrip("\r\n").split(',')
        # check if we have an image URL
        if len(splitted_line) > 1 and splitted_line[1] != '':
            img_filename = path_leaf(splitted_line[1])
            urllib.request.urlretrieve(splitted_line[1], "images/" + img_filename)
            print("Image saved for {0}".format(splitted_line[0]))
            print("Filename: " + img_filename)
        else:
            print("No result for {0}".format(splitted_line[0]))
Thanks for this, very useful
I changed line 17 to urllib.urlretrieve(splitted_line[1], splitted_line[0] + ".png") because I had the name in column 0, but the original code produced file names like img_0.png since it used str(i).
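When the name from column 0 is used as the file name, characters such as spaces or slashes can cause trouble, and the extension may not always be .png. The following Python 3 sketch shows one possible way to handle both. The helper name filename_from_name and the replacement rule are assumptions, not part of the gist.

```python
import os
import re
from urllib.parse import urlparse


def filename_from_name(name, url, default_extension=".png"):
    """Turn a CSV name into a safe file name, reusing the URL's extension."""
    # replace anything other than letters, digits, dot, underscore, hyphen
    safe_name = re.sub(r"[^A-Za-z0-9._-]+", "_", name.strip()).strip("_")
    # take the extension from the URL path, ignoring any query string
    extension = os.path.splitext(urlparse(url.strip()).path)[1]
    return (safe_name or "image") + (extension or default_extension)


print(filename_from_name("Jane Doe", "http://example.com/pics/jane.jpg?size=large"))
```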