davidbauer/11055010, created April 18, 2014 17:22
Python script to download images from a CSV of image URLs
#!/usr/bin/env python
# assuming a csv file with a name in column 0 and the image url in column 1
import urllib
filename = "images"
# open file to read
with open("{0}.csv".format(filename), 'r') as csvfile:
    # iterate on all lines
    i = 0
    for line in csvfile:
        splitted_line = line.split(',')
        # check if we have an image URL
        if splitted_line[1] != '' and splitted_line[1] != "\n":
            urllib.urlretrieve(splitted_line[1], "img_" + str(i) + ".png")
            print "Image saved for {0}".format(splitted_line[0])
            i += 1
        else:
            print "No result for {0}".format(splitted_line[0])
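The `line.split(',')` call above treats every comma as a column break, so a quoted name such as `"Doe, Jane"` would push the URL into the wrong column. Python's standard `csv` module handles quoted fields. A minimal Python 3 sketch, assuming the same name-then-URL layout; `read_image_rows` is a hypothetical helper name, not part of the original script, and unlike the script it silently skips rows without a URL:

```python
import csv
import io

def read_image_rows(csvfile):
    """Yield (name, url) pairs, skipping rows that have no URL in column 1."""
    # csv.reader honours quoting, so "Doe, Jane" stays in a single column
    for row in csv.reader(csvfile):
        if len(row) > 1 and row[1].strip():
            yield row[0], row[1].strip()

# usage with an in-memory file; for a real file use open("images.csv", newline="")
sample = io.StringIO('"Doe, Jane",http://example.com/jane.png\nNo Image,\n')
rows = list(read_image_rows(sample))
print(rows)
```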
How do I download these images into a subfolder of the main directory?
You can create the subfolder yourself and put the directory in front of the file name, like this:
if splitted_line[1] != '' and splitted_line[1] != "\n":
    urllib.urlretrieve(splitted_line[1], "download/" + splitted_line[0] + ".jpg")
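`urlretrieve` does not create missing directories, so the `download` folder has to exist before the first call. A small Python 3 sketch that creates it on demand and builds the path with `os.path.join`; `target_path` and the `.jpg` default are illustrative assumptions, not part of the answer above:

```python
import os
import tempfile

def target_path(folder, name, ext=".jpg"):
    """Return folder/name+ext, creating folder first if it does not exist."""
    # urlretrieve will not create directories, so make them up front
    os.makedirs(folder, exist_ok=True)
    # os.path.join uses the right separator for the current OS
    return os.path.join(folder, name + ext)

# demo inside a throwaway directory; in the script this would be e.g.
# urllib.request.urlretrieve(url, target_path("download", splitted_line[0]))
with tempfile.TemporaryDirectory() as tmp:
    path = target_path(os.path.join(tmp, "download"), "cat")
    folder_created = os.path.isdir(os.path.join(tmp, "download"))
print(path)
```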
This version has been updated to work with Python 3. It saves the files into an "images" subfolder and sends a browser User-Agent header to help avoid HTTP 403 Forbidden errors. It reads from a file named images.csv.
#!/usr/bin/env python
# assuming a csv file with a name in column 0 and the image url in column 1
import ntpath
import os
import urllib.request
# send a browser-like User-Agent, since some servers reject urllib's default with 403 Forbidden
opener = urllib.request.build_opener()
opener.addheaders = [('User-Agent','Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1941.0 Safari/537.36')]
urllib.request.install_opener(opener)
def path_leaf(path):
    # return the last path component, e.g. "cat.jpg" for "http://example.com/img/cat.jpg"
    head, tail = ntpath.split(path)
    return tail or ntpath.basename(head)
filename = "images"
# make sure the target subfolder exists; urlretrieve does not create it
os.makedirs("images", exist_ok=True)
# open file to read
with open("{0}.csv".format(filename), 'r') as csvfile:
    # iterate on all lines
    for line in csvfile:
        # drop the line ending before splitting so it never ends up in the URL or file name
        splitted_line = line.rstrip("\r\n").split(',')
        # check if we have an image URL
        if len(splitted_line) > 1 and splitted_line[1] != '':
            img_filename = path_leaf(splitted_line[1])
            urllib.request.urlretrieve(splitted_line[1], "images/" + img_filename)
            print("Image saved for {0}".format(splitted_line[0]))
            print("Filename: " + img_filename)
        else:
            print("No result for {0}".format(splitted_line[0]))
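Two details of the script above could be tightened. `path_leaf` keeps any query string (a URL ending in `cat.jpg?size=large` gives that whole string as the file name, and `?` is not allowed in Windows file names), and `install_opener` changes the User-Agent for every later `urllib` call in the process. A sketch of an alternative, assuming a per-request header is enough; `filename_from_url`, `make_request` and the short `USER_AGENT` string are hypothetical names and values:

```python
import posixpath
import urllib.request
from urllib.parse import unquote, urlsplit

# assumption: any browser-like string; the script above uses a full Chrome UA
USER_AGENT = "Mozilla/5.0"

def filename_from_url(url):
    """Last path segment of a URL, ignoring any ?query or #fragment."""
    path = urlsplit(url.strip()).path
    # unquote turns %20 and similar escapes back into plain characters
    return unquote(posixpath.basename(path))

def make_request(url):
    """Build a Request that carries the User-Agent for this call only."""
    return urllib.request.Request(url.strip(), headers={"User-Agent": USER_AGENT})

# downloading would then look like this (needs network access):
# with urllib.request.urlopen(make_request(url)) as resp, open(dest, "wb") as out:
#     out.write(resp.read())

print(filename_from_url("http://example.com/img/cat.jpg?size=large"))
```

A URL that ends in `/` yields an empty name, so a fallback such as a row counter would still be needed in that case.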
This doesn't work; I just get an error. Why isn't there something I can just double-click, open the CSV file with, and have it poop out images?