@davidbauer
Created April 18, 2014 17:22
Python script to download images from a CSV of image urls
#!/usr/bin/env python

# assuming a csv file with a name in column 0 and the image url in column 1

import urllib

filename = "images"

# open file to read
with open("{0}.csv".format(filename), 'r') as csvfile:
    # iterate on all lines
    i = 0
    for line in csvfile:
        splitted_line = line.split(',')
        # check if we have an image URL
        if splitted_line[1] != '' and splitted_line[1] != "\n":
            urllib.urlretrieve(splitted_line[1], "img_" + str(i) + ".png")
            print "Image saved for {0}".format(splitted_line[0])
            i += 1
        else:
            print "No result for {0}".format(splitted_line[0])

@garvitsharma

Thanks for this, very useful

I changed line 17 to urllib.urlretrieve(splitted_line[1], splitted_line[0] + ".png"), because I had the name in column 0 but your code was producing file names like img_0.png, since it references str(i).

@IvanKuteynikov

Thanks, very useful.
I changed line 17 so that file names now get three-digit numbers (000-009, 010-100, etc.), which is important when sorting files by name.
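
The modified line itself is not shown in the comment above. A minimal sketch of one way to get three-digit, zero-padded names, assuming the counter i from the gist; the helper name numbered_filename and its parameters are hypothetical:

```python
def numbered_filename(i, width=3, prefix="img_", ext=".png"):
    """Build a file name such as img_007.png from a counter (hypothetical helper)."""
    # the nested {2} supplies the pad width, so the spec becomes "03d" by default
    return "{0}{1:0{2}d}{3}".format(prefix, i, width, ext)

# In the gist, line 17 could then read (Python 2 spelling of urlretrieve):
# urllib.urlretrieve(splitted_line[1], numbered_filename(i))
print(numbered_filename(7))
```

With zero padding, a plain alphabetical sort of the file names matches the numeric order for up to 1000 images.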

@devashishpatil56

Thank you for the help.

@KastanDay

Quite useful 👍

@noe2019

noe2019 commented Dec 22, 2018

Thanks David!

This was very useful. It just needs an update for Python 3.x users:
import urllib.request in line 5 and urllib.request.retieve(splitted_line[1], "img_" + str(i) + ".png")) in line 17.

@esra-mes

Thank you so much, David!

I made the following updates for Python 3 (3.8), as mentioned by noe2019 above.

The line 17 update from noe2019 had a typo; it should be urllib.request.urlretrieve(splitted_line[1], splitted_line[0] + ".jpg")
Lines 18 and 21 also need to switch to the Python 3 print function, i.e. print(x)

I got the following error: urllib.error.URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate in certificate chain (_ssl.c:1123)>

I found a fix on Stack Overflow, which explains and solves the error as follows:
"If you have installed Python 3.6 on OSX and are getting the "SSL: CERTIFICATE_VERIFY_FAILED" error when trying to connect to an https:// site, it's probably because Python 3.6 on OSX has no certificates at all, and can't validate any SSL connections. This is a change for 3.6 on OSX, and requires a post-install step, which installs the certifi package of certificates. This is documented in the ReadMe, which you should find at /Applications/Python\ 3.8/ReadMe.rtf

The ReadMe will have you run this post-install script, which just installs certifi: /Applications/Python\ 3.8/Install\ Certificates.command"
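
As an alternative to the post-install script, an SSL context can be pointed at certifi's CA bundle explicitly. This is only a sketch: it assumes the third-party certifi package is installed (pip install certifi) and falls back to the system defaults when it is not. build_ssl_context and fetch are hypothetical helper names. It uses urlopen rather than urlretrieve because urlretrieve has no context parameter.

```python
import ssl
import urllib.request

def build_ssl_context():
    """Return a verifying SSL context, preferring certifi's CA bundle if available."""
    try:
        import certifi  # third-party; assumed installed via `pip install certifi`
        return ssl.create_default_context(cafile=certifi.where())
    except ImportError:
        # fall back to whatever CA store the system provides
        return ssl.create_default_context()

def fetch(url, target):
    """Download url to target, verifying certificates with the context above."""
    with urllib.request.urlopen(url, context=build_ssl_context()) as response:
        with open(target, "wb") as out:
            out.write(response.read())
```

Certificate verification stays enabled either way; only the source of the trusted certificates changes.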

@polkunus

This doesn't work; I just get an error. Why isn't there something I can just double-click, open the CSV file with, and have it poop out images?

@bhavika-28

how do I download these images in a subfolder in the main directory?

@francoispeyret

how do I download these images in a subfolder in the main directory?

You can create a subfolder and put its path in front of the file name, like this:

        if splitted_line[1] != '' and splitted_line[1] != "\n":
            urllib.urlretrieve(splitted_line[1], "download/" + splitted_line[0] + ".jpg")
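
The snippet above fails if the download folder does not exist yet. A small sketch that creates the folder from the script itself; target_path is a hypothetical helper, and the folder name and extension are taken from the snippet:

```python
import os

def target_path(name, folder="download", ext=".jpg"):
    """Return folder/name.ext, creating the folder first if needed."""
    # exist_ok=True makes this safe to call once per CSV row
    os.makedirs(folder, exist_ok=True)
    return os.path.join(folder, name + ext)

# Python 3 spelling of the call from the snippet above:
# urllib.request.urlretrieve(splitted_line[1], target_path(splitted_line[0]))
```

Using os.path.join instead of string concatenation keeps the path separator correct on Windows as well.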

@scottblair

scottblair commented Aug 30, 2022

This version has been updated to work with Python 3, saves the files in a subfolder named "images", and sends a User-Agent header to help avoid Forbidden errors. It reads from a file named images.csv.

#!/usr/bin/env python3

# assuming a csv file with a name in column 0 and the image url in column 1

import ntpath
import os
import urllib.request

# send a browser-like User-Agent; some servers answer 403 Forbidden to the default one
opener = urllib.request.build_opener()
opener.addheaders = [('User-Agent','Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1941.0 Safari/537.36')]
urllib.request.install_opener(opener)

def path_leaf(path):
    # return the last path component, e.g. "cat.png" for ".../img/cat.png"
    head, tail = ntpath.split(path)
    return tail or ntpath.basename(head)

filename = "images"

# make sure the output folder exists, otherwise urlretrieve fails
os.makedirs("images", exist_ok=True)

# open file to read
with open("{0}.csv".format(filename), 'r') as csvfile:
    # iterate on all lines
    for line in csvfile:
        # strip the line ending so it does not end up in the URL or file name
        splitted_line = line.rstrip("\r\n").split(',')
        # check if we have an image URL
        if len(splitted_line) > 1 and splitted_line[1] != '':
            img_filename = path_leaf(splitted_line[1])
            urllib.request.urlretrieve(splitted_line[1], "images/" + img_filename)
            print("Image saved for {0}".format(splitted_line[0]))
            print("Filename: " + img_filename)
        else:
            print("No result for {0}".format(splitted_line[0]))
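
Splitting each line on ',' breaks when a name contains a quoted comma, and taking the file name straight from the URL keeps any query string. The sketch below shows the same parsing step with the standard csv and urllib.parse modules. This is a swapped-in technique, not what the script above does, and read_rows and filename_from_url are hypothetical helper names:

```python
import csv
import io
import posixpath
from urllib.parse import urlsplit

def filename_from_url(url):
    """Return the last path segment of a URL, ignoring query string and fragment."""
    return posixpath.basename(urlsplit(url).path)

def read_rows(csvfile):
    """Yield (name, url) pairs; url is '' when column 1 is missing or blank."""
    for row in csv.reader(csvfile):
        if not row:
            continue  # skip completely empty lines
        name = row[0]
        url = row[1].strip() if len(row) > 1 else ""
        yield name, url

# Small in-memory example standing in for images.csv
sample = io.StringIO('"Doe, Jane",https://example.com/a/jane.png?size=2\nNobody,\n')
for name, url in read_rows(sample):
    print(name, "->", filename_from_url(url) if url else "no URL")
```

For a real file, open it with newline="" as the csv module documentation recommends, then pass the file object to read_rows.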
