Last active
November 10, 2017 08:31
1. HTTP AND RESTFUL BASIC THEORY
2. BASIC MODULES - REQUEST
3. HTTPRESTFULCOMMANDS.py (COMMAND OVERVIEW)
4. DOWNLOAD FILES WITH HTTP/RESTFUL USING PYTHON - pythonhttprestdl.py
5. DOWNLOADING COMPARISON AND OVERVIEW OF METHODS - dlmethodstest.py
6. SIMPLE HTTP HANDSHAKES AND URLS
7. Base64 authentication
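| import base64 | |
| import requests | |
| # api_URL and payload are placeholders that the caller must define. | |
| usrPass = "userid:password" | |
| # In Python 3, b64encode() needs bytes and returns bytes: encode first, decode afterwards. | |
| b64Val = base64.b64encode(usrPass.encode("utf-8")).decode("ascii") | |
| r = requests.post(api_URL, | |
|                   headers={"Authorization": "Basic %s" % b64Val}, | |
|                   data=payload) | |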
| # Python 2 interpreter session (str is already a byte string): | |
| >>> import base64 | |
| >>> encoded = base64.b64encode('data to be encoded') | |
| >>> encoded | |
| 'ZGF0YSB0byBiZSBlbmNvZGVk' | |
| >>> data = base64.b64decode(encoded) | |
| >>> data | |
| 'data to be encoded' | |
| # Python 3: b64encode() requires bytes and returns bytes: | |
| import base64 | |
| base64.b64encode(b'your name') # b'eW91ciBuYW1l' | |
| base64.b64encode('your name'.encode('ascii')) # b'eW91ciBuYW1l' |
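The encoding steps above can be wrapped in a small helper. The sketch below is an illustration only and not part of the original notes: the function name `basic_auth_header` is hypothetical, and it assumes the credentials can be encoded as UTF-8.

```python
import base64


def basic_auth_header(user, password):
    """Return the value for an HTTP Basic 'Authorization' header."""
    # Join the credentials, then convert str -> bytes for b64encode().
    raw = "{}:{}".format(user, password).encode("utf-8")
    # b64encode() returns bytes; decode so the result can be used as a header value.
    token = base64.b64encode(raw).decode("ascii")
    return "Basic " + token


if __name__ == "__main__":
    # Placeholder credentials taken from the snippet above.
    print(basic_auth_header("userid", "password"))
```

The returned string can be passed as `headers={"Authorization": ...}` to any HTTP client. Base64 is an encoding, not encryption, so this is only safe over HTTPS.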
| Requests Library | |
| Requests is a simple, straightforward Python library for developing RESTful clients. | |
| Python 2 has a built-in library called urllib2, which is more complex and dated by comparison. | |
| After writing a couple of programs with urllib2, I am completely convinced by the statement below from the | |
| developers of Requests. See reference [4] for a comparison of code segments written with | |
| urllib2 and with the requests library. | |
| Python’s standard urllib2 module provides most of the HTTP capabilities you need, but the API is thoroughly | |
| broken. It was built for a different time — and a different web. It requires an enormous amount of work | |
| (even method overrides) to perform the simplest of tasks. | |
| Before proceeding, install the requests library by following | |
| http://docs.python-requests.org/en/latest/user/install/#install |
| #!/usr/bin/python | |
| # Python 2 script (StringIO and urllib.URLopener are Python 2 names). | |
| import requests | |
| from StringIO import StringIO | |
| from PIL import Image | |
| import profile | |
| import urllib | |
| import wget | |
| url = 'https://tinypng.com/images/social/website.jpg' | |
| def testRequest(): | |
|     image_name = 'test1.jpg' | |
|     r = requests.get(url, stream=True) | |
|     with open(image_name, 'wb') as f: | |
|         # iter_content() defaults to chunk_size=1, so this loop runs once per byte | |
|         for chunk in r.iter_content(): | |
|             f.write(chunk) | |
| def testRequest2(): | |
|     image_name = 'test2.jpg' | |
|     r = requests.get(url) | |
|     i = Image.open(StringIO(r.content)) | |
|     i.save(image_name) | |
| def testUrllib(): | |
|     image_name = 'test3.jpg' | |
|     testfile = urllib.URLopener() | |
|     testfile.retrieve(url, image_name) | |
| def testwget(): | |
|     image_name = 'test4.jpg' | |
|     wget.download(url, image_name) | |
| if __name__ == '__main__': | |
|     profile.run('testRequest()') | |
|     profile.run('testRequest2()') | |
|     profile.run('testUrllib()') | |
|     profile.run('testwget()') | |
| # Profile results: | |
| # testRequest  - 4469882 function calls (4469842 primitive calls) in 20.236 seconds | |
| # testRequest2 - 8580 function calls (8574 primitive calls) in 0.072 seconds | |
| # testUrllib   - 3810 function calls (3775 primitive calls) in 0.036 seconds | |
| # testwget     - 3489 function calls in 0.020 seconds |
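A likely reason testRequest is so much slower than the others is that `iter_content()` in requests defaults to `chunk_size=1`, so the write loop runs once per byte. The sketch below imitates that effect with in-memory streams and no network access; `copy_in_chunks` is a hypothetical helper, not something from the script above.

```python
import io


def copy_in_chunks(src, dst, chunk_size):
    """Copy src to dst in chunk_size pieces; return the number of write calls."""
    writes = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)
        writes += 1
    return writes


if __name__ == "__main__":
    payload = b"x" * 100000  # stand-in for the downloaded image
    for size in (1, 8192):
        count = copy_in_chunks(io.BytesIO(payload), io.BytesIO(), size)
        print("chunk_size=%d: %d write calls" % (size, count))
```

Passing a larger value, for example `r.iter_content(chunk_size=8192)`, should bring the streaming variant much closer to the other methods.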
| # Python 2 | |
| import urllib2 | |
| username = 'user1' | |
| password = '123456' | |
| # This should be the base URL you want to access. | |
| baseurl = 'http://server_name.com' | |
| # Create a password manager | |
| manager = urllib2.HTTPPasswordMgrWithDefaultRealm() | |
| manager.add_password(None, baseurl, username, password) | |
| # Create an authentication handler using the password manager | |
| auth = urllib2.HTTPBasicAuthHandler(manager) | |
| # Create an opener and install it, so that later urlopen calls use it by default | |
| opener = urllib2.build_opener(auth) | |
| urllib2.install_opener(opener) | |
| # Request the full URL you want to open | |
| response = urllib2.urlopen(baseurl + "/file") | |
| ######################################################### | |
| ######################################################### | |
| # Python 3 | |
| # If you can use the requests library, Basic authentication is very easy. I'd highly recommend it if possible: | |
| import requests | |
| url = 'http://somewebsite.org' | |
| user, password = 'bob', 'I love cats' | |
| resp = requests.get(url, auth=(user, password)) |
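If requests is not available, the urllib2 recipe above maps almost one-to-one onto `urllib.request` in Python 3. The sketch below is a hedged port: the helper name `build_basic_auth_opener` is hypothetical and the server name and credentials are the placeholders from the snippet above. It only builds the opener; no request is sent.

```python
import urllib.request


def build_basic_auth_opener(base_url, username, password):
    """Return (opener, manager) set up for HTTP Basic auth under base_url."""
    manager = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    # realm=None means: use these credentials for any realm at this URL.
    manager.add_password(None, base_url, username, password)
    handler = urllib.request.HTTPBasicAuthHandler(manager)
    opener = urllib.request.build_opener(handler)
    return opener, manager


if __name__ == "__main__":
    opener, manager = build_basic_auth_opener(
        "http://server_name.com", "user1", "123456")
    # Look up which credentials would be used for a URL below the base URL.
    print(manager.find_user_password(None, "http://server_name.com/file"))
    # opener.open("http://server_name.com/file") would perform the real call.
```

Calling `urllib.request.install_opener(opener)` afterwards makes plain `urllib.request.urlopen` use the same credentials, mirroring `urllib2.install_opener`.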
| BASIC THEORY | |
| HTTP & RESTful APIs | |
| HTTP is a request/response protocol that follows the client-server model. | |
| On the web, the browser typically sends an HTTP request and | |
| the web server answers with an HTTP response. The client does not | |
| have to be a browser, though: it can be any application that can | |
| send an HTTP request. | |
| Many application-level communication protocols have been used over time, starting with | |
| RPC (Remote Procedure Call), Java RMI (Remote Method Invocation), XML-RPC and | |
| SOAP over HTTP. RESTful APIs are the current step in this lineage of application-level | |
| client-server communication. | |
| REST is an architectural style for application-level APIs rather than a protocol of its own. It is heavily used on the internet | |
| (WWW) and in distributed systems. It is recommended by Service-Oriented Architecture | |
| (SOA) for communication between loosely coupled distributed components. | |
| RESTful APIs run over HTTP and are the de facto standard for cloud | |
| communication. | |
| The property that makes REST suitable for modern internet and | |
| cloud communication is that it is stateless. The protocol does not enforce | |
| any state machine, which means no order of protocol messages is enforced. | |
| The server also does not remember any information across requests or responses. | |
| Every request stands on its own and has no relation to the previous or next request | |
| that may come. To learn more about the HTTP protocol, look at the references. | |
| Basic Python 3 library: import requests |
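The request/response cycle described above can be tried without any external service. The sketch below uses only the standard library: it starts a throwaway server on the loopback interface and sends it a single GET from a client that is not a browser. The names `HelloHandler` and `fetch_once` are hypothetical, and the example assumes local sockets are allowed.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Answer every GET with a small plain-text body."""

    def do_GET(self):
        body = b"hello from the server"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def fetch_once():
    """Start a local server, send one GET, return (status, body)."""
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)  # port 0: any free port
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        url = "http://127.0.0.1:%d/" % server.server_address[1]
        # An empty ProxyHandler makes sure the request goes straight to localhost.
        opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
        with opener.open(url) as response:
            return response.status, response.read()
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(fetch_once())
```

Each call to `fetch_once()` is independent of the previous one, and the handler keeps nothing between requests, which is the stateless behaviour the text describes.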
| ################################################################################## | |
| #The Structure of HTTP / RESTful API | |
| #Following are points to remember while developing RESTful API: | |
| # | |
| #URL (Uniform Resource Locator) | |
| #Message Type | |
| #Headers | |
| #Parameters | |
| #Payload | |
| #Authentication | |
| ################################################################################# | |
| #1. URL | |
| #The URL is the core of a RESTful API. A URL usually refers to a web page, but it can also refer to a service or a resource. | |
| #For example: http://graph.facebook.com/v2.3/{photo-id} | |
| #The above URL identifies the photo resource with the given id. When making a call, {photo-id} must be replaced with the actual photo id. | |
| #Python code snippet to store a URL in a Python variable: | |
| >>> url = 'http://graph.facebook.com/v2.3/123435' | |
| ################################################################################# | |
| #2. Message Types | |
| #HTTP supports | |
| # GET | |
| # POST | |
| # PUT | |
| # DELETE message types. | |
| #There are a few more types as well. See reference [1] to understand them in detail. | |
| ######### | |
| #GET – to retrieve resource. | |
| #Eg. GET http://graph.facebook.com/v2.3/1234345 will retrieve the photograph stored in that location. | |
| >>> import requests | |
| >>> ret = requests.get(url) | |
| >>> ret.status_code | |
| 200 | |
| ######## | |
| #POST – to submit data to a resource, most often to create a new resource. | |
| #POST http://graph.facebook.com/v2.3/123435 sends the photograph supplied in the message | |
| #payload to the server, which decides how to process it. Many APIs also accept POST | |
| #to update an existing resource. | |
| >>> import requests | |
| >>> ret = requests.post(url) | |
| >>> ret.status_code | |
| 200 | |
| ######## | |
| #PUT – to create a resource at a known URL, or to replace it if it already exists. | |
| #PUT http://graph.facebook.com/v2.3/123435 stores the photograph sent in the message payload at that location. | |
| >>> import requests | |
| >>> ret = requests.put(url) | |
| >>> ret.status_code | |
| 201 | |
| ######## | |
| #DELETE – to delete a resource – DELETE http://graph.facebook.com/v2.3/123435 will delete the | |
| #photograph present in that location. | |
| >>> import requests | |
| >>> ret = requests.delete(url) | |
| >>> ret.status_code | |
| 200 | |
| ################################################################# | |
| #3. Headers | |
| #HTTP headers carry information used to process requests and responses. | |
| #Headers are colon-separated key-value pairs, for example “Accept: text/plain”. | |
| #An HTTP request or response may have multiple headers. Since they are key-value pairs, | |
| #we can use Python’s dictionary data type to store them. | |
| #Single header and multiple headers: | |
| >>> head = {"Content-type": "application/json"} | |
| >>> head = {"Accept": "application/json", | |
| ...         "Content-type": "application/json"} | |
| #Make the API call with the above header: | |
| >>> ret = requests.get(url,headers=head) | |
| >>> ret.status_code | |
| 200 | |
| #In the above statement, “headers” is the name of argument. So we have used the Python | |
| #feature of passing named arguments to a function. | |
| ############################################################################## | |
| #4 Parameters | |
| #Sometimes we want to pass values as URL parameters. For example, the URL | |
| #http://www.abc.com/abc.php?name=Saravanan&designation=Technical Leader | |
| #expects the caller to send values for the keys “name” and “designation”. | |
| #The code snippet below accomplishes this. The “params” argument | |
| #sets the parameter values, and requests URL-encodes them for you. | |
| >>> parameters = {'name':'Saravanan', | |
| ...               'designation':'Technical Leader'} | |
| >>> head = {'Content-Type':'application/json'} | |
| >>> ret = requests.post(url, params=parameters, headers=head) | |
| >>> ret.status_code | |
| 200 | |
| ################################################################################# | |
| #5 Payload | |
| #The payload contains the data to be sent with the request. Here we will see how to send a JSON | |
| #object in the payload. | |
| >>> empObj = {'name':'Saravanan', 'title':'Architect','Org':'Cisco Systems'} | |
| #empObj above is a Python dictionary. To send it as JSON, it must first be | |
| #serialized to a JSON string before the request is sent. | |
| #The json library in Python helps here. | |
| >>> import json | |
| >>> emp = json.dumps(empObj) | |
| #json.dumps converts the dictionary into a JSON-formatted string. | |
| #The complete code snippet is below: | |
| >>> import json | |
| >>> import requests | |
| >>> | |
| >>> url = 'http://graph.facebook.com/v2.3/123123' | |
| >>> head = {'Content-type':'application/json', | |
| ...         'Accept':'application/json'} | |
| >>> payload = {'name':'Saravanan', | |
| ...            'Designation':'Architect', | |
| ...            'Organization':'Cisco Systems'} | |
| >>> payld = json.dumps(payload) | |
| >>> ret = requests.post(url, headers=head, data=payld) | |
| >>> ret.status_code | |
| 200 | |
| ########################################################################################## | |
| #6 Authentication | |
| #The “requests” library supports various forms of authentication, including Basic, | |
| #Digest, OAuth and others. The credentials are passed using the | |
| #“auth” parameter of the request methods. | |
| >>> import requests | |
| >>> from requests.auth import HTTPBasicAuth | |
| >>> url = 'http://www.hostmachine.com/sem/getInstances' | |
| >>> ret = requests.get(url, auth=HTTPBasicAuth('username','password')) | |
| >>> ret.status_code | |
| 200 | |
| #The “auth” argument accepts any callable, so you can define your own custom authentication | |
| #(for example by subclassing requests.auth.AuthBase) and pass it to “auth”. | |
| #Summary | |
| #The snippets above are samples that show the simplicity of Python and the requests library. | |
| #See the official Requests website to learn more advanced concepts | |
| #in RESTful API development. | |
| ##################################################################################### | |
| #References | |
| #[1] HTTP Wiki: http://en.wikipedia.org/wiki/Hypertext_Transfer_Protocol | |
| #[2] History of HTTP by W3 Org: http://www.w3.org/Protocols/History.html | |
| #[3] Requests: http://docs.python-requests.org/en/latest/ | |
| #[4] Requests and Urllib2 Comparison: https://gist.github.com/kennethreitz/973705 | |
| #[5] Installation of Requests library: http://docs.python-requests.org/en/latest/user/install/#install | |
| #[6] HTTP Headers: http://en.wikipedia.org/wiki/List_of_HTTP_header_fields |
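The parts listed above (URL, message type, headers, parameters, payload) can also be assembled with the standard library alone, which makes it easy to inspect what would be sent. This is a minimal sketch, not the requests API: `build_request` is a hypothetical helper, the URL is the placeholder from the Parameters section, and nothing is sent over the network.

```python
import json
import urllib.parse
import urllib.request


def build_request(base_url, params, headers, payload, method="POST"):
    """Assemble (but do not send) an HTTP request from its parts."""
    # Parameters are URL-encoded and appended as the query string.
    full_url = base_url + "?" + urllib.parse.urlencode(params)
    # The payload dictionary is serialized to JSON text, then to bytes.
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(full_url, data=body, headers=headers, method=method)


if __name__ == "__main__":
    req = build_request(
        "http://www.abc.com/abc.php",
        params={"name": "Saravanan", "designation": "Technical Leader"},
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        payload={"name": "Saravanan", "title": "Architect"},
    )
    print(req.get_method(), req.full_url)
    print(req.header_items())
    print(req.data)
```

Note how the space in "Technical Leader" is encoded in the query string. `urllib.request.urlopen(req)` would send the request; with requests, `params=`, `headers=` and `data=` do the same assembly for you.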
| ################################################################################################# | |
| ################################################################################################# | |
| # Following are the most commonly used calls for downloading files in python: | |
| # It depends on version, (2 vs 3) or whether we use standard library (urllib) or | |
| # external modules (wget, requests) | |
| # Using urllib | |
| urllib.urlretrieve ('url_to_file', file_name) | |
| # Using urllib2 | |
| urllib2.urlopen('url_to_file') | |
| # Using requests library | |
| requests.get(url) | |
| # Using wget library | |
| wget.download('url', file_name) | |
| #Note: urlopen and urlretrieve have been found to perform relatively badly when downloading large files | |
| #(size > 500 MB). requests.get keeps the whole file in memory until the download is complete, unless stream=True is used. | |
| #Note 2: Most of these methods return the contents in a variable, which we then need to | |
| # write to a file in text mode or binary mode, in one go or chunk by chunk. See below. | |
| ################################################################################################# | |
| ################################################################################################# | |
| # Using urllib2 | |
| # Basics | |
| #In Python 2, use urllib2 which comes with the standard library. | |
| # Ex 1. | |
| # import module | |
| import urllib2 | |
| # store url response in variable response | |
| response = urllib2.urlopen('http://www.example.com/') | |
| # call method .read() on response and store in html | |
| html = response.read() | |
| # Ex 2 - + Saving to File | |
| #The wb in open('test.mp3','wb') opens a file (and erases any existing file) | |
| #in binary mode so you can save data with it instead of just text. | |
| import urllib2 | |
| mp3file = urllib2.urlopen("http://www.example.com/songs/mp3.mp3") | |
| # Writing to file | |
| with open('test.mp3','wb') as output: | |
|     output.write(mp3file.read()) | |
| # Ex 3 - Saving to file, Parsing Filename, Reading by buffer (Python 2) | |
| import urllib2 | |
| import os | |
| url = "http://download.thinkbroadband.com/10MB.zip" | |
| file_name = url.split('/')[-1] # parse filename | |
| u = urllib2.urlopen(url) # open url, store response in u | |
| f = open(file_name, 'wb') # open output file for binary writing | |
| meta = u.info() # get the response headers | |
| file_size = int(meta.getheaders("Content-Length")[0]) # get the size of the file | |
| print "Downloading: %s Bytes: %s" % (file_name, file_size) | |
| os.system('cls') # clear the console (Windows only) | |
| file_size_dl = 0 # bytes downloaded so far | |
| block_sz = 8192 # block size per read | |
| while True: | |
|     buffer = u.read(block_sz) # read the next block | |
|     if not buffer: | |
|         break | |
|     file_size_dl += len(buffer) # update downloaded size | |
|     f.write(buffer) # write block into output file | |
|     status = r"%10d [%3.2f%%]" % (file_size_dl, file_size_dl * 100. / file_size) | |
|     status = status + chr(8)*(len(status)+1) # backspaces so the next status overwrites this one | |
|     print status, | |
| f.close() | |
| # Ex. 4 - with functions for download (Python 2) | |
| import os | |
| from urllib2 import urlopen, URLError, HTTPError | |
| def dlfile(url): | |
|     # Open the url | |
|     try: | |
|         f = urlopen(url) | |
|         print "downloading " + url | |
|         # Open our local file for writing | |
|         with open(os.path.basename(url), "wb") as local_file: | |
|             local_file.write(f.read()) | |
|     # handle errors | |
|     except HTTPError as e: | |
|         print "HTTP Error:", e.code, url | |
|     except URLError as e: | |
|         print "URL Error:", e.reason, url | |
| def main(): | |
|     # Iterate over image ranges | |
|     for index in range(150, 151): | |
|         url = ("http://www.archive.org/download/" | |
|                "Cory_Doctorow_Podcast_%d/" | |
|                "Cory_Doctorow_Podcast_%d_64kb_mp3.zip" % | |
|                (index, index)) | |
|         dlfile(url) | |
| if __name__ == '__main__': | |
|     main() | |
| # Ex 5 - with progress bar (Python 2) | |
| import urllib2 | |
| url = "http://download.thinkbroadband.com/10MB.zip" | |
| file_name = url.split('/')[-1] | |
| u = urllib2.urlopen(url) | |
| f = open(file_name, 'wb') | |
| meta = u.info() | |
| file_size = int(meta.getheaders("Content-Length")[0]) | |
| print "Downloading: %s Bytes: %s" % (file_name, file_size) | |
| file_size_dl = 0 | |
| block_sz = 8192 | |
| while True: | |
|     buffer = u.read(block_sz) | |
|     if not buffer: | |
|         break | |
|     file_size_dl += len(buffer) | |
|     f.write(buffer) | |
|     status = r"%10d [%3.2f%%]" % (file_size_dl, file_size_dl * 100. / file_size) | |
|     status = status + chr(8)*(len(status)+1) | |
|     print status, | |
| f.close() | |
| ################################################################################################# | |
| ################################################################################################# | |
| # Using urllib | |
| # urllib has the method urlretrieve, which saves the file at the URL in the first parameter to the pathname in the | |
| # second, i.e.: | |
| # Python 2 | |
| import urllib | |
| urllib.urlretrieve ("http://www.example.com/songs/mp3.mp3", "mp3.mp3") | |
| # Python 3+: use 'import urllib.request' and urllib.request.urlretrieve | |
| import urllib.request | |
| urllib.request.urlretrieve("http://www.example.com/songs/mp3.mp3", "mp3.mp3") | |
| # Ex Basic (Using urllib - Most stable method, Python 2) | |
| import urllib | |
| testfile = urllib.URLopener() | |
| testfile.retrieve("http://randomsite.com/file.gz", "file.gz") | |
| # Ex.1 (Python 3 + only standard lib) | |
| # Read the html response with urllib.request.urlopen | |
| import urllib.request | |
| response = urllib.request.urlopen('http://www.example.com/') | |
| html = response.read() | |
| # Retrieve / Download file with urllib.request.urlretrieve | |
| import urllib.request | |
| urllib.request.urlretrieve('http://www.example.com/songs/mp3.mp3', 'mp3.mp3') | |
| # Ex. 2 (Python 2 + print) | |
| import urllib | |
| sock = urllib.urlopen("http://diveintopython.org/") | |
| htmlSource = sock.read() | |
| sock.close() | |
| print htmlSource | |
| # Ex. 3 (Python 3 + output to file) | |
| #urlretrieve and requests.get look simple, but in practice they are not always enough. | |
| #I have fetched data from a couple of sites, including text and images, and | |
| #the two calls above probably solve most tasks. For a more | |
| #universal solution I suggest urlopen. As it is included | |
| #in the Python 3 standard library, your code can run on any machine that | |
| #runs Python 3 without pre-installing site-par | |
| # Note: | |
| #This answer provides a solution to HTTP 403 Forbidden when downloading | |
| #a file over http using Python. I have tried only the requests and urllib | |
| #modules; other modules may provide something better, but this is | |
| #the one I used to solve most of the problems. | |
| import urllib.request | |
| # Placeholder values (not in the original): set these for your own download. | |
| url = 'http://www.example.com/' | |
| filename = 'download.bin' | |
| buffer_size = 8192 | |
| headers = {'User-Agent': 'Mozilla/5.0'}  # assumption: a browser-like User-Agent is what avoids the 403 | |
| url_request = urllib.request.Request(url, headers=headers) | |
| url_connect = urllib.request.urlopen(url_request) | |
| len_content = url_connect.length  # remaining body length, or None if unknown | |
| #remember to open file in bytes mode | |
| with open(filename, 'wb') as f: | |
|     while True: | |
|         buffer = url_connect.read(buffer_size) | |
|         if not buffer: break | |
|         #an integer value of size of written data | |
|         data_wrote = f.write(buffer) | |
| #you could also open the connection in a with statement | |
| url_connect.close() | |
| # Ex. 4 (Python 2 + two ways to do read()) | |
| # Note : | |
| # urllib2 is more complete than urllib and should likely be the module | |
| # used if you want to do more complex things, but to make the answers | |
| # more complete, urllib is a simpler module if you want just the basics: | |
| import urllib | |
| response = urllib.urlopen('http://www.example.com/sound.mp3') | |
| mp3 = response.read() | |
| # This works fine. Or, if you don't want to deal with the "response" object, you can call read() directly: | |
| import urllib | |
| mp3 = urllib.urlopen('http://www.example.com/sound.mp3').read() | |
| # Ex. 5 (Python 3 + Using .decode method for binary) | |
| import urllib.request | |
| url = 'http://example.com/' | |
| response = urllib.request.urlopen(url) | |
| data = response.read() # a `bytes` object | |
| text = data.decode('utf-8') # a `str`; this step can't be used if data is binary | |
| # Ex. 6 (Python 3 + urlretrieve()) - urlretrieve is a legacy interface | |
| # The simplest way to download and save a file is to use the urllib.request.urlretrieve function: | |
| import urllib.request | |
| # Download the file from `url` and save it locally under `file_name`: | |
| urllib.request.urlretrieve(url, file_name) | |
| # Ex. 7 (Python3 + urlretrieve + get filename) | |
| # Download the file from `url`, save it in a temporary directory and get the | |
| # path to it (e.g. '/tmp/tmpb48zma.txt') in the `file_name` variable: | |
| file_name, headers = urllib.request.urlretrieve(url) | |
| #But keep in mind that urlretrieve is considered legacy and might become | |
| #deprecated (not sure why, though). | |
| # Ex. 8 (Python 3 - Most correct way - using urlopen()) | |
| import urllib.request | |
| import shutil | |
| # Download the file from `url` and save it locally under `file_name`: | |
| with urllib.request.urlopen(url) as response, open(file_name, 'wb') as out_file: | |
|     shutil.copyfileobj(response, out_file) | |
| # Ex. 9 (Python 3 - Using .write()) | |
| # If this seems too complicated, you may want to go simpler and store the whole download in a bytes | |
| # object and then write it to a file. But this works well only for small files. | |
| import urllib.request | |
| ... | |
| # Download the file from `url` and save it locally under `file_name`: | |
| with urllib.request.urlopen(url) as response, open(file_name, 'wb') as out_file: | |
|     data = response.read() # a `bytes` object | |
|     out_file.write(data) | |
| # Ex. 10 (Combined with unzipping) | |
| #It is possible to extract .gz (and maybe other formats) compressed data on | |
| #the fly, but such an operation probably requires the HTTP server to support | |
| #random access to the file. | |
| import urllib.request | |
| import gzip | |
| ... | |
| # Read the first 64 bytes of the file inside the .gz archive located at `url` | |
| url = 'http://example.com/something.gz' | |
| with urllib.request.urlopen(url) as response: | |
|     with gzip.GzipFile(fileobj=response) as uncompressed: | |
|         file_header = uncompressed.read(64) # a `bytes` object | |
| # Or do anything shown above using `uncompressed` instead of `response` | |
| ################################################################################################# | |
| ################################################################################################# | |
| # Using requests | |
| # Ex 1 - Using requests module to print length of file | |
| import requests | |
| url = "http://download.thinkbroadband.com/10MB.zip" | |
| r = requests.get(url) | |
| print len(r.content) | |
| #10485760 | |
| # Ex 2 - Using attribute .text | |
| # Note: Used in case of downloading text files | |
| # response.text returns the output as a string object; | |
| # use it when you're downloading a text file, such as an HTML file. | |
| import requests | |
| url = 'http://www.example.com/' | |
| response = requests.get(url) | |
| # response.text is a str, so open the file in text mode ('w'); /tmp/page.html is a placeholder path | |
| with open('/tmp/page.html', 'w') as f: | |
|     f.write(response.text) | |
| # Ex 3 - Using attribute .content | |
| # response.content returns the output as a bytes object; use it when | |
| # you're downloading a binary file, such as a PDF file, audio file, image, etc. | |
| # For non-text files like this one, response.content holds the raw bytes, | |
| # so the file is opened in binary mode ('wb'). | |
| import requests | |
| url = 'http://www.hrecos.org//images/Data/forweb/HRTVBSH.Metadata.pdf' | |
| response = requests.get(url) | |
| with open('/tmp/metadata.pdf', 'wb') as f: | |
|     f.write(response.content) | |
| # Ex 4 - Streaming with stream=True and iter_content() | |
| #chunk_size is the chunk size you want to use. If you set it to 2000, | |
| #requests reads the first 2000 bytes, writes them into | |
| #the file, and repeats this until the download is finished. | |
| #This saves RAM. For a small file like this one, response.content is simpler, | |
| #and working with response.raw directly is more complex still. | |
| import requests | |
| url = 'http://www.hrecos.org//images/Data/forweb/HRTVBSH.Metadata.pdf' | |
| chunk_size = 2000 | |
| r = requests.get(url, stream=True) | |
| with open('/tmp/metadata.pdf', 'wb') as fd: | |
|     for chunk in r.iter_content(chunk_size): | |
|         fd.write(chunk) | |
| # Ex 5 - Using requests module + implemented progressbar + tqdm | |
| from tqdm import tqdm | |
| import requests | |
| url = "http://download.thinkbroadband.com/10MB.zip" | |
| response = requests.get(url, stream=True) | |
| with open("10MB", "wb") as handle: | |
|     for data in tqdm(response.iter_content()): | |
|         handle.write(data) | |
| ################################################################################################# | |
| ################################################################################################# | |
| # Using wget | |
| import wget | |
| wget.download('url') | |
| #If you have wget installed, you can use parallel_sync. | |
| #pip install parallel_sync | |
| from parallel_sync import wget | |
| urls = ['http://something.png', 'http://somthing.tar.gz', 'http://somthing.zip'] | |
| wget.download('/tmp', urls) | |
| # or a single file: | |
| wget.download('/tmp', urls[0], filenames='x.zip', extract=True) | |
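Pulling the Python 3 standard-library variants above together, a streaming download can be written as one small function. This is a sketch under assumptions, not the method of any of the libraries listed: the name `download` and the 64 KiB default chunk size are illustrative choices.

```python
import shutil
import sys
import urllib.request


def download(url, file_name, chunk_size=64 * 1024):
    """Stream url into file_name without holding the whole body in memory."""
    with urllib.request.urlopen(url) as response, open(file_name, "wb") as out_file:
        # copyfileobj() reads and writes chunk_size bytes at a time.
        shutil.copyfileobj(response, out_file, chunk_size)
    return file_name


if __name__ == "__main__":
    # Usage: python download.py URL FILE_NAME
    if len(sys.argv) == 3:
        print("saved", download(sys.argv[1], sys.argv[2]))
```

Because the body is copied chunk by chunk, memory use stays roughly constant regardless of file size, which addresses the in-memory concern noted for `requests.get` without `stream=True`.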
| # You could use a library called requests. | |
| import requests | |
| r = requests.get("http://example.com/foo/bar") | |
| # This is quite easy. Then you can inspect the response like this (Python 2 print syntax): | |
| >>> print r.status_code | |
| >>> print r.headers | |
| >>> print r.content | |
| # Alternatively, with httplib2: | |
| import httplib2 | |
| resp, content = httplib2.Http().request("http://example.com/foo/bar") | |
| # wget in python | |
| # From python cookbook, 2nd edition, page 487 (Python 2) | |
| import sys, urllib | |
| def reporthook(a, b, c): | |
|     # a = blocks transferred so far, b = block size in bytes, c = total file size | |
|     print "% 3.1f%% of %d bytes\r" % (min(100, float(a * b) / c * 100), c), | |
| for url in sys.argv[1:]: | |
|     i = url.rfind("/") | |
|     file = url[i+1:] | |
|     print url, "->", file | |
|     urllib.urlretrieve(url, file, reporthook) | |
| # Python 3 version | |
| import sys, urllib.request | |
| def reporthook(a, b, c): | |
|     # end="" keeps the cursor on the same line so "\r" can overwrite the status | |
|     print("% 3.1f%% of %d bytes\r" % (min(100, float(a * b) / c * 100), c), end="") | |
|     sys.stdout.flush() | |
| for url in sys.argv[1:]: | |
|     i = url.rfind("/") | |
|     file = url[i+1:] | |
|     print(url, "->", file) | |
|     urllib.request.urlretrieve(url, file, reporthook) |
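The reporthook above divides by the total size, which fails or misleads when the server does not report one. The sketch below separates the arithmetic into a pure function so it can be checked on its own. `percent_done` is a hypothetical helper, and the handling of a non-positive total is an assumption about how to treat an unknown size.

```python
def percent_done(block_count, block_size, total_size):
    """Return download progress in percent, or None if the size is unknown."""
    if total_size <= 0:
        # urlretrieve may pass -1 when the total size is not known.
        return None
    # The last block may be short, so cap the result at 100.
    return min(100.0, block_count * block_size * 100.0 / total_size)


def reporthook(block_count, block_size, total_size):
    """Progress callback with the argument order urlretrieve uses."""
    pct = percent_done(block_count, block_size, total_size)
    if pct is None:
        print("\r%d bytes" % (block_count * block_size), end="", flush=True)
    else:
        print("\r%5.1f%% of %d bytes" % (pct, total_size), end="", flush=True)
```

It is used the same way as before: `urllib.request.urlretrieve(url, file, reporthook)`.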