Created
January 24, 2014 16:43
wget command
# Construct wget command
command = 'wget '
command = command + '--quiet ' # turn off wget's output
command = command + '--tries=' + str(settings.NUMBER_RETRIES) + ' ' # number of retries (assuming no 404 or the like)
command = command + '--wait=' + str(settings.WAIT_BETWEEN_TRIES) + ' ' # number of seconds between requests (lighten the load on a page that has a lot of assets)
command = command + '--quota=' + settings.ARCHIVE_QUOTA + ' ' # only store this amount
command = command + '--random-wait ' # vary the wait randomly between 0.5 and 1.5 times the --wait= value
command = command + '--limit-rate=' + settings.ARCHIVE_LIMIT_RATE + ' ' # we'll be performing multiple archives at once. let's not download too much in one stream
command = command + '--adjust-extension ' # if a page is served up at .asp, adjust to .html. (this is the new name for the --html-extension flag)
command = command + '--span-hosts ' # sometimes things like images are hosted at a CDN. let's span hosts to get those
command = command + '--convert-links ' # rewrite links in downloaded source so they can be viewed in our local version
command = command + '-e robots=off ' # we're not crawling, just viewing the page exactly as you would in a web browser.
command = command + '--page-requisites ' # get the things required to render the page later. things like images.
command = command + '--no-directories ' # when downloading, flatten the source. we don't need a bunch of dirs.
command = command + '--no-check-certificate ' # we don't care too much about busted certs
command = command + '--user-agent="' + user_agent + '" ' # pass through our user's user agent
command = command + '--directory-prefix=' + directory + ' ' # store our downloaded source in this directory
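
If the command string above is handed to a shell, a user agent that contains a double quote would break out of the `--user-agent="..."` quoting. One way to sidestep that, sketched here under assumptions, is to build the same flags as an argument list and run it without a shell. The helper names `build_wget_args` and `run_wget`, the `url` parameter, and the default values are hypothetical stand-ins; the original snippet reads its values from `settings` and does not show where the URL is appended.

```python
import shlex
import subprocess


def build_wget_args(url, directory, user_agent,
                    retries=3, wait=2, quota="100m", limit_rate="500k"):
    """Return the wget invocation as a list of arguments.

    The defaults are placeholders standing in for settings.NUMBER_RETRIES,
    settings.WAIT_BETWEEN_TRIES, settings.ARCHIVE_QUOTA and
    settings.ARCHIVE_LIMIT_RATE from the snippet above.
    """
    return [
        "wget",
        "--quiet",
        "--tries=" + str(retries),
        "--wait=" + str(wait),
        "--quota=" + str(quota),
        "--random-wait",
        "--limit-rate=" + str(limit_rate),
        "--adjust-extension",
        "--span-hosts",
        "--convert-links",
        "-e", "robots=off",
        "--page-requisites",
        "--no-directories",
        "--no-check-certificate",
        # No manual quoting: each list item reaches wget as one argument.
        "--user-agent=" + user_agent,
        "--directory-prefix=" + directory,
        url,  # assumption: the URL goes last, as wget accepts
    ]


def run_wget(args):
    """Run wget without a shell and return its exit code."""
    return subprocess.run(args, check=False).returncode


def as_shell_string(args):
    """Render the argument list as a safely quoted string, e.g. for logging."""
    return shlex.join(args)  # requires Python 3.8+
```

A call such as `run_wget(build_wget_args(url, directory, user_agent))` would then replace executing the concatenated string. Because no shell parses the arguments, spaces and quotes in the user agent or directory need no escaping.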