@UnforeseenOcean
Forked from protrolium/wget.md
Created September 1, 2019 06:39
wget commands

Download Only Certain File Types Using wget -r -A

You can use this in the following situations:

  • Download all images from a website
  • Download all videos from a website
  • Download all PDF files from a website

$ wget -r -A.pdf http://url-to-webpage-with-pdfs/
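Note that -A also accepts a comma-separated list, so one recursive run can collect several file types at once. A minimal sketch, shown as a dry run that only prints the command; the URL and the accept list are placeholders to substitute with your own:

```shell
# -A takes a comma-separated list of suffixes/patterns to accept.
ACCEPT='pdf,jpg,png'

# Dry run: echo the command instead of fetching anything.
echo wget -r -l 1 -nd -A "$ACCEPT" http://url-to-webpage-with-files/
```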

Download a Full Website Using wget –mirror

The following command downloads a full website and makes it available for local viewing.

$ wget --mirror -p --convert-links -P ./LOCAL-DIR WEBSITE-URL

  • --mirror : turn on options suitable for mirroring.
  • -p : download all files that are necessary to properly display a given HTML page.
  • --convert-links : after the download, convert the links in the documents for local viewing.
  • -P ./LOCAL-DIR : save all the files and directories to the specified directory.
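The mirror command can be parameterised in a small wrapper. A sketch with placeholder values (SITE_URL and LOCAL-DIR are assumptions to replace with your own), shown as a dry run that prints the command for review rather than fetching:

```shell
# Placeholders -- set these to your own site and output directory.
SITE_URL='http://example.com/'
LOCAL_DIR='./LOCAL-DIR'

# Dry run: print the command instead of executing it.
echo wget --mirror -p --convert-links -P "$LOCAL_DIR" "$SITE_URL"
```

Remove the echo once the printed command looks right.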

Download Multiple Files / URLs Using Wget -i

First, store all the download URLs in a text file:

$ cat > download-file-list.txt
URL1
URL2
URL3
URL4

Next, give download-file-list.txt as the argument to wget using the -i option as shown below.

$ wget -i download-file-list.txt
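The list can also be built non-interactively with printf instead of cat. A sketch with placeholder URLs (the example.com paths are assumptions, not real files); the wget call is left commented out so nothing is fetched:

```shell
# Build the URL list (placeholder URLs -- use your own).
printf '%s\n' \
  'http://example.com/a.pdf' \
  'http://example.com/b.pdf' \
  'http://example.com/c.pdf' > download-file-list.txt

# Verify the list before handing it to wget:
wc -l < download-file-list.txt

# Then fetch everything in one run:
# wget -i download-file-list.txt
```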


This downloaded the entire website for me:

wget --no-clobber --convert-links --random-wait -r -p -E -e robots=off -U mozilla http://site/path/


I was trying to download zip files linked from Omeka's themes page - pretty similar task. This worked for me:
wget -A zip -r -l 1 -nd http://omeka.org/add-ons/themes/

  • -A: only accept zip files
  • -r: recurse
  • -l 1: one level deep (i.e., only files directly linked from this page)
  • -nd: don't create a directory structure, just download all the files into this directory.

All the answers with -k, -K, -E etc. options probably haven't really understood the question, as those are for rewriting HTML pages to make a local structure, renaming .php files and so on. Not relevant.

To literally get all files except .html etc:
wget -R html,htm,php,asp,jsp,js,py,css -r -l 1 -nd http://yoursite.com
