@aromanyuk
Last active June 27, 2016 15:26
wget for recursive site fetch
wget --mirror --convert-links --adjust-extension --page-requisites --no-parent http://example.org
--mirror – Turns on recursion and time-stamping with unlimited recursion depth (equivalent to -r -N -l inf --no-remove-listing).
--convert-links – After the download finishes, rewrites links in the downloaded documents (including links to things like CSS stylesheets) so they point to the local copies, making the mirror suitable for offline viewing.
--adjust-extension – Appends a suitable extension (.html or .css) to filenames, depending on their Content-Type.
--page-requisites – Downloads the files needed to display each page properly offline, such as CSS stylesheets and images.
--no-parent – When recursing, never ascend to the parent directory. This is useful for restricting the download to one portion of the site.
-e robots=off – Ignore robots.txt rules.
wget -mkEpnp -e robots=off http://example.org #short form of the same options (-m -k -E -p -np)
--exclude-directories=/forums/ – Skip the listed directories (a comma-separated list) during the recursive download.
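A minimal sketch of how --exclude-directories could be combined with the mirror command above. The function name mirror_site and the WGET override variable are hypothetical helpers introduced here for illustration; they are not part of wget. Setting WGET=echo only prints the arguments, so the command can be previewed without downloading anything.

```shell
# Sketch only: mirror_site and WGET are illustrative names, not wget features.
mirror_site() {
  url=$1
  excludes=$2   # comma-separated directory list, e.g. /forums/
  # ${WGET:-wget} runs the real wget unless WGET is overridden (e.g. WGET=echo).
  ${WGET:-wget} --mirror --convert-links --adjust-extension \
    --page-requisites --no-parent \
    --exclude-directories="$excludes" \
    "$url"
}

# Preview the resulting command line instead of running a download:
WGET=echo mirror_site http://example.org /forums/
```

To run the real download, call the function without the WGET=echo prefix.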
wget --user=username --password=password http://example.com #for basic authentication
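# Convert each HTML file from windows-1251 to UTF-8.
# Output goes to a temporary file first: "${file%}" expands to "$file" itself,
# and writing iconv output onto its own input can truncate the file.
for file in *.html; do
  iconv -f windows-1251 -t utf-8 "$file" -o "$file.tmp" && mv "$file.tmp" "$file"
done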
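Running the conversion loop twice on the same files would double-encode them. One possible guard, sketched here under the assumption that every non-UTF-8 file really is windows-1251, is to skip files that already decode cleanly as UTF-8. The function name to_utf8 is hypothetical.

```shell
# Sketch only: to_utf8 is an illustrative helper name.
to_utf8() {
  for file in "$@"; do
    # Skip unmatched globs and anything that is not a regular file.
    [ -f "$file" ] || continue
    # iconv exits non-zero on bytes that are invalid in the source encoding,
    # so a clean utf-8 -> utf-8 pass means the file is already valid UTF-8
    # (plain ASCII files also pass and are left untouched).
    if iconv -f utf-8 -t utf-8 "$file" >/dev/null 2>&1; then
      continue
    fi
    # Convert via a temporary file; clean it up if the conversion fails.
    if iconv -f windows-1251 -t utf-8 "$file" >"$file.tmp"; then
      mv "$file.tmp" "$file"
    else
      rm -f "$file.tmp"
    fi
  done
}

to_utf8 *.html
```

A windows-1251 file whose bytes happen to form valid UTF-8 would be skipped by this check, so it is a heuristic rather than a guarantee.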