Last active
June 27, 2016 15:26
wget for recursive site fetch
wget --mirror --convert-links --adjust-extension --page-requisites --no-parent http://example.org
--mirror – Makes the download recursive (it is shorthand for -r -N -l inf --no-remove-listing).
--convert-links – Converts links to downloaded files (including CSS stylesheets) to relative links, so the copy is suitable for offline viewing.
--adjust-extension – Adds a suitable extension (.html or .css) to filenames, depending on their content type.
--page-requisites – Downloads the files needed to display each page properly offline, such as CSS stylesheets and images.
--no-parent – Never ascends to the parent directory when recursing. It is useful for restricting the download to one portion of the site.
-e robots=off – Ignores robots.txt rules.
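wget -mkEpnp -e robots=off http://example.org # same options in short form, plus robots=off
--exclude-directories=/forums/ – Skips the listed directories (a comma-separated list) during the download.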
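The mirror options and the directory exclusion above can be combined in one small wrapper. This is only a sketch: the function name mirror_site and the WGET override variable are hypothetical and not part of the original snippet, and the second argument is passed straight through to --exclude-directories.

```shell
#!/bin/sh
# mirror_site URL [EXCLUDED_DIRS]
# Hypothetical helper: runs wget with the mirror options listed above.
# EXCLUDED_DIRS is an optional comma-separated list, e.g. /forums/,/wiki/
# WGET can be set to another command name (assumption, handy for dry runs).
mirror_site() {
    if [ $# -lt 1 ]; then
        echo "usage: mirror_site URL [EXCLUDED_DIRS]" >&2
        return 2
    fi
    url=$1
    exclude=${2:-}

    # Reuse the positional parameters as the argument list for wget.
    set -- --mirror --convert-links --adjust-extension \
           --page-requisites --no-parent -e robots=off

    # Only add the exclusion when a directory list was given.
    if [ -n "$exclude" ]; then
        set -- "$@" "--exclude-directories=$exclude"
    fi

    "${WGET:-wget}" "$@" "$url"
}
```

Calling mirror_site http://example.org /forums/ should then mirror the site while skipping the /forums/ directory.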
wget --user=username --password=password http://example.com # for basic authentication
for file in *.html; do
    # Write to a temporary file first: converting a file onto itself can truncate it.
    iconv -f windows-1251 -t utf-8 "$file" -o "$file.tmp" && mv "$file.tmp" "$file"
done
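Running the conversion twice on the same file would garble it, so it may help to skip files that already decode as UTF-8. The sketch below assumes that check is a good enough heuristic (a windows-1251 file could, rarely, also happen to be valid UTF-8). The function name convert_to_utf8 is hypothetical.

```shell
#!/bin/sh
# convert_to_utf8 FILE
# Hypothetical helper: converts FILE from windows-1251 to UTF-8 in place,
# unless FILE already decodes cleanly as UTF-8.
convert_to_utf8() {
    file=$1

    # Heuristic: if iconv can read the file as UTF-8, assume it is done.
    if iconv -f utf-8 -t utf-8 "$file" >/dev/null 2>&1; then
        return 0
    fi

    tmp=$(mktemp) || return 1

    # Convert into a temporary file so a failure never damages the original.
    if iconv -f windows-1251 -t utf-8 "$file" >"$tmp"; then
        # Copy the contents back to keep the original file's permissions.
        cat "$tmp" >"$file"
        rm -f "$tmp"
    else
        rm -f "$tmp"
        return 1
    fi
}
```

It can replace the iconv line in the loop above: for file in *.html; do convert_to_utf8 "$file"; done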