@sainf
Last active February 6, 2025 18:24
# Download the site (excluding fonts/icons) from the Wayback Machine: https://web.archive.org/
# Replace <TIMESTAMP> with the snapshot timestamp and example.com with your domain
wget --mirror \
--convert-links \
--adjust-extension \
--page-requisites \
--no-parent \
--cut-dirs=3 \
--restrict-file-names=unix \
-e robots=off \
--user-agent="Mozilla/5.0" \
--reject-regex '/web/[0-9]+/https?://(fonts\.googleapis\.com|use\.fontawesome\.com)/' \
"https://web.archive.org/web/<TIMESTAMP>/http://example.com/"
# Clean primary domain URLs
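# Replace example.com with your domain (in the find path and in the sed pattern)
# Note: "sed -i" without a suffix is GNU sed syntax; BSD/macOS sed expects -i ''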
find example.com/ -type f \( -name "*.html" -o -name "*.css" -o -name "*.js" \) \
-exec sed -i -E 's#/web/[0-9]{14}/https?://(www\.)?example\.com/#/#g' {} \;
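# To see what the substitution above does before running it on a whole mirror,
# it can be applied to a single line. This is only a sketch: the sample markup
# and the variable names (sample, rewritten) are made up for illustration and
# do not come from a real capture.

```shell
# Hypothetical sample line with a Wayback-prefixed link to the primary domain
sample='<a href="/web/20200101000000/http://www.example.com/about.html">About</a>'

# Same sed expression as in the step above, applied to stdin instead of files
rewritten=$(printf '%s\n' "$sample" \
  | sed -E 's#/web/[0-9]{14}/https?://(www\.)?example\.com/#/#g')

printf '%s\n' "$rewritten"
# prints: <a href="/about.html">About</a>
```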
# Convert external domains to live URLs (fonts/icons)
# Replace example.com with your domain
# Add or replace the external hosts in the pattern to match the sites you need
find example.com/ -type f \( -name "*.html" -o -name "*.css" -o -name "*.js" \) \
-exec sed -i -E 's#/web/[0-9]{14}/https?://(fonts\.googleapis\.com|use\.fontawesome\.com)/#https://\1/#g' {} \;
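# After both clean-up passes it can help to check whether any Wayback-prefixed
# URLs are left. A minimal sketch follows; count_wayback_leftovers is a
# hypothetical helper name, not part of wget or sed. The pattern also allows an
# optional suffix after the timestamp (for example "cs_" or "im_"), on the
# assumption that some captured links carry one, which the {14}/ patterns
# above would not match.

```shell
# Print the number of .html/.css/.js files under "$1" that still contain
# a /web/<14 digits>[suffix]/http(s):// URL
count_wayback_leftovers() {
  grep -rEl --include='*.html' --include='*.css' --include='*.js' \
    '/web/[0-9]{14}[a-z_]*/https?://' "$1" 2>/dev/null \
    | wc -l | tr -d ' '
}

# Usage: only runs if the mirror directory exists
if [ -d example.com ]; then
  echo "Files with leftover archive URLs: $(count_wayback_leftovers example.com)"
fi
```

# A non-zero count means more hosts need to be added to the pattern in the
# previous step, or the remaining links use a form the patterns do not cover.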