@xor-gate
Forked from tmslnz/HTTrack.md
Created October 16, 2024 14:57
Nice command line for HTTrack

Commands

httrack example.com -O ./example.com -N100 -%i0 -I0 --max-rate 0 --disable-security-limits --near -v
httrack example.com -O ./example.com-3 -N100 -I0 -N "%p/%n%[month].%t" --max-rate 0 --disable-security-limits --near -v
# Used for WA fetch of toogood (noted on 2017.02.22)
httrack www.xxx.com -O ./xxx.com -N100 -%i0 -I0 -A0 -%! -n -v

Options

-N100    Don't put the site in its own domain directory, otherwise mirror as usual
-I0    Don't make the HTTrack index page
-N "%p/%n%[month].%t"    Name files as path/name.html, or path/name<month>.html when the URL has a month query parameter (its value is appended to the name). [month] could be [page], [search], or whatever else the query string offers.
--near    Fetch _near_ external resources (scripts, css, etc.)
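
As a rough illustration of the `-N "%p/%n%[month].%t"` pattern, the sketch below approximates the expansion in plain POSIX shell. The `local_name` function is hypothetical and not part of HTTrack. It assumes `%p` is the path, `%n` the file name without extension, `%t` the extension, and `%[month]` the value of the `month` query parameter. Real HTTrack may sanitize characters or handle edge cases differently.

```shell
#!/bin/sh
# Hypothetical helper (not part of HTTrack): approximate how the pattern
# "%p/%n%[month].%t" maps a URL path and query string to a local file name.
local_name() {
    url_path=$1   # e.g. archive/news.html (no leading slash)
    query=$2      # e.g. month=march&page=2 (may be empty)

    dir=$(dirname "$url_path")     # %p: path without trailing slash
    file=$(basename "$url_path")
    name=${file%.*}                # %n: file name without extension
    ext=${file##*.}                # %t: file extension

    # %[month]: value of the "month" query parameter, empty if absent
    month=$(printf '%s\n' "$query" | tr '&' '\n' | sed -n 's/^month=//p' | head -n 1)

    printf '%s/%s%s.%s\n' "$dir" "$name" "$month" "$ext"
}

local_name archive/news.html 'month=march'   # prints archive/newsmarch.html
local_name archive/news.html ''              # prints archive/news.html
```

Swapping `month` for `page` or `search` in the `sed` expression mirrors changing `[month]` in the pattern.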

.htaccess

Use in conjunction with these .htaccess directives:

Options -Indexes

DirectoryIndex index.html index-2.html index-3.html index-4.html index-5.html index-6.html index-7.html index-8.html

RewriteEngine On
RewriteBase /

# Redirect www to non-www
RewriteCond %{HTTP_HOST} ^www\.(.*)$ [NC]
RewriteRule ^(.*)$ http://%1/$1 [R=301,L]

# Redirect index.html requests to the directory URL
# Also account for HTTrack index-n.html renaming
RewriteRule ^index(-[0-9])?\.html$ / [R=301,L]
RewriteRule ^(.*)/index(-[0-9])?\.html$ /$1 [R=301,L]

# Disable automatic trailing-slash redirects for directories
DirectorySlash Off

# Hide extension
RewriteCond %{REQUEST_FILENAME}\.html -f
RewriteRule ^(.*)$ $1.html

# Redirect .html to non-.html
RewriteCond %{THE_REQUEST} \.html
RewriteRule ^(.*)\.html$ /$1 [R=301,L]
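
To see which requests the 301 rules above would redirect, the patterns can be mimicked in plain POSIX shell. This is only a sketch: `redirect_target` is a hypothetical helper, it takes the path relative to the document root (as `.htaccess` rules see it), and it ignores the host redirect, the `RewriteCond` checks, and the rest of Apache's processing.

```shell
#!/bin/sh
# Sketch only: reproduces the pattern matching of the three path redirects
# with shell globs. Prints the redirect target, or nothing if no rule applies.
redirect_target() {
    path=$1   # originally requested path, e.g. blog/index-2.html
    case $path in
        # ^index(-[0-9])?\.html$  ->  /
        index.html|index-[0-9].html)
            printf '/\n' ;;
        # ^(.*)/index(-[0-9])?\.html$  ->  /$1
        */index.html|*/index-[0-9].html)
            printf '/%s\n' "${path%/*}" ;;
        # ^(.*)\.html$  ->  /$1
        *.html)
            printf '/%s\n' "${path%.html}" ;;
        *)
            : ;;   # no redirect
    esac
}

redirect_target index-2.html      # prints /
redirect_target blog/index.html   # prints /blog
redirect_target about.html        # prints /about
redirect_target index-10.html     # prints /index-10
```

The last call shows that `(-[0-9])?` matches a single digit only, so `index-10.html` is treated as an ordinary page and not as an index. If HTTrack produces two-digit suffixes, the patterns (and the `DirectoryIndex` list) would likely need extending.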