$ wget -e robots=off -r -np 'http://example.com/folder/'
- -e robots=off causes it to ignore robots.txt for that domain
- -r makes it recursive
- -np (--no-parent) keeps it from following links up into the parent directory
-r -l 0 removes the maximum depth limit (-l 0 is treated as infinite depth).
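Putting those flags together, a sketch for grabbing a single folder with no depth limit (the URL is just a placeholder):
$ wget -e robots=off -r -l 0 -np 'http://example.com/folder/'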
For entire websites:
$ wget -e robots=off -r -np --page-requisites --convert-links
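--page-requisites pulls in the CSS, images, and scripts each page needs, and --convert-links rewrites the saved pages so their links point at the local copies. A sketch of a browsable offline mirror; the URL and the extra --adjust-extension flag (which renames saved files to end in .html where needed) are my additions, not part of the original command:
$ wget -e robots=off -r -l 0 -np --page-requisites --convert-links --adjust-extension 'http://example.com/'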
Thanks for sharing.
I'm still getting "no-follow attribute found in $URL. Will not follow any links on this page" after using wget -e robots=off -r -np --page-requisites --convert-links $SITE. Is this a bug?
Yes, this is a bug; it should be fixed in the next version of wget: https://git.savannah.gnu.org/cgit/wget.git/commit/?id=f1cccd2c454fb416e75a22b358b0a11266642007
See https://www.reddit.com/r/DataHoarder/comments/mprq89/wget_respects_nofollow_attribute_despite_e/guct2s5/ for more details.
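Until a release containing that commit ships, one workaround is building wget from the git source; a rough sketch, assuming the usual autotools build dependencies are already installed:
$ git clone https://git.savannah.gnu.org/git/wget.git   # clone the development tree with the fix
$ cd wget
$ ./bootstrap     # fetch gnulib bits and generate the configure script
$ ./configure
$ make && sudo make install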
not fixed
what is the recursive thing?
Thanks for sharing <3
Remember that the -r option has a default maximum depth of 5. I think --mirror is, overall, a better choice.
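For reference, --mirror is currently shorthand for -r -N -l inf --no-remove-listing, i.e. recursion with infinite depth plus timestamping. A sketch combining it with the earlier flags (URL is a placeholder):
$ wget --mirror -e robots=off -np --page-requisites --convert-links 'http://example.com/folder/'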