$ wget -e robots=off -r -np 'http://example.com/folder/'
- -e robots=off causes it to ignore robots.txt for that domain
- -r makes it recursive
- -np = no parents, so it doesn't follow links up to the parent folder
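For readability, the same command can be written with wget's long option names. This is a minimal sketch: `build_mirror_cmd` is a hypothetical helper name (not part of wget), and it only prints the command so you can check it before running it.

```shell
#!/bin/sh
# build_mirror_cmd is a hypothetical helper, not part of wget.
# It prints the long-option form of the command instead of running it.
build_mirror_cmd() {
    url=$1
    # --execute robots=off : long form of -e robots=off (ignore robots.txt)
    # --recursive          : long form of -r
    # --no-parent          : long form of -np
    printf "wget --execute robots=off --recursive --no-parent '%s'\n" "$url"
}

build_mirror_cmd 'http://example.com/folder/'
```

Copy the printed line into a shell to run the download. Note that -r stops at a depth of 5 levels by default; -l changes that limit.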
Thanks for sharing.
I'm still getting
no-follow attribute found in $URL. Will not follow any links on this page
after using wget -e robots=off -r -np --page-requisites --convert-links $SITE.
Is this a bug?
Yes, this is a bug. It should be fixed in the next version of wget: https://git.savannah.gnu.org/cgit/wget.git/commit/?id=f1cccd2c454fb416e75a22b358b0a11266642007
See https://www.reddit.com/r/DataHoarder/comments/mprq89/wget_respects_nofollow_attribute_despite_e/guct2s5/ for more details.
not fixed
what is the recursive thing?
Thanks for sharing <3
wget -e robots=off -r -np --page-requisites --convert-links
For websites