A little bash snippet to download Apache directory listings while circumventing robots.txt rules and simple user-agent blocks. The function also uses Python 3 to compute wget's --cut-dirs value from the URL, so the download doesn't recreate the full folder hierarchy locally.
function apachelisting() {
	# count the non-empty path components of the URL; wget strips that many leading directories via --cut-dirs
	CUT_DIRS=$(python3 -c "from urllib.parse import urlparse; import sys; print(len([d for d in urlparse(sys.argv[1]).path.split('/') if d]))" "$1")
	# -r --no-parent: recurse without ascending above the given directory
	# --reject "index.html*": skip the generated listing pages; -e robots=off ignores robots.txt
	# the spoofed user agent gets past simple user-agent blocks
	# -nH --cut-dirs: drop the hostname and the leading path components from local file names
	wget -r --no-parent --reject "index.html*" -e robots=off --restrict-file-names=nocontrol --user-agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:21.0) Gecko/20100101 Firefox/21.0" -nH --cut-dirs="$CUT_DIRS" "$1"
}
# usage: apachelisting http://someserver.tld/ftp/a/bunch/of/folders/
# this will recursively download everything from that directory listing and cut the unnecessary leading folders away
# for example, http://someserver.tld/ftp/a/bunch/of/folders/stuff.zip will end up as ./stuff.zip on your machine
# likewise, http://someserver.tld/ftp/a/bunch/of/folders/evenmorefolders/junk.rar will become ./evenmorefolders/junk.rar (subfolders below the starting directory are kept)
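
For clarity, here is the same --cut-dirs computation as a standalone sketch, run against the hypothetical example URL from the usage comment above. It is not part of the function; it only shows what value the Python one-liner produces.

# minimal sketch: reproduce the CUT_DIRS calculation for the example URL
python3 - <<'EOF'
from urllib.parse import urlparse

url = "http://someserver.tld/ftp/a/bunch/of/folders/"
# path is "/ftp/a/bunch/of/folders/"; the non-empty segments are ftp, a, bunch, of, folders
cut_dirs = len([d for d in urlparse(url).path.split("/") if d])
print(cut_dirs)  # prints 5, which apachelisting would pass to wget as --cut-dirs=5
EOF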