
@cicorias
Forked from azhawkes/spider.sh
Created November 23, 2020 12:51
Really simple wget spider to obtain a list of URLs on a website by crawling n levels deep from a starting page.
#!/bin/bash
# Simple wget spider: crawl DEPTH levels from START_URL within DOMAINS and
# write the unique URLs found to OUTPUT.
# (START_URL replaces the original HOME variable: overriding $HOME makes
# wget miss its ~/.wgetrc configuration.)
START_URL="http://www.yourdomain.com/some/page"
DOMAINS="yourdomain.com"
DEPTH=2
OUTPUT="./urls.csv"
wget -r --spider --delete-after --force-html -D "$DOMAINS" -l "$DEPTH" "$START_URL" 2>&1 \
    | grep '^--' | awk '{ print $3 }' \
    | grep -v '\.\(css\|js\|png\|gif\|jpg\)$' \
    | sort -u > "$OUTPUT"
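How the pipeline works: with --spider, wget logs each URL it requests on a line that starts with a timestamp, e.g. "--2020-11-23 12:51:00--  http://www.yourdomain.com/some/page", so grep '^--' keeps those lines, awk '{ print $3 }' extracts the URL field, the second grep drops static assets (the original pattern had a stray space after the dot and never matched), and sort -u deduplicates. A minimal usage sketch, assuming the script is saved as spider.sh and START_URL/DOMAINS point at a real site (the placeholder values above will not resolve):
chmod +x spider.sh
./spider.sh           # crawl; the URL list lands in ./urls.csv
head -n 5 urls.csv    # inspect the first few discovered URLs
wc -l urls.csv        # count how many unique URLs were found
Note that despite the .csv name, the output is a plain one-URL-per-line list.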