debian@pikpikcu~$ cat subdo.txt | hakrawler | grep 'http' | cut -d ' ' -f 2 > crawler.txt
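# hakrawler above crawls the live subdomains; gau and waybackurls below add archived URLs for the target and its subdomains (domain.com is a placeholder).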
debian@pikpikcu~$ gau -subs domain.com >> crawler.txt
debian@pikpikcu~$ waybackurls domain.com >> crawler.txt
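# Keep only URLs that carry query parameters and reduce them to unique scheme://domain/path bases.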
debian@pikpikcu~$ cat crawler.txt | grep "?" | unfurl --unique format %s://%d%p > base.txt
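# For each base, pull back up to 5 matching full URLs from the combined list, running 50 greps in parallel.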
debian@pikpikcu~$ cat base.txt | parallel -j50 -q grep {} -m5 crawler.txt | tee -a final.txt
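# Drop URLs that point at static assets (images, fonts, stylesheets, plain text, JavaScript).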
debian@pikpikcu~$ cat final.txt | egrep -iv "\.(jpg|jpeg|gif|css|tif|tiff|woff|woff2|ico|pdf|svg|txt|js)" > final_bos.txt
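# Remove the intermediate files; final_bos.txt is the cleaned-up list of parameterized URLs.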
debian@pikpikcu~$ rm -rf base.txt final.txt
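
The same steps can be wrapped into one script. This is a minimal sketch, assuming hakrawler, gau, waybackurls, unfurl, and GNU parallel are on PATH, that hakrawler prints its older "[tag] URL" format (hence cut -d ' ' -f 2), and that the script name recon.sh and its two arguments are placeholders, not part of the original gist:

#!/usr/bin/env bash
# recon.sh -- sketch of the one-liners above as a single script (names and arguments are assumptions).
# Usage (assumed): ./recon.sh domain.com subdo.txt
domain="$1"        # target domain
subdomains="$2"    # file with one subdomain per line

# 1. Crawl the live subdomains and append archived URLs.
cat "$subdomains" | hakrawler | grep 'http' | cut -d ' ' -f 2 > crawler.txt
gau -subs "$domain" >> crawler.txt
waybackurls "$domain" >> crawler.txt

# 2. Keep parameterized URLs and reduce them to unique scheme://domain/path bases.
grep "?" crawler.txt | unfurl --unique format %s://%d%p > base.txt

# 3. Grab up to 5 full URLs per base, 50 greps in parallel.
cat base.txt | parallel -j50 -q grep {} -m5 crawler.txt | tee -a final.txt

# 4. Drop static assets and clean up the intermediate files.
egrep -iv "\.(jpg|jpeg|gif|css|tif|tiff|woff|woff2|ico|pdf|svg|txt|js)" final.txt > final_bos.txt
rm -f base.txt final.txt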