Created
December 13, 2015 03:32
Bots investigation
Reference: http://www.inmotionhosting.com/support/website/server-usage/identify-and-block-bad-robots-from-website
How to identify bad bots for a domain
============
cd /home/xyystgkp/access-logs
# Count requests per User-Agent (field 6 when each log line is split on double quotes)
cat justforflorida.com | awk -F\" '{print $6}' | sort | uniq -c | sort -n
>>
36 Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.125 Safari/537.36
71 Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)
95 WordPress/3.5.1; http://justforflorida.com/florida
613 -
738 Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)
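# Count requests per IP for the suspicious "-" (empty) User-Agent
# Note: ~ "-" is a regex match, so it also catches any User-Agent that merely contains a hyphen; use $6 == "-" to match only the empty one
cat justforflorida.com | awk -F\" '$6 ~ "-"' | awk '{print $1}' | sort -n | uniq -c | sort -n
>>
1 46.148.22.18
2 31.187.64.239
2 54.91.137.217
613 23.23.233.205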
# Check which User-Agents the top IP sends (-F keeps the dots literal), then look up who owns it
grep -F 23.23.233.205 justforflorida.com | awk -F\" '{print $6}' | sort | uniq -c | sort -n
whois 23.23.233.205
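The per-IP step above can also be done with an exact match in a single awk pass. This is only a sketch: it assumes the Apache "combined" log format (so the User-Agent is field 6 when splitting on double quotes), and ips_for_agent is a hypothetical helper name, not something from the referenced article.

```shell
# Hypothetical helper: list client IPs that sent exactly the given User-Agent.
# Assumes Apache "combined" log lines; $1 = User-Agent string, $2 = access log path.
ips_for_agent() {
    awk -F'"' -v ua="$1" '
        $6 == ua {                 # exact comparison, not a regex match
            split($1, a, " ")      # text before the first quote starts with the client IP
            n[a[1]]++
        }
        END { for (ip in n) print n[ip], ip }
    ' "$2" | sort -n               # lowest count first, like the pipelines above
}
```

Usage: ips_for_agent "-" justforflorida.com prints one "count IP" pair per line.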
============
Block a bad robot
==================
Block the entire 74.125.x.x range seen in the logs from accessing the example.com website, but still allow requests from that range whose User-Agent string mentions Google.
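.htaccess file
-------------
# Message returned with the 503 status
ErrorDocument 503 "Site disabled for crawling"
RewriteEngine On
# User-Agent does not contain "Google" ...
RewriteCond %{HTTP_USER_AGENT} !^.*(Google).*$
# ... and the client IP starts with 74.125. (dots escaped so they match literally)
RewriteCond %{REMOTE_ADDR} ^74\.125\.
RewriteRule .* - [R=503,L]
-------------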
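Before enabling the rule, it may help to estimate how many logged requests it would have hit. The sketch below applies the same two conditions to an access log; it assumes the Apache "combined" log format and that the first logged field is the same address Apache sees as REMOTE_ADDR (not true behind some proxies). would_block is a hypothetical helper name.

```shell
# Hypothetical dry run: count logged requests the rule above would answer with 503.
# Assumes Apache "combined" log lines; $1 = access log path.
would_block() {
    awk -F'"' '
        { split($1, a, " ") }                     # a[1] = client IP
        index(a[1], "74.125.") == 1 && $6 !~ /Google/ { n++ }
        END { print n + 0 }                       # print 0 when nothing matched
    ' "$1"
}
```

Usage: would_block ~userna5/access-logs/example.com prints a single number.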
# After the rule is live: count the 503 responses served to that range, grouped by hour of day
cat ~userna5/access-logs/example.com | grep -F "74.125." | awk '$9 ~ 503' | cut -d[ -f2 | cut -d] -f1 | awk -F: '{print $2":00"}' | sort -n | uniq -c | sed 's/[ ]*//'
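The same hourly tally can be sketched as one awk pass, which is easier to reuse for other status codes or ranges. It assumes the Apache "combined" log format with a three-part request line, so that whitespace-split field 1 is the client IP, field 4 is the timestamp and field 9 is the status. hourly_status is a hypothetical helper name.

```shell
# Hypothetical helper: count responses with a given status per hour of day.
# $1 = status code, $2 = client IP prefix, $3 = access log path.
hourly_status() {
    awk -v code="$1" -v prefix="$2" '
        index($1, prefix) == 1 && $9 == code {
            split($4, t, ":")          # $4 looks like [13/Dec/2015:03:32:00
            n[t[2] ":00"]++            # key on the hour, e.g. 03:00
        }
        END { for (h in n) print n[h], h }
    ' "$3" | sort -k2                  # order by hour rather than by count
}
```

Usage: hourly_status 503 74.125. ~userna5/access-logs/example.com prints one "count HH:00" pair per line. Like the pipeline above, it merges the same hour across different days.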
==================