Tracks nefarious activity on a website, and manages it accordingly.
If the requesting entity:
- declares its user-agent as wget, curl, webcopier etc. - it's probably a bot.
- requests details -> details -> details -> details ad nauseam - it's probably a bot.
- requests the HTML, but not the .css, .js or site furniture - it's probably a bot.
- generates a large number of HTTP error codes above 400 (e.g. 401, 403, 404 and 500) - it's probably a bot.
- originates from an unlikely human traffic source (e.g. Amazon AWS) - it's probably a bot.
- sends no user-agent (or one matching a pattern of known bad ones) - it's probably a bot.
- sends no cookie, and won't honor a set cookie - it's probably a bot.
- never sends a referrer - it's probably a bot.
- runs sessions with a lot of hits - it's probably a bot.
- sends requests with a missing referer - it's probably a bot.
- sends requests with a missing sessionID - it's probably a bot.
Probable bots will be presented with a CAPTCHA-style page. Humans can confirm their cognisance; bots will be trapped.
This will work at the top of the stack, using the ZTM to "manage" the offender.
One more environment to consider: the corporate network.
There you are likely to find many dozens or hundreds of users with the exact same OS, browser, plugins, fonts etc. Their IP addresses are also likely to be the same if the users are behind a corporate firewall.
window._phantom (or window.callPhantom, or navigator.onLine === false && navigator.plugins.length === 0) // PhantomJS
window.__phantomas // PhantomJS-based web perf metrics + monitoring tool
window.Buffer // Node.js
window.emit // CouchJS
window.spawn // Rhino
window.webdriver // Selenium
window.domAutomation (or window.domAutomationController) // Chromium-based automation driver
if (window.outerWidth === 0 && window.outerHeight === 0) { /* headless browser */ }
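The markers above are read inside the browser. One way to act on them server-side is to have a small client script post the probe results back and map them to a tool name. The following is a minimal sketch under that assumption: the report format (a dict whose keys mirror the property names above, plus a `pluginCount` field) and the function `identify_automation` are hypothetical and not part of the original design.

```python
# Marker -> tool pairs taken from the list above. The report keys are an
# assumed format: whatever a client-side probe script would post back.
MARKERS = [
    ("_phantom", "PhantomJS"),
    ("callPhantom", "PhantomJS"),
    ("__phantomas", "phantomas"),
    ("Buffer", "Node.js"),
    ("emit", "CouchJS"),
    ("spawn", "Rhino"),
    ("webdriver", "Selenium"),
    ("domAutomation", "Chromium automation"),
    ("domAutomationController", "Chromium automation"),
]


def identify_automation(report):
    """Return the name of the first tool whose marker is set, else None."""
    for key, tool in MARKERS:
        if report.get(key):
            return tool
    # A window with no outer dimensions suggests a headless browser.
    if report.get("outerWidth") == 0 and report.get("outerHeight") == 0:
        return "headless browser"
    # PhantomJS may also report itself offline with no plugins.
    if report.get("onLine") is False and report.get("pluginCount") == 0:
        return "PhantomJS"
    return None


print(identify_automation({"webdriver": True}))
print(identify_automation({"outerWidth": 1280, "outerHeight": 720}))
```

A client can forge or withhold this report, so a hit is best treated as one signal among several rather than as proof.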
X-Bot | X-BotBitMap | Threat |
---|---|---|
1 | 0000000000000001 | No Cookie |
2 | 0000000000000010 | No Referer |
4 | 0000000000000100 | Bad User Agent |
8 | 0000000000001000 | Unlikely Human Traffic Source (AWS, Azure, etc) |
16 | 0000000000010000 | Known "Evasively Tricky" Source Country |
X-Bot | X-BotBitMap | Threat |
---|---|---|
1 | 0000000000000001 | No Cookie |
2 | 0000000000000010 | No Referrer |
4 | 0000000000000100 | User Agent Spoof (Headers don't match User-Agent String) |
8 | 0000000000001000 | Unlikely Human Traffic Source (AWS, Azure, etc) |
16 | 0000000000010000 | Known "Evasively Tricky" Source Country |
32 | 0000000000100000 | Unlikely Human Behaviour |
64 | 0000000001000000 | Browser Integrity (Not requesting furniture) |
128 | 0000000010000000 | Session Length Exceeded |
256 | 0000000100000000 | Pages Per Session Exceeded |
512 | 0000001000000000 | User Agent Spoof (Headers don't match User-Agent String) |
1024 | 0000010000000000 | Browser Integrity (Not requesting furniture) |
2048 | 0000100000000000 | Generates lots of errors (404s) |
4096 | 0001000000000000 | No JavaScript |
8192 | 0010000000000000 | JavaScript validation Failed |
16384 | 0100000000000000 | Fingerprint Validation Error |
32768 | 1000000000000000 | Known Automation (curl, wget, Selenium/Webdriver, Phantomjs) |
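
Because each threat occupies its own bit, several threats can be carried in a single value. The sketch below assumes the values are combined with a bitwise OR and sent as two headers named after the table columns, `X-Bot` and `X-BotBitMap`; the class `Threat` and the helpers `to_headers` and `has_threat` are illustrative names, and only a subset of the rows is included.

```python
from enum import IntFlag


class Threat(IntFlag):
    # A subset of the rows in the table above; values are the X-Bot bit values.
    NO_COOKIE = 1
    NO_REFERER = 2
    UNLIKELY_SOURCE = 8
    NO_JAVASCRIPT = 4096
    KNOWN_AUTOMATION = 32768


def to_headers(threats):
    """Render a combined threat value as decimal and as a 16-bit bitmap."""
    value = int(threats)
    return {"X-Bot": str(value), "X-BotBitMap": format(value, "016b")}


def has_threat(x_bot, threat):
    """Test whether a numeric X-Bot value has the given bit set."""
    return bool(int(x_bot) & int(threat))


headers = to_headers(Threat.NO_COOKIE | Threat.NO_REFERER | Threat.KNOWN_AUTOMATION)
print(headers)  # {'X-Bot': '32771', 'X-BotBitMap': '1000000000000011'}
print(has_threat(headers["X-Bot"], Threat.NO_REFERER))  # True
```

A downstream rule can then test individual bits without caring which other threats were raised for the same request.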
- Is it claiming to be HTTP/1.0? Then it shouldn't do HTTP/1.1 things:
  - 100-continue
- Is it claiming to be HTTP/1.1? Then it shouldn't do HTTP/1.0 things:
  - no-cache
  - Cache-Control
- Enforce RFC 2965 sec 3.3.5 (Cookie2) and 9 (HISTORICAL)
- SQL injection:
  - ;DECLARE%20@
  - SELECT
  - SLEEP
  - -- (that’s two dashes)
  - @@VERSION
  - VARCHAR
  - CHAR
  - EXEC
  - EXECUTE
  - DECLARE
  - CAST
- Range: field exists and begins with 0; real user-agents do not start ranges at 0
- Content-Range is a response header, not a request header
- Via pinappleproxy || Via PCNETSERVER || Via Invisiware
- keep-alive and close are mutually exclusive
- Close shouldn't appear twice
- Keep-Alive shouldn't appear twice either
- “Proxy-Connection” does not exist and should never be seen in the wild
- Referer, if it exists, must not be blank, and it must contain an absolute URL.
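Several of the header rules above can be checked with nothing more than the request line and a header map. The sketch below covers a subset of them (it leaves out the SQL injection and Cookie2 rules) and assumes the headers arrive as a dict with lower-cased names; the function name `protocol_violations` and the wording of the messages are illustrative.

```python
import re

BAD_VIA = ("pinappleproxy", "pcnetserver", "invisiware")


def protocol_violations(protocol, headers):
    """Return a list of rule violations for one request.

    `protocol` is e.g. "HTTP/1.0"; `headers` maps lower-cased header
    names to raw values (an assumed input format).
    """
    problems = []
    if protocol == "HTTP/1.0" and "100-continue" in headers.get("expect", "").lower():
        problems.append("HTTP/1.0 request sent Expect: 100-continue")
    if "content-range" in headers:
        problems.append("Content-Range sent in a request")
    if "proxy-connection" in headers:
        problems.append("Proxy-Connection header present")
    if re.match(r"\s*bytes\s*=\s*0-", headers.get("range", ""), re.IGNORECASE):
        problems.append("Range starts at 0")
    via = headers.get("via", "").lower()
    if any(name in via for name in BAD_VIA):
        problems.append("Via names a blacklisted proxy")
    # Connection is a comma-separated token list.
    tokens = [t.strip().lower() for t in headers.get("connection", "").split(",")]
    if "keep-alive" in tokens and "close" in tokens:
        problems.append("Connection has both keep-alive and close")
    for token in ("close", "keep-alive"):
        if tokens.count(token) > 1:
            problems.append("Connection repeats " + token)
    if "referer" in headers:
        referer = headers["referer"].strip()
        # An absolute URL always contains a colon after the scheme.
        if not referer or ":" not in referer:
            problems.append("Referer is blank or not an absolute URL")
    return problems


print(protocol_violations("HTTP/1.1", {"connection": "keep-alive, close", "referer": "/home"}))
```

Returning every violation, rather than stopping at the first, leaves the caller free to decide how many it takes to flag a request.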
#!/pseudo/code
$ua = $headers['User-Agent'];
// Referer, if it exists, must contain a ":"
// While a relative URL is technically valid in Referer, all known legit user-agents send an absolute URL
if (isset($headers['Referer']) && strpos($headers['Referer'], ":") === FALSE) {
    return 400, "An invalid request was received from your browser. This may be caused by a malfunctioning proxy server or browser privacy software.";
}
// Analyze user agents claiming to be msnbot:
// verify that the request comes from one of the crawler's known ranges
if (strpos($ua, "bingbot") !== FALSE || strpos($ua, "msnbot") !== FALSE || strpos($ua, "MS Search") !== FALSE) {
    CheckIp($headers['ip'], array("207.46.0.0/16", "65.52.0.0/14", "207.68.128.0/18", "207.68.192.0/20", "64.4.0.0/18", "157.54.0.0/15", "157.60.0.0/16", "157.56.0.0/14"));
}
// Analyze user agents claiming to be google
if (strpos($ua, "Googlebot") !== FALSE || strpos($ua, "Mediapartners-Google") !== FALSE || strpos($ua, "Google Web Preview") !== FALSE) {
    CheckIp($headers['ip'], array("66.249.64.0/19", "64.233.160.0/19", "72.14.192.0/18", "203.208.32.0/19", "74.125.0.0/16", "216.239.32.0/19", "209.85.128.0/17"));
    // google bot also sends the header From: googlebot(at)googlebot.com
}
// Analyze user agents claiming to be Yahoo
if (strpos($ua, "Yahoo! Slurp") !== FALSE || strpos($ua, "Yahoo! SearchMonkey") !== FALSE) {
    CheckIp($headers['ip'], array("202.160.176.0/20", "67.195.0.0/16", "203.209.252.0/24", "72.30.0.0/16", "98.136.0.0/14", "74.6.0.0/16"));
}
if (strpos($ua, "MSIE") !== FALSE) {
    if (strpos($ua, "Opera") !== FALSE) {
        // test that Opera sent an "Accept" header.
        if (isset($headers['Accept'])) { // looks like Opera
            return "human";
        }
    } else {
        // MSIE does NOT send "Windows ME", "Windows XP", "Windows 2000" or "Win32" in the user agent
        if (strpos($ua, "Windows ME") !== FALSE || strpos($ua, "Windows XP") !== FALSE || strpos($ua, "Windows 2000") !== FALSE || strpos($ua, "Win32") !== FALSE) {
            // this MSIE is a bot
            return "bot";
        }
    }
} elseif (strpos($ua, "Konqueror") !== FALSE) {
    // CafeKelsa appears to be a dev project at Yahoo which indexes job listings for
    // Yahoo! HotJobs. It announces itself as Konqueror, so we skip these checks.
    if (strpos($ua, "YahooSeeker/CafeKelsa") === FALSE || CheckIp($headers['ip'], array("209.73.160.0/19")) === FALSE) {
        // if it's a real browser it will send an Accept header
        if (isset($headers['Accept'])) { return "human"; }
    }
} elseif (strpos($ua, "Opera") !== FALSE) {
    // if it's a real browser it will send an Accept header
    if (isset($headers['Accept'])) { return "human"; }
} elseif (strpos($ua, "Safari") !== FALSE) {
    // if it's a real browser it will send an Accept header
    if (isset($headers['Accept'])) { return "human"; }
} elseif (strpos($ua, "Lynx") !== FALSE) {
    // if it's a real browser it will send an Accept header
    if (isset($headers['Accept'])) { return "human"; }
} elseif (strpos($ua, "Mozilla") === 0) {
    // the user agent starts with "Mozilla"
    if (strpos($ua, "Google Desktop") === FALSE && strpos($ua, "PLAYSTATION 3") === FALSE) {
        // if it's a real browser it will send an Accept header
        if (isset($headers['Accept'])) { return "human"; }
    }
}
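The pseudocode above relies on a `CheckIp` helper that is never defined. A minimal sketch of what it might do, assuming it simply tests whether an address falls inside any of the listed CIDR blocks, can be written with Python's standard `ipaddress` module. The name `check_ip` is illustrative, and the ranges are copied from the Googlebot check above and may be out of date.

```python
import ipaddress


def check_ip(ip, cidrs):
    """Return True if `ip` falls inside any of the CIDR blocks in `cidrs`."""
    address = ipaddress.ip_address(ip)
    return any(address in ipaddress.ip_network(cidr) for cidr in cidrs)


# Ranges copied from the Googlebot check above.
GOOGLE_RANGES = [
    "66.249.64.0/19", "64.233.160.0/19", "72.14.192.0/18",
    "203.208.32.0/19", "74.125.0.0/16", "216.239.32.0/19",
    "209.85.128.0/17",
]

print(check_ip("66.249.66.1", GOOGLE_RANGES))  # True
print(check_ip("192.0.2.10", GOOGLE_RANGES))   # False
```

A request whose user agent claims to be a search crawler but whose address fails this test is a strong candidate for an impostor.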
function isBadUserAgent($ua) {
$BadUserAgents = [
"8484 Boston Project",
"; Widows",
"AddThis.com robot tech.support@clearspring.com",
"BOT/0.1 (BOT for JCE)",
"Bichoo Spider",
"BotBuster Bad Behavior Test",
"COMODOspider/Nutch-1.0",
"CherryPicker",
"ClickTale bot",
"ContextAd Bot 1.0",
"DTS Agent",
"Diamond",
"Digger",
"Domnutch-Bot/Nutch-1.0 (Domnutch; http://www.Nutch.de/) Nutch-1.0",
"Email Extractor",
"Email Siphon",
"EmailCollector",
"EmailSiphon",
"Flamingo_SearchEngine (+http://www.flamingosearch.com/bot)",
"FreeNutch/Nutch-1.2 Nutch-1.2",
"Fve Nutch Spider/Nutch-1.7",
"GMI sentiment crawler/Nutch-1.0 (GMI sentiment crawler; http://GMI.googlepages.com ; MyEmail)",
"Gecko/25",
"GeoHasher/Nutch-1.0 (GeoHasher Web Search Engine; geohasher.gotdns.org; geo_hasher at yahoo * com)",
"Google-HTTP-Java-Client/1.17.0-rc (gzip)",
"Halebot (Mozilla/5.0 compatible; Halebot/2.1; http://www.tacitknowledge.com/halebot/)",
"HttpProxy",
"ISC Systems iRc",
"Indy Library",
"Infoaxe./Nutch-0.9",
"Infoaxe./Nutch-1.0",
"Internet Explorer",
"Jakarta Commons",
"Java 1.",
"Java/1.",
"KSCrawler/Nutch-1.0 (http://www.kindsight.net/en/kscrawler; crawler@kindsight.net)",
"LWP",
"MJ12bot/v1.0.8",
"MSIE",
"Microsoft URL Control - 6.00.8862",
"Microsoft URL",
"Missigua",
"Movable Type",
"Mozilla/2",
"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)/Nutch-1.0",
"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; http://www.changedetection.com/bot.html )",
"Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; .NET CLR 2.0.50727; .NET CLR 3.0.04506.648; .NET CLR 3.5.21022; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729; InfoPath.2)",
"Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0 ; Claritybot)",
"Mozilla/4.0(",
"Mozilla/4.0+(compatible;+",
"Mozilla/5.0 (Windows NT 6.1; WOW64; rv:25.0) Gecko/20100101 Firefox/25.0",
"Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.2) Gecko/20090729 Firefox/3.5.2 (.NET CLR 3.5.30729; Diffbot/0.1; +http://www.diffbot.com)",
"Mozilla/5.0 (Windows; U; Windows NT 5.1; fr; rv:1.8.1) VoilaBot BETA 1.2 (support.voilabot@orange-ftgroup.com)",
"Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN; ) Firefox/1.5.0.11; 360Spider",
"Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.13) Gecko/20101203 Firefox/3.6.13 << seen from this ip 162.242.135.149",
"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko; Google Web Preview) Chrome/27.0.1453 Safari/537.36",
"Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.24) Gecko/20111107 Ubuntu/10.04 (lucid) Firefox/3.6.24 Mozilla/3.5 (Google-HotelAdsVerifier)",
"Mozilla/5.0 (compatible; AhrefsBot/5.0; +http://ahrefs.com/robot/)",
"Mozilla/5.0 (compatible; BLEXBot/1.0; +http://webmeup-crawler.com/)",
"Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)",
"Mozilla/5.0 (compatible; EasouSpider; +http://www.easou.com/search/spider.html)",
"Mozilla/5.0 (compatible; Exabot/3.0 (BiggerBetter); +http://www.exabot.com/go/robot)",
"Mozilla/5.0 (compatible; Exabot/3.0; +http://www.exabot.com/go/robot)",
"Mozilla/5.0 (compatible; Ezooms/1.0; help@moz.com)",
"Mozilla/5.0 (compatible; Genieo/1.0 http://www.genieo.com/webfilter.html)",
"Mozilla/5.0 (compatible; Googlebot/2.1; +http://import.io)",
"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
"Mozilla/5.0 (compatible; Konqueror/3.5; Linux) KHTML/3.5.5 (like Gecko) (Exabot-Thumbnails)",
"Mozilla/5.0 (compatible; LinkChecker/8.3; +http://wummel.github.com/linkchecker/)",
"Mozilla/5.0 (compatible; Linux x86_64; Mail.RU_Bot/2.0; +http://go.mail.ru/help/robots)",
"Mozilla/5.0 (compatible; MJ12bot/v1.4.4; http://www.majestic12.co.uk/bot.php?+)",
"Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0; Selenium Bot)",
"Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0; Selenium Bot)",
"Mozilla/5.0 (compatible; MojeekBot/0.6; http://www.mojeek.com/bot.html)",
"Mozilla/5.0 (compatible; SEOkicks-Robot; +http://www.seokicks.de/robot.html)",
"Mozilla/5.0 (compatible; SemrushBot/0.97; +http://www.semrush.com/bot.html)",
"Mozilla/5.0 (compatible; TweetmemeBot/3.0; +http://tweetmeme.com/)",
"Mozilla/5.0 (compatible; URLAppendBot/1.0; +http://www.profound.net/urlappendbot.html)",
"Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)",
"Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)",
"Mozilla/5.0 (compatible; YandexImages/3.0; +http://yandex.com/bots)",
"Mozilla/5.0 (compatible; YoudaoBot/1.0; http://www.youdao.com/help/webmaster/spider/; )",
"Mozilla/5.0 (compatible; aiHitBot/2.8; +http://endb-consolidated.aihit.com/)",
"Mozilla/5.0 (compatible; archive.org_bot +http://www.archive.org/details/archive.org_bot)",
"Mozilla/5.0 (compatible; linkCheck)",
"Mozilla/5.0 (compatible; linkdexbot/2.0; +http://www.linkdex.com/about/bots/)",
"Mozilla/5.0 (compatible; proximic; +http://www.proximic.com/info/spider.php)",
"Mozilla/5.0 (compatible; special_archiver/3.1.1 +http://www.archive.org/details/archive.org_bot)",
"Mozilla/5.0 (iPhone; U; CPU iPhone OS 4_0 like Mac OS X; en-us) AppleWebKit/532.9 (KHTML, like Gecko) Version/4.0.5 Mobile/8A293 Safari/6531.22.7/Nutch-1.0",
"Mozilla/5.0+(compatible; UptimeRobot/2.0; http://www.uptimerobot.com/)",
"Mozilla/5.0+(compatible;+PiplBot;++http://www.pipl.com/bot/)",
"Murzillo compatible",
"NIS Nutch Spider/Nutch-1.7 Spider/Nutch-1.7",
"Nutch Experimental Crawler/Nutch-1.4 Experimental",
"Nutch12/Nutch-1.2 Nutch-1.2",
"NutchCVS",
"Nutscrape/",
"OmniExplorer",
"POE-Component-Client",
"Pingdom.com_bot_version_1.4_(http://www.pingdom.com/)",
"PussyCat",
"PycURL",
"QuerySeekerSpider ( http://queryseeker.com/bot.html )",
"SMNutchSpider/Nutch-1.7",
"SapphireWebCrawler/Nutch-1.0-dev (Sapphire Web Crawler using Nutch; http://boston.lti.cs.cmu.edu/crawler/; mhoy@cs.cmu.edu) http://boston.lti.cs.cmu.edu/crawler/",
"Shockwave Flash",
"ShowyouBot (http://showyou.com/crawler)",
"Slurp/Nutch-1.0-dev (Slurp Search Engineer; http://www.google.com/bot.html; nutch-agent@lucene.apache.org)",
"Sogou web spider/4.0(+http://www.sogou.com/docs/help/webmasters.htm#07)",
"Sogou web spider/4.0(+http://www.sogou.com/docs/help/webmasters.htm#07)",
"Super Happy Fun",
"Test crawler Nutch/Nutch-1.0-dev (Nutch Test Project; changkuk@cmu.edu) Nutch-1.0-dev",
"TrackBack/",
"Turing Machine",
"Twitterbot/1.0",
"User Agent:",
"User-Agent: Some-Agent/1.0",
"User-agent:",
"WIRE/0.22 (Linux; x86_64; Bot,Robot,Spider,Crawler)",
"WISEbot",
"WISEnutbot",
"WeSEE:Search/0.1 (Alpha, http://www.wesee.com/bot/)",
"WeSEE:Search/0.1 (Alpha, http://www.wesee.com/en/support/bot/)",
"WebSite-X Suite",
"WebaltBot",
"Windows NT 4.0;)",
"Windows NT 5.0;)",
"Windows NT 5.1;)",
"Windows XP 5",
"Winnie Poh",
"WordPress/4.0.1;",
"WordPress/4.01",
"Wordpress",
"Yahoo:LinkExpander:Slingstone",
"Yeti/1.0 (NHN Corp.; http://help.naver.com/robots/)",
"Zscho.de Crawler/Nutch-1.0-Zscho.de-semantic_patch (Zscho.de Crawler, collecting for machine learning; http://zscho.de/ )",
"a href=",
"adidxbot/2.0 (+http://search.msn.com/msnbot.htm)",
"adwords",
"autoemailspider",
"bitlybot",
"blogsearchbot-martin",
"compatible ; MSIE",
"compatible-",
"core-project/",
"ecollector",
"grub crawler",
"grub-client",
"hanzoweb",
"larbin@unspecified",
"libwww-perl",
"libwww-perl/5.805",
"msnbot-UDiscovery/2.0b (+http://search.msn.com/msnbot.htm)",
"msnbot-media/1.1 (+http://search.msn.com/msnbot.htm)",
"msnbot/2.0b (+http://search.msn.com/msnbot.htm)",
"nutch-1.3/Nutch-1.3 Nutch-1.3",
"nutch-1.4/Nutch-1.4 Nutch-1.4",
"psbot-image (+http://www.picsearch.com/bot.html)",
"psbot/0.1 (+http://www.picsearch.com/bot.html)",
"psycheclone",
"research-scan-bot/Nutch-1.0",
"rogerbot/1.0 (http://moz.com/help/pro/what-is-rogerbot-, rogerbot-crawler+shiny@moz.com)",
"spider",
"user",
"www.integromedb.org/Crawler",
""
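];
// An empty user agent is always bad. It is tested here because the empty
// entry in the list cannot serve as a substring pattern: it would match every string.
if ($ua === "") return 1;
foreach ($BadUserAgents as $UserAgent)
    if ($UserAgent !== "" && strpos($ua, $UserAgent) !== FALSE) return 1;
return 0;
}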
function isUsefulUserAgent($ua) {
$UsefulUserAgents = [
"AdsBot-Google (+http://www.google.com/adsbot.html)",
"AdsBot-Google-Mobile (+http://www.google.com/mobile/adsbot.html) Mozilla (iPhone; U; CPU iPhone OS 3 0 like Mac OS X) AppleWebKit (KHTML, like Gecko) Mobile Safari",
"DoCoMo/2.0 N905i(c100;TB;W24H16) (compatible; Googlebot-Mobile/2.1; +http://www.google.com/bot.html)",
"Feedfetcher-Google;+(+http://www.google.com/feedfetcher.html;",
"GoogleProducer;+(+http://goo.gl/7y4SX)",
"Googlebot/2.1 (+http://www.googlebot.com/bot.html)",
"Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.24) Gecko/20111107 Ubuntu/10.04 (lucid) Firefox/3.6.24 Mozilla/3.5 (Google-HotelAdsVerifier)",
"Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)",
"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
"Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)",
"Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)",
"Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)",
"Mozilla/5.0 (iPhone; CPU iPhone OS 6_0 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/6.0 Mobile/10A5376e Safari/8536.25 (compatible; Googlebot-Mobile/2.1; +http://www.google.com/bot.html)",
"SAMSUNG-SGH-E250/1.0 Profile/MIDP-2.0 Configuration/CLDC-1.1 UP.Browser/6.2.3.3.c.1.101 (GUI) MMP/2.0 (compatible; Googlebot-Mobile/2.1; +http://www.google.com/bot.html)",
"Sogou web spider/4.0(+http://www.sogou.com/docs/help/webmasters.htm#07)",
"facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)",
"ia_archiver (+http://www.alexa.com/site/help/webmasters; crawler@alexa.com)",
];
foreach ($UsefulUserAgents as $UserAgent)
    if (strpos($ua, $UserAgent) !== FALSE) return 1;
return 0;
}
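The two lists overlap (Googlebot, Baiduspider, Yahoo! Slurp, YandexBot and Sogou appear in both), so the order in which they are consulted matters. The sketch below assumes substring matching, case-insensitive comparison, and that the useful list is consulted first; none of these choices is stated in the original, and the names `matches_any` and `classify_user_agent` are illustrative. The example lists are short excerpts from the lists above.

```python
def matches_any(user_agent, patterns):
    """Case-insensitive substring match; empty patterns are ignored."""
    ua = user_agent.lower()
    return any(p.lower() in ua for p in patterns if p)


def classify_user_agent(user_agent, useful, bad):
    """Return "useful", "bad" or "unknown" for a User-Agent string."""
    if not user_agent.strip():
        return "bad"  # no user agent at all
    if matches_any(user_agent, useful):
        return "useful"  # checked first so that known crawlers win
    if matches_any(user_agent, bad):
        return "bad"
    return "unknown"


USEFUL = ["Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"]
BAD = ["libwww-perl", "PycURL", "spider"]

print(classify_user_agent("libwww-perl/5.805", USEFUL, BAD))
print(classify_user_agent(USEFUL[0], USEFUL, BAD))
```

A "useful" result only means the string matches; the claim would still need the IP range check before it is trusted, since a user agent is trivial to forge.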