<?php

/**
 * Check if the given user agent string is one of a crawler, spider, or bot.
 *
 * @param string $user_agent
 *   A user agent string (e.g. Googlebot/2.1 (+http://www.google.com/bot.html)).
 *
 * @return bool
 *   TRUE if the user agent is a bot, FALSE if not.
 */
function smart_ip_detect_crawler($user_agent) {
  // Use lowercase string for comparison.
  $user_agent = strtolower($_SERVER['HTTP_USER_AGENT']);
  // A list of some common words used only for bots and crawlers.
  $bot_identifiers = array(
    'bot',
    'slurp',
    'crawler',
    'spider',
    'curl',
    'facebook',
    'fetch',
  );
  // See if one of the identifiers is in the UA string.
  foreach ($bot_identifiers as $identifier) {
    if (strpos($user_agent, $identifier) !== FALSE) {
      return TRUE;
    }
  }
  return FALSE;
}
Also, why name the function smart_ip_detect_crawler()? It doesn't have anything to do with IP addresses. Maybe you could just name it smart_detect_crawler().
This script is not meant to be the be-all-and-end-all of bot detection ;)
Hopefully it's helpful if you're working on your own system, and as @FinlayDaG33k mentioned... it's trivial to bypass this. It was originally used just as a metric for how much of the traffic coming to a certain page was from bots like GoogleBot and Bing's bot.
What's the point of passing `$user_agent` as an argument and then overriding it with `$user_agent = strtolower($_SERVER['HTTP_USER_AGENT']);`? I guess you don't need that argument.
I think he just forgot to clean that up.
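For what it's worth, a minimal cleaned-up sketch that actually uses the parameter might look like the following. This is untested and isn't the author's code; the `smart_detect_crawler` name just applies the rename suggested above:

```php
<?php

// Sketch only: same substring check, but using the passed-in string so the
// function can be tested with arbitrary UA values instead of reading
// $_SERVER directly.
function smart_detect_crawler($user_agent) {
  $user_agent = strtolower($user_agent);
  $bot_identifiers = array('bot', 'slurp', 'crawler', 'spider', 'curl', 'facebook', 'fetch');
  foreach ($bot_identifiers as $identifier) {
    if (strpos($user_agent, $identifier) !== FALSE) {
      return TRUE;
    }
  }
  return FALSE;
}

// The caller decides where the UA string comes from:
$is_bot = smart_detect_crawler(isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '');
```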
Checking for Googlebot by reverse DNS lookup: https://github.com/kalmargabor/crawler-check/blob/master/src/CrawlerCheck/CrawlerCheck.php
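The idea behind that kind of check, roughly: reverse-resolve the request IP, make sure the resulting hostname is under googlebot.com or google.com, then forward-resolve that hostname and confirm it maps back to the same IP. Here's a minimal sketch of the approach (my own, not the linked code verbatim; both DNS calls block, so in production you'd want to cache results, and gethostbyname() only handles IPv4):

```php
<?php

// Sketch: verify a claimed Googlebot IP via reverse DNS plus a forward
// confirmation. Without the forward lookup, anyone who controls their own
// reverse DNS could spoof a googlebot.com hostname.
function is_verified_googlebot($ip) {
  // Reverse lookup; gethostbyaddr() returns the input unchanged on failure.
  $hostname = gethostbyaddr($ip);
  if ($hostname === FALSE || $hostname === $ip) {
    return FALSE;
  }
  // Hostname must end in googlebot.com or google.com.
  if (!preg_match('/\.(googlebot|google)\.com$/i', $hostname)) {
    return FALSE;
  }
  // Forward-confirm: the hostname must resolve back to the original IP.
  return gethostbyname($hostname) === $ip;
}
```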