@drsnyder
Created July 27, 2012 17:47
Gathering hit rate statistics for bots and non-bots
#!/usr/bin/perl
# vim: set ts=4
use strict;
use warnings;
my $hitsize = 0;
my $misssize = 0;
my $hitcount = 0;
my $misscount = 0;
my $botcount = 0;
my $bothitcount = 0;
my $bothitsize = 0;
my $botmisssize = 0;
my $botmisscount = 0;
my $botcacheablecount = 0;
my $botnoncacheablecount = 0;
my $othercount = 0;
my $cacheablecount = 0;
while (my $line = <>) {
    # Requests from known crawlers are tallied separately from everything else.
    if ($line =~ /(msn|google|bing|yandex|youdao|exa|mj12|omgili|flr-|ahrefs|blekko)bot/i ||
        $line =~ /(magpie|mediapartners|sogou|baiduspider|nutch|yahoo.*slurp|genieo)/i) {
        $botcount++;
        # Only GET requests count as cacheable; the number after the
        # status code is taken as the response size in bytes.
        if ($line =~ /.*?"GET (.*) HTTP\/\d\.\d" \d+ (\d+).*/) {
            $botcacheablecount++;
            my $url = $1;
            my $size = $2;
            # Any "HIT" on the line is treated as a cache hit.
            if ($line =~ /HIT/) {
                $bothitsize += $size;
                $bothitcount++;
            } else {
                $botmisssize += $size;
                $botmisscount++;
            }
        } else {
            $botnoncacheablecount++;
        }
        next;
    }
    # Non-bot traffic: same classification, separate counters.
    if ($line =~ /.*?"GET (.*) HTTP\/\d\.\d" \d+ (\d+).*/) {
        $cacheablecount++;
        my $url = $1;
        my $size = $2;
        if ($line =~ /HIT/) {
            $hitsize += $size;
            $hitcount++;
        } else {
            $misssize += $size;
            $misscount++;
        }
    } else {
        $othercount++;
    }
}
# Percentage helper: returns 0 instead of dying when the denominator is 0
# (for example, a log with no bot traffic).
sub pct { my ($n, $d) = @_; return $d ? ($n / $d) * 100 : 0; }
my $MB = 1048576;

printf("The total size of non-bot objects requested: %2.2f MB\n", ($hitsize + $misssize) / $MB);
printf("Cacheable miss size: %2.2f MB\n", $misssize / $MB);
printf("Cacheable hit size: %2.2f MB\n", $hitsize / $MB);
printf("The non-bot hitrate: %2.2f%%\n", pct($hitcount, $cacheablecount));
printf("The total number of non-bot objects: %d\n", $cacheablecount + $othercount);
printf("Not-cacheable and non-bot count: %d (%2.2f%% of total non-bot)\n", $othercount, pct($othercount, $othercount + $cacheablecount));
printf("\n");
printf("The total size of bot objects requested: %2.2f MB\n", ($bothitsize + $botmisssize) / $MB);
printf("Bot miss size: %2.2f MB\n", $botmisssize / $MB);
printf("Bot hit size: %2.2f MB\n", $bothitsize / $MB);
# Use cacheable bot requests as the denominator so this matches the non-bot hitrate.
printf("The bot hitrate: %2.2f%%\n", pct($bothitcount, $botcacheablecount));
# $botcount already includes the non-cacheable bot requests.
printf("Bot not-cacheable count: %d (%2.2f%% of total bot)\n", $botnoncacheablecount, pct($botnoncacheablecount, $botcount));
printf("\n");
printf("Total object size requested (bot + non-bot): %2.2f MB\n", ($misssize + $hitsize + $botmisssize + $bothitsize) / $MB);
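
To see how a single log line ends up in one of the counters, the classification step can be pulled out into a small function. This is only a sketch: `classify_line` and `$BOT_RE` are hypothetical names not present in the script, and the sample log lines are made up, assuming a combined-style access log with a trailing HIT/MISS field. The actual log format the script was written for is not stated.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The two bot patterns from the script, merged into one regex.
my $BOT_RE = qr/(?:msn|google|bing|yandex|youdao|exa|mj12|omgili|flr-|ahrefs|blekko)bot|magpie|mediapartners|sogou|baiduspider|nutch|yahoo.*slurp|genieo/i;

# Hypothetical helper: classify one access-log line.
# Returns a hash ref with keys bot, cacheable, hit (0 or 1) and size (bytes).
sub classify_line {
    my ($line) = @_;
    my %r = (bot => 0, cacheable => 0, hit => 0, size => 0);
    $r{bot} = 1 if $line =~ $BOT_RE;
    # Same GET pattern as the script: status code, then response size.
    if ($line =~ /"GET (.*) HTTP\/\d\.\d" \d+ (\d+)/) {
        $r{cacheable} = 1;
        $r{size}      = $2;
        # As in the script, "HIT" anywhere on the line counts as a hit.
        $r{hit} = 1 if $line =~ /HIT/;
    }
    return \%r;
}

# Made-up sample lines; the field layout is an assumption.
my @sample = (
    '1.2.3.4 - - [27/Jul/2012:17:47:00 +0000] "GET /index.html HTTP/1.1" 200 2048 "-" "Mozilla/5.0" HIT',
    '5.6.7.8 - - [27/Jul/2012:17:47:01 +0000] "GET /about HTTP/1.1" 200 512 "-" "Googlebot/2.1" MISS',
    '9.9.9.9 - - [27/Jul/2012:17:47:02 +0000] "POST /login HTTP/1.1" 302 0 "-" "Mozilla/5.0" MISS',
);
for my $line (@sample) {
    my $r = classify_line($line);
    printf("bot=%d cacheable=%d hit=%d size=%d\n", @{$r}{qw(bot cacheable hit size)});
}
```

Because the hit test is a bare `/HIT/` match, a URL or user agent that happens to contain "HIT" would also be counted as a hit; anchoring the match to the cache-status field would avoid that if the log format is known.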