Created
September 24, 2012 14:46
-
-
Save atopal/3776307 to your computer and use it in GitHub Desktop.
Number of Searches en-US
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
select count(*) as denominator | |
FROM research_logs | |
WHERE | |
ds = '${hiveconf:check_date}' | |
AND `domain`='support.mozilla.com' | |
AND ip_address != 'NULL' AND http_version = 200 AND request_type = 'GET' | |
AND parse_url(concat('http://a.com',request_url),'PATH') RLIKE '\\/en-US\/search\$' | |
AND parse_url(concat('http://a.com',request_url),'QUERY','a') IS NULL | |
AND parse_url(concat('http://a.com',request_url),'QUERY','page') IS NULL | |
AND parse_url(concat('http://a.com',request_url),'QUERY','format') IS NULL | |
AND user_agent NOT LIKE '%bot%' | |
AND user_agent NOT LIKE '%Netsparker%' | |
group by ds; |
Netsparker is a known security tester, skewing our results
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
NULL for page is to make sure we don't count people who page the search results
NULL for format is to discount uses of the search in our canned responses
NULL for a is to discount uses of advanced search (which is heavily exploited)
removal of %bot% removes all occurrences of Google Bot and nothing else.