Skip to content

Instantly share code, notes, and snippets.

@atopal
Created September 24, 2012 14:46
Show Gist options
  • Save atopal/3776307 to your computer and use it in GitHub Desktop.
Save atopal/3776307 to your computer and use it in GitHub Desktop.
Number of Searches en-US
select count(*) as denominator
FROM research_logs
WHERE
ds = '${hiveconf:check_date}'
AND `domain`='support.mozilla.com'
AND ip_address != 'NULL' AND http_version = 200 AND request_type = 'GET'
AND parse_url(concat('http://a.com',request_url),'PATH') RLIKE '\\/en-US\/search\$'
AND parse_url(concat('http://a.com',request_url),'QUERY','a') IS NULL
AND parse_url(concat('http://a.com',request_url),'QUERY','page') IS NULL
AND parse_url(concat('http://a.com',request_url),'QUERY','format') IS NULL
AND user_agent NOT LIKE '%bot%'
AND user_agent NOT LIKE '%Netsparker%'
group by ds;
@atopal
Copy link
Author

atopal commented Oct 5, 2012

NULL for page is to make sure we don't count people who page the search results
NULL for format is to discount uses of the search in our canned responses
NULL for a is to discount uses of advanced search (which is heavily exploited)
removal of %bot% removes all occurrences of Google Bot and nothing else.

@atopal
Copy link
Author

atopal commented Oct 19, 2012

Netsparker is a known security tester, skewing our results

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment