Created September 24, 2012 14:46
Number of clickthroughs en-US
SELECT ds, count(*) AS clickthroughs
FROM research_logs
WHERE
  ds = '${hiveconf:check_date}'
  AND `domain`='support.mozilla.com'
  AND ip_address != 'NULL' AND http_version = 200 AND request_type = 'GET'
  AND parse_url(empty_string_1,'PATH') RLIKE '\\/en-US\/search\$'
  AND parse_url(empty_string_1,'HOST') = 'support.mozilla.org'
  AND
  ( parse_url(concat('http://a.com',request_url),'PATH') RLIKE '\\/kb\\/'
    OR parse_url(concat('http://a.com',request_url),'PATH') RLIKE '\\/questions\\/\\d+\$'
  )
  AND parse_url(concat('http://a.com',request_url),'QUERY','as') = 's'
  AND user_agent NOT LIKE '%bot%'
  AND user_agent NOT LIKE '%Netsparker%'
  AND ip_address NOT LIKE '89.123.67.154'
GROUP BY ds ORDER BY ds;
Netsparker is a known security scanner that skews our results, so its requests are excluded.
The \d+ in the questions pattern makes sure we only match forum threads (numeric thread IDs) and nothing else.
as=s makes sure that people came from the internal search and not from in-product links or any other source.
Filtering on %bot% removes all occurrences of Googlebot and nothing else.
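
To sanity-check these filters without running Hive, here is a minimal Python sketch that mirrors the URL and user-agent conditions from the query. It is an approximation, not the query itself: the function names, the sample article slug, and the sample user-agent strings are made up for illustration, and it assumes Hive's RLIKE behaves as an unanchored regex search and LIKE as a case-sensitive match.

```python
import re
from urllib.parse import urlparse, parse_qs

# Patterns mirroring the two RLIKE clauses on the request path.
# RLIKE is treated here as an unanchored search (assumption).
KB_PATH = re.compile(r"/kb/")
THREAD_PATH = re.compile(r"/questions/\d+$")


def is_search_clickthrough(request_url: str) -> bool:
    """True if request_url is a KB article or forum thread reached with as=s.

    Hypothetical helper: the dummy host mirrors the concat('http://a.com', ...)
    trick the query uses so that a relative URL can be parsed.
    """
    parsed = urlparse("http://a.com" + request_url)
    path_ok = bool(KB_PATH.search(parsed.path) or THREAD_PATH.search(parsed.path))
    # Take the first value of the "as" parameter, if it is present at all.
    came_from_search = parse_qs(parsed.query).get("as", [None])[0] == "s"
    return path_ok and came_from_search


def is_excluded_agent(user_agent: str) -> bool:
    """True if the user agent would be dropped by the two NOT LIKE filters."""
    # Plain substring checks stand in for LIKE '%bot%' and LIKE '%Netsparker%'.
    return "bot" in user_agent or "Netsparker" in user_agent


if __name__ == "__main__":
    for url in ("/en-US/kb/some-article?as=s",
                "/en-US/questions/12345?as=s",
                "/en-US/questions/new?as=s"):
        print(url, is_search_clickthrough(url))
```

Running sample URLs through a helper like this is a quick way to see which requests the path and as=s conditions would count before spending time on a full Hive job.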