@atopal
Created September 24, 2012 14:46
Number of clickthroughs en-US
SELECT ds, count(*) AS clickthroughs
FROM research_logs
WHERE
ds = '${hiveconf:check_date}'
AND `domain`='support.mozilla.com'
AND ip_address != 'NULL' AND http_version = 200 AND request_type = 'GET'
AND parse_url(empty_string_1,'PATH') RLIKE '\\/en-US\\/search\$'
AND parse_url(empty_string_1,'HOST') = 'support.mozilla.org'
AND
( parse_url(concat('http://a.com',request_url),'PATH') RLIKE '\\/kb\\/'
OR parse_url(concat('http://a.com',request_url),'PATH') RLIKE '\\/questions\\/\\d+\$'
)
AND parse_url(concat('http://a.com',request_url),'QUERY','as') = 's'
AND user_agent NOT LIKE '%bot%'
AND user_agent NOT LIKE '%Netsparker%'
AND ip_address NOT LIKE '89.123.67.154'
GROUP BY ds ORDER BY ds;
atopal commented Oct 5, 2012

The \d+ in the questions pattern makes sure we only match forum threads (numeric thread IDs) and nothing else.
as=s makes sure people come from the internal search and not from in-product links or any other source.
Excluding user agents matching %bot% removes all occurrences of Googlebot and nothing else.
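
To sanity-check the filters outside Hive, here is a rough Python sketch of the same conditions. It is only an approximation of the query, not the query itself: the function name is made up for illustration, and it assumes empty_string_1 holds the referrer and request_url holds a path plus query string (which is why the query prepends the dummy host http://a.com).

```python
import re
from urllib.parse import urlparse, parse_qs

# Patterns mirroring the RLIKE expressions in the query (RLIKE matches anywhere
# in the string, so re.search is used rather than re.match).
SEARCH_REFERRER_PATH = re.compile(r"/en-US/search$")
KB_PATH = re.compile(r"/kb/")
THREAD_PATH = re.compile(r"/questions/\d+$")


def is_search_clickthrough(referrer, request_url, user_agent):
    """Hypothetical helper: True if a log row would count as a clickthrough."""
    ref = urlparse(referrer)
    # Same trick as the query: prepend a dummy host so the path-only URL parses.
    req = urlparse("http://a.com" + request_url)

    came_from_search = (
        ref.hostname == "support.mozilla.org"
        and SEARCH_REFERRER_PATH.search(ref.path) is not None
    )
    landed_on_content = bool(
        KB_PATH.search(req.path) or THREAD_PATH.search(req.path)
    )
    # as=s marks clicks that originate from the internal search results.
    internal_search = parse_qs(req.query).get("as", [None])[0] == "s"
    # Case-sensitive substring checks, like NOT LIKE '%bot%' / '%Netsparker%'.
    allowed_agent = "bot" not in user_agent and "Netsparker" not in user_agent

    return came_from_search and landed_on_content and internal_search and allowed_agent
```

For example, a referrer of https://support.mozilla.org/en-US/search with request /en-US/questions/12345?as=s counts, while /en-US/questions/new?as=s does not, because the path does not end in digits.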

atopal commented Oct 19, 2012

Netsparker is a known security scanner that was skewing our results, so its user agent is excluded.
