Skip to content

Instantly share code, notes, and snippets.

View bcambel's full-sized avatar
🌴
On vacation

Bahadir Cambel bcambel

🌴
On vacation
View GitHub Profile
explain analyse
select sum(pv), country FROM (
select sum(pv) as pv, country
from events_2015_09_01
group by country
UNION ALL
select sum(pv) as pv, country
from events_2015_09_02
group by country
UNION ALL
INSERT INTO events_2015_09_17
SELECT count(id) as pv, evt_name, hour, device_type, browser, os, domain, path, utm_source,
utm_medium, utm_campaign, utm_content, utm_term, country
FROM events where day='2015-09-17' and evt_name='pv'
group by 2,3,4,5,6,7,8,9,10,11,12,13,14
order by count(id) desc;
CREATE FUNCTION day_view_generator (day character varying) RETURNS void
VOLATILE
AS $$
BEGIN
EXECUTE format('CREATE TABLE IF NOT EXISTS events_' || day || ' AS (SELECT 1 as pv, evt_name, hour, device_type, browser, os, domain, path, utm_source, utm_medium, utm_campaign, utm_content, utm_term, country FROM events limit 1)');
END;
$$ LANGUAGE plpgsql
explain analyse
select count(id), country
FROM events where (ts between '2015-09-01' and '2015-10-01') and evt_name='pv'
GROUP BY country
ORDER BY count(id) desc;
Sort (cost=2667410.55..2667410.69 rows=55 width=66) (actual time=148570.492..148570.504 rows=216 loops=1)
Sort Key: (count(id)) DESC
Sort Method: quicksort Memory: 38kB
-> HashAggregate (cost=2667408.41..2667408.96 rows=55 width=66) (actual time=148570.353..148570.416 rows=216 loops=1)
@bcambel
bcambel / events.sql
Last active December 11, 2015 08:45
CREATE TABLE
events
(
account_id CHARACTER VARYING(50),
id CHARACTER VARYING(56) NOT NULL,
evt_id TEXT,
evt_name CHARACTER VARYING(100),
evt_type CHARACTER VARYING(50),
ref_id TEXT,
ts TIMESTAMP(6) WITHOUT TIME ZONE,
@bcambel
bcambel / query.sql
Last active November 22, 2015 18:37
-- Single index on ts field
explain analyse
SELECT count(id), domain, hour
FROM events
WHERE ts between '2015-09-01' and '2015-09-04' and domain in ('hackersome.com')
GROUP BY domain,hour
--HashAggregate (cost=1977121.02..1977126.98 rows=596 width=75) (actual time=6861.694..6861.722 rows=72 loops=1)
-- Group Key: domain, hour
@bcambel
bcambel / article.md
Created October 26, 2015 18:41
Ad hoc data analysis/discovery of event data using Clojure Flambo spark

Recently an opportunity at work arose to run some ad hoc data analysis/discovery on our event data. The required analysis was a bit more complicated than what was provided by our current reporting capabilities. This complication was primarily due to both the quality and the layout/structure of the existing data. We wanted to understand how much of the revenue was coming from an internal (machine learning supported) recommendation service, for given products over various time boundaries. From these sales, we wanted to further understand which recommendation algorithm was used to drive the purchase. Measuring the performance of these algorithms would allow us to fine tune them in the future and have the ability to ascertain whether or not those changes made a positive or negative impact on the overall revenue.

It was determined early on that trying to do this in plain SQL wasn't a great fit, and having this data stored in Amazon Redshift constrained the approach to just SQL. Luckily, I had been pulling the

def get_val(inst, fields):
if isinstance(fields, basestring):
return getattr(inst,fields, "")
next_field = fields.pop(0)
next_value = getattr(inst, next_field, "")
if len(fields)>0:
return get_val(next_value, fields)
@bcambel
bcambel / zookeeper.conf
Created May 7, 2015 06:15
Zookeeper Supervisord
[program:zookeeper]
killgroup=true
directory=/opt/zookeeper
command=java -Dzookeeper.log.dir=. -Dzookeeper.root.logger=INFO,CONSOLE -cp /opt/zookeeper/bin/../build/classes:/opt/zookeeper/bin/../build/lib/*.jar:/opt/zookeeper/bin/../lib/slf4j-log4j12-1.6.1.jar:/opt/zookeeper/bin/../lib/slf4j-api-1.6.1.jar:/opt/zookeeper/bin/../lib/netty-3.7.0.Final.jar:/opt/zookeeper/bin/../lib/log4j-1.2.16.jar:/opt/zookeeper/bin/../lib/jline-0.9.94.jar:/opt/zookeeper/bin/../zookeeper-3.4.6.jar:/opt/zookeeper/bin/../src/java/lib/*.jar:/opt/zookeeper/bin/../conf: -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.local.only=false org.apache.zookeeper.server.quorum.QuorumPeerMain /opt/zookeeper/bin/../conf/zoo.cfg
autostart=true
autorestart=true
redirect_stderr=true
stopsignal=INT
stdout_logfile=/var/log/syslog
@bcambel
bcambel / elb_instances.py
Last active August 29, 2015 14:17
Fetches all the instance information of a given AWS Elastic Load Balancer
import boto
from boto import regioninfo
from boto import ec2
ACCESS_KEY_ID = ''
SECRET_ACCESS_KEY = ''
elb_region = boto.regioninfo.RegionInfo(
name='eu-west-1',
endpoint='elasticloadbalancing.eu-west-1.amazonaws.com')