Skip to content

Instantly share code, notes, and snippets.

View randyzwitch's full-sized avatar

Randy Zwitch randyzwitch

View GitHub Profile
@randyzwitch
randyzwitch / rugplot.json
Created October 6, 2015 17:53
Vega rug plot
{
"name": "Vega Visualization",
"height": 20,
"padding": "auto",
"marks": [
{
"name": "cell",
"marks": [
{
"properties": {
@randyzwitch
randyzwitch / waterfall.json
Last active October 6, 2015 17:42
Vega waterfall plot
{
"name": "Vega Visualization",
"height": 450,
"padding": "auto",
"marks": [
{
"properties": {
"enter": {
"x": {
"offset": 31,
@randyzwitch
randyzwitch / gist:7ced012dde274a8c80ed
Last active October 7, 2015 12:50
Vega boxplot using mtcars
{
"name": "boxplot",
"height": 200,
"padding": "auto",
"width": 600,
"signals": [
{
"name": "boxSize",
"init": 40
}
@randyzwitch
randyzwitch / dplyr-sql-generated.R
Created January 13, 2015 21:02
SQL generated by dplyr
SELECT
"uid",
"event_timestamp",
"minutes_since_last_event",
"event_boundary",
"session_id"
FROM (
SELECT
"uid",
"event_timestamp",
@randyzwitch
randyzwitch / dplyr-postgres-sessionizing.R
Created January 13, 2015 20:53
Sessionizing Log File Data Using dplyr
###Sessionization using dplyr
library(dplyr)
#Open a localhost connection to Postgres
#Use table 'single_col_timestamp'
#group by uid and sort by timestamp for window function
#Do minutes calculation, working around missing support for extract(epoch from timestamp)
#Calculate event boundary and unique id via cumulative sum window function
sessions <-
@randyzwitch
randyzwitch / final-sessionization-query.sql
Created January 8, 2015 16:35
Sessionizing log file data using SQL
--Query 3: Outer query uses window function with sum to do cumulative sum as the id, concatentate to uid
select
uid,
uid || '-' || cast(sum(new_event_boundary) OVER (PARTITION BY uid ORDER BY event_timestamp) as varchar) as session_id,
event_timestamp
from
--Query 1: Define boundary events
(select
uid,
event_timestamp,
@randyzwitch
randyzwitch / sessionid.sql
Created January 8, 2015 16:12
Postgres Sessionize Logfile data
select
uid,
sum(new_event_boundary) OVER (PARTITION BY uid ORDER BY event_timestamp) as session_id,
event_timestamp,
minutes_since_last_interval,
new_event_boundary
from
--Query 1: Define boundary events
(select
uid,
@randyzwitch
randyzwitch / boundary.sql
Created January 8, 2015 15:48
Postgres Lag Window Function
--Create boundaries at 30 minute timeout
select
uid,
event_timestamp,
(extract(epoch from event_timestamp) - lag(extract(epoch from event_timestamp)) OVER (PARTITION BY uid ORDER BY event_timestamp))/60 as minutes_since_last_interval,
case when extract(epoch from event_timestamp) - lag(extract(epoch from event_timestamp)) OVER (PARTITION BY uid ORDER BY event_timestamp) > 30 * 60 then 1 ELSE 0 END as new_event_boundary
from single_col_timestamp;
@randyzwitch
randyzwitch / tweets-to-dataframes.jl
Created December 8, 2014 21:56
Calling DataFrame on Twitter TWEETS type
DataFrame(julia_tweets_100)
@randyzwitch
randyzwitch / search-tweets-100.jl
Created December 8, 2014 01:46
100 #Julialang Tweets
julia_tweets_100 = get_search_tweets("#julialang"; options = {"count" => "100"})