Randy Zwitch randyzwitch

@randyzwitch
randyzwitch / hive.sql
Created December 16, 2013 20:21
Example hive query
-- This returns one text file per reducer, so I then need to run cat * > outfile to get a single text file
-- It feels like there should be a simple setting that tells Hive to return a single text file
--query_history
insert overwrite local directory '/tmp/hive/old_backup/query_history'
select
to_date(from_unixtime(CAST(created as int))) as query_date,
account_code,
audit_key,
entity_code,
randyzwitch / dummy.R
Last active January 2, 2016 00:29
Create dummy variables to bypass 32-level limit using RandomForests
#Generate example dataframe with character column
example <- as.data.frame(c("A", "A", "B", "F", "C", "G", "C", "D", "E", "F"))
names(example) <- "strcol"
#For every unique value in the string column, create a new 1/0 column
#This is what factors do "under the hood" automatically when passed to a function requiring numeric data
for(level in unique(example$strcol)){
  example[paste("dummy", level, sep = "_")] <- ifelse(example$strcol == level, 1, 0)
}
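For comparison, the same one-column-per-level idea can be sketched in plain Python. This is an illustrative sketch only and not part of the original gist; the function name `make_dummies` and the dict-of-lists layout are assumptions chosen for the example.

```python
def make_dummies(values, prefix="dummy"):
    """Return a dict mapping 'prefix_level' to a list of 1/0 indicators."""
    # Keep levels in first-seen order, similar to unique() in the R loop above
    levels = list(dict.fromkeys(values))
    return {
        f"{prefix}_{level}": [1 if v == level else 0 for v in values]
        for level in levels
    }


# Same example values as the R dataframe's string column
strcol = ["A", "A", "B", "F", "C", "G", "C", "D", "E", "F"]
dummies = make_dummies(strcol)
```

Each key in `dummies` corresponds to one of the `dummy_<level>` columns the R loop adds to the dataframe.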
randyzwitch / airline.pig
Created January 13, 2014 01:25
Pig script to calculate average distance by airline route
--Load data from view to use
air = LOAD 'default.vw_airline' USING org.apache.hcatalog.pig.HCatLoader();
--Use FOREACH to limit data to origin, dest, distance
--Concatenate origin and destination together, separated by a pipe
--CONCAT appears to only allow two arguments, which is why the function is called twice (to allow 3 arguments)
origindest = FOREACH air generate CONCAT(origin, CONCAT('|' , dest)) as route, distance;
--Group origindest dataset by route
groupedroutes = GROUP origindest BY (route);
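The preview stops at the GROUP step. Assuming the remaining step averages distance within each route, as the gist description says, the overall logic can be sketched in Python as follows. The function name `average_distance_by_route` and the sample rows are hypothetical and are not taken from the airline dataset.

```python
from collections import defaultdict


def average_distance_by_route(flights):
    """flights: iterable of (origin, dest, distance) tuples."""
    totals = defaultdict(lambda: [0.0, 0])  # route -> [sum of distances, row count]
    for origin, dest, distance in flights:
        # Build the route key as origin|dest, mirroring the nested CONCAT calls
        route = f"{origin}|{dest}"
        totals[route][0] += distance
        totals[route][1] += 1
    return {route: total / count for route, (total, count) in totals.items()}


# Hypothetical sample rows, for illustration only
sample = [("ATL", "ORD", 606), ("ATL", "ORD", 610), ("PHL", "SFO", 2521)]
averages = average_distance_by_route(sample)
```

Keeping a running sum and count per route plays the role of the GROUP BY followed by an average over each group.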
randyzwitch / UAParser-documentation.ipynb
Created January 21, 2014 20:47
Documentation and examples for UAParser Julia package
randyzwitch / uaparser-examples.jl
Created January 21, 2014 20:55
Minimal examples of UAParser.jl
Pkg.add("UAParser")
using UAParser
#Example user-agent string
user_agent_string = "Mozilla/5.0 (iPhone; CPU iPhone OS 5_1 like Mac OS X) AppleWebKit/534.46 (KHTML, like Gecko) Version/5.1 Mobile/9B179 Safari/7534.48.3"
#Get device from user-agent string
parsedevice(user_agent_string) #=> DeviceResult("iPhone")
#Get browser information from user-agent string
randyzwitch / rsitecatalyst-search.R
Created February 4, 2014 14:03
Examples of Search functionality for RSiteCatalyst v1.3
#Top 100 Pages where the pagename starts with "Categories"
#Uses searchKW argument
queue_ranked_pages_search <- QueueRanked("production",
  "2013-01-01",
  "2014-01-28",
  c("pageviews", "visits"),
  "page",
  top = "100",
  searchKW = "^Categories"
)
randyzwitch / rsitecatalyst-variable-timing.R
Created February 4, 2014 14:23
Example of variable timing on a request call
#Change timing of function call
#Wait 30 seconds between attempts to retrieve the report, try 5 times
queue_overtime_visits_pv_day_social_anomaly2 <- QueueOvertime("production",
  "2013-01-01",
  "2014-01-28",
  c("visits", "pageviews"),
  "day",
  "Visit_Social",
  anomalyDetection = "1",
  currentData = "1",
randyzwitch / array_custom_types
Last active August 29, 2015 13:57
Example from JMW
#Twitter response comes back as string
#Use JSON.jl to make into a Dict
#Create a custom type from Dict, in this case TWEETS
#Want to define a DataFrame method DataFrame(response::Array{TWEETS,1})
20-element Array{TWEETS,1}:
TWEETS(nothing,nothing,"Thu Mar 06 17:19:12 +0000 2014",nothing,["symbols"=>{},"user_mentions"=>{["screen_name"=>"randyzwitch","id_str"=>"98689850","id"=>98689850,"name"=>"Randy Zwitch","indices"=>{0,12}]},"hashtags"=>{},"urls"=>{}],0,false,nothing,441624046657355777,"441624046657355777","randyzwitch",441623445286436864,"441623445286436864",98689850,"98689850","en",nothing,nothing,nothing,1,true,nothing,"<a href=\"http://janetter.net/\" rel=\"nofollow\">Janetter</a>","@randyzwitch the R gods demand sacrifice!!!!!!!!!!!!!!!",false,["screen_name"=>"Randy_Au","profile_use_background_image"=>true,"id_str"=>"148398537","utc_offset"=>nothing,"listed_count"=>7,"profile_sidebar_border_color"=>"C0DEED","profile_image_url"=>"http://pbs.twimg.com/profile_images/2751420418/7f1ff3346d047d82ca93b9413
randyzwitch / composite-df.jl
Created March 7, 2014 15:28
DataFrame method on composite type
function DataFrame(array::Array{TWEETS, 1})
    #Empty df as container for results
    resultdf = DataFrame()
    #Get array of field names as symbols from composite type
    cols = names(TWEETS)
    #For each field in composite type...
    for column in cols
randyzwitch / getrealtime.R
Created March 10, 2014 14:38
RSiteCatalyst Realtime reports
#Get Real-Time reports that are already set up
realtime_reports <- GetRealTimeConfiguration("<reportsuite>")