-- This will give me back a text file for every reducer, then I need to cat * > outfile to get a single text file
-- Feels like there should be a simple setting to tell Hive that I want a single text file back
--query_history
insert overwrite local directory '/tmp/hive/old_backup/query_history'
select
  to_date(from_unixtime(CAST(created as int))) as query_date,
  account_code,
  audit_key,
  entity_code,
#Generate example dataframe with character column
example <- as.data.frame(c("A", "A", "B", "F", "C", "G", "C", "D", "E", "F"))
names(example) <- "strcol"
#For every unique value in the string column, create a new 1/0 column
#This is what factors do under the hood automatically when passed to a function requiring numeric data
for(level in unique(example$strcol)){
  example[paste("dummy", level, sep = "_")] <- ifelse(example$strcol == level, 1, 0)
}
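#A minimal sketch (not part of the original snippet) of the same 1/0 encoding using base R's model.matrix
#The vector and column names below are illustrative assumptions that mirror the example data above
#Dropping the intercept with "- 1" keeps one indicator column per level
#Note: columns come back in sorted level order, not in order of first appearance as in the loop
strcol <- c("A", "A", "B", "F", "C", "G", "C", "D", "E", "F")
dummies <- model.matrix(~ factor(strcol) - 1)
colnames(dummies) <- paste("dummy", levels(factor(strcol)), sep = "_")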
--Load data from view to use
air = LOAD 'default.vw_airline' USING org.apache.hcatalog.pig.HCatLoader();
--Use FOREACH to limit data to origin, dest, distance
--Concatenate origin and destination together, separated by a pipe
--CONCAT appears to only allow two arguments, which is why the function is called twice (to allow 3 arguments)
origindest = FOREACH air generate CONCAT(origin, CONCAT('|' , dest)) as route, distance;
--Group origindest dataset by route
groupedroutes = GROUP origindest BY (route);
Pkg.add("UAParser")
using UAParser
#Example user-agent string
user_agent_string = "Mozilla/5.0 (iPhone; CPU iPhone OS 5_1 like Mac OS X) AppleWebKit/534.46 (KHTML, like Gecko) Version/5.1 Mobile/9B179 Safari/7534.48.3"
#Get device from user-agent string
parsedevice(user_agent_string) #=> DeviceResult("iPhone")
#Get browser information from user-agent string
#Top 100 Pages where the pagename starts with "Categories"
#Uses searchKW argument
queue_ranked_pages_search <- QueueRanked("production",
                                         "2013-01-01",
                                         "2014-01-28",
                                         c("pageviews", "visits"),
                                         "page",
                                         top = "100",
                                         searchKW = "^Categories"
                                         )
#Change timing of function call
#Wait 30 seconds between attempts to retrieve the report, try 5 times
queue_overtime_visits_pv_day_social_anomaly2 <- QueueOvertime("production",
                                                              "2013-01-01",
                                                              "2014-01-28",
                                                              c("visits", "pageviews"),
                                                              "day",
                                                              "Visit_Social",
                                                              anomalyDetection = "1",
                                                              currentData = "1",
#Twitter response comes back as string
#Use JSON.jl to make into a Dict
#Create a custom type from Dict, in this case TWEETS
#Want to define a DataFrame method DataFrame(response::Array{TWEETS,1})
20-element Array{TWEETS,1}:
TWEETS(nothing,nothing,"Thu Mar 06 17:19:12 +0000 2014",nothing,["symbols"=>{},"user_mentions"=>{["screen_name"=>"randyzwitch","id_str"=>"98689850","id"=>98689850,"name"=>"Randy Zwitch","indices"=>{0,12}]},"hashtags"=>{},"urls"=>{}],0,false,nothing,441624046657355777,"441624046657355777","randyzwitch",441623445286436864,"441623445286436864",98689850,"98689850","en",nothing,nothing,nothing,1,true,nothing,"<a href=\"http://janetter.net/\" rel=\"nofollow\">Janetter</a>","@randyzwitch the R gods demand sacrifice!!!!!!!!!!!!!!!",false,["screen_name"=>"Randy_Au","profile_use_background_image"=>true,"id_str"=>"148398537","utc_offset"=>nothing,"listed_count"=>7,"profile_sidebar_border_color"=>"C0DEED","profile_image_url"=>"http://pbs.twimg.com/profile_images/2751420418/7f1ff3346d047d82ca93b9413
function DataFrame(array::Array{TWEETS, 1})
    #Empty df as container for results
    resultdf = DataFrame()
    #Get array of field names as symbols from composite type
    cols = names(TWEETS)
    #For each field in composite type...
    for column in cols
#Get Real-Time reports that are already set up
realtime_reports <- GetRealTimeConfiguration("<reportsuite>") |