@randyzwitch
Created January 13, 2014 01:25
Pig script to calculate average distance by airline route
--Load data from view to use
air = LOAD 'default.vw_airline' USING org.apache.hcatalog.pig.HCatLoader();
--Use FOREACH to limit data to origin, dest, distance
--Concatenate origin and destination together, separated by a pipe
--CONCAT appears to accept only two arguments, so it is nested to join three values
origindest = FOREACH air GENERATE CONCAT(origin, CONCAT('|', dest)) AS route, distance;
--Group origindest dataset by route
groupedroutes = GROUP origindest BY (route);
--Calculate average distance by route
avg_distance = FOREACH groupedroutes GENERATE group, AVG(origindest.distance);
--Show results in Pig shell
DUMP avg_distance;
--Write results out as text, with fields separated by tab (the default delimiter)
STORE avg_distance INTO '/user/hue/avg_distance';
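--Optional variant, offered only as a sketch and not part of the original script:
--name the output fields and store the results comma-separated instead of tab-separated.
--The alias avg_distance_named, the field names route and avg_dist, and the output path
--'/user/hue/avg_distance_csv' are hypothetical names chosen for illustration.
avg_distance_named = FOREACH groupedroutes GENERATE group AS route, AVG(origindest.distance) AS avg_dist;
STORE avg_distance_named INTO '/user/hue/avg_distance_csv' USING PigStorage(',');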