Gist randyzwitch/8393213, created January 13, 2014 01:25
Pig script to calculate average distance by airline route
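-- Load data from the Hive view default.vw_airline through HCatalog (run Pig with -useHCatalog)
-- Newer Hive releases name this class org.apache.hive.hcatalog.pig.HCatLoader
air = LOAD 'default.vw_airline' USING org.apache.hcatalog.pig.HCatLoader();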
-- Use FOREACH to project only origin, dest, and distance
-- Concatenate origin and destination, separated by a pipe, to form a single route key
-- CONCAT appeared to accept only two arguments in the Pig version used here, so it is nested to join three values
origindest = FOREACH air GENERATE CONCAT(origin, CONCAT('|', dest)) AS route, distance;
-- Group the origindest records by route
groupedroutes = GROUP origindest BY route;
-- Calculate the average distance for each route, naming the output fields
avg_distance = FOREACH groupedroutes GENERATE group AS route, AVG(origindest.distance) AS avg_dist;
-- Show results in the Pig shell
DUMP avg_distance;
-- Write results to HDFS as text; the default PigStorage separates fields with a tab
-- The output is a directory of part files and must not already exist
STORE avg_distance INTO '/user/hue/avg_distance';
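Because the concatenation exists only to build a single grouping key, one alternative is to group on the (origin, dest) pair directly. This sidesteps the two-argument CONCAT issue and keeps origin and destination as separate output columns. The following is a sketch, assuming the same `default.vw_airline` view, the same column names, and the same HCatLoader class as the script above; the relation names and the output path `/user/hue/avg_distance_by_pair` are hypothetical.

```pig
-- Sketch: group on the (origin, dest) pair instead of a concatenated string key.
-- Assumes the same view, columns, and HCatLoader class as the original script.
air = LOAD 'default.vw_airline' USING org.apache.hcatalog.pig.HCatLoader();

-- Keep only the columns needed for the calculation
legs = FOREACH air GENERATE origin, dest, distance;

-- Group by a two-field key; each group value is a tuple (origin, dest)
by_route = GROUP legs BY (origin, dest);

-- FLATTEN(group) splits the tuple key back into two named columns
avg_by_route = FOREACH by_route GENERATE
    FLATTEN(group) AS (origin, dest),
    AVG(legs.distance) AS avg_distance;

-- Optional: list routes with the longest average distance first
sorted_routes = ORDER avg_by_route BY avg_distance DESC;

-- Hypothetical output path; PigStorage('\t') writes tab-delimited text
STORE sorted_routes INTO '/user/hue/avg_distance_by_pair' USING PigStorage('\t');
```

Grouping on a tuple avoids having to pick a separator that can never appear in the data, and downstream consumers can filter on origin or destination without splitting a string.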