Apache Pig Script Example
Created November 2, 2011 03:57 (gist brikis98/1332818)
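-- Average profile views per member, by industry. AVG needs a numeric field,
-- so first count each member's views within their industry.
pv_by_member = GROUP profile_view BY (viewee_industry_id, viewee_member_id);
pv_per_member = FOREACH pv_by_member GENERATE FLATTEN(group) AS (viewee_industry_id, viewee_member_id), COUNT_STAR(profile_view) AS pv_count;
pv_by_industry = GROUP pv_per_member BY viewee_industry_id;
pv_avg_by_industry = FOREACH pv_by_industry GENERATE group AS viewee_industry_id, AVG(pv_per_member.pv_count) AS average_pv;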
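One plausible reading of average_pv is the mean number of views per viewed member within each industry. The sketch below is a plain-Python emulation of that computation (not Pig), over hypothetical toy rows that mimic only the first three profile_view columns.

```python
from collections import Counter, defaultdict
from statistics import mean

# Hypothetical toy rows: (viewee_member_id, viewer_member_id, viewee_industry_id).
# tracking_time is left out because this aggregation never reads it.
profile_view = [
    (1, 10, 100), (1, 11, 100), (1, 12, 100),  # member 1: 3 views, industry 100
    (2, 10, 100), (2, 11, 100),                # member 2: 2 views, industry 100
    (3, 10, 200), (3, 12, 200),                # member 3: 2 views, industry 200
]


def avg_pv_by_industry(rows):
    """Count views per (industry, member), then average those counts per industry."""
    per_member = Counter((industry, viewee) for viewee, _viewer, industry in rows)
    counts_by_industry = defaultdict(list)
    for (industry, _viewee), pv_count in per_member.items():
        counts_by_industry[industry].append(pv_count)
    return {industry: mean(counts) for industry, counts in counts_by_industry.items()}


print(avg_pv_by_industry(profile_view))
```

Members with no views never appear in profile_view, so they do not pull an industry's average down; Pig's GROUP behaves the same way, since groups exist only for keys present in the input.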
-- Count the profile views each member received.
pv_group_by_viewee = GROUP profile_view BY viewee_member_id;
pv_with_count = FOREACH pv_group_by_viewee
    GENERATE group AS viewee_member_id, COUNT_STAR(profile_view) AS pv_count;
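The GROUP plus COUNT_STAR step can be emulated in plain Python with a Counter. This sketch uses hypothetical toy rows and only illustrates the shape of pv_with_count: one (viewee_member_id, pv_count) pair per member who was viewed at least once.

```python
from collections import Counter

# Hypothetical toy rows: (viewee_member_id, viewer_member_id); other columns omitted.
profile_view = [(1, 10), (1, 11), (2, 10), (1, 12), (3, 11)]


def pv_with_count(rows):
    """Return sorted (viewee_member_id, pv_count) pairs, counting every row."""
    counts = Counter(viewee for viewee, _viewer in rows)
    return sorted(counts.items())


print(pv_with_count(profile_view))  # [(1, 3), (2, 1), (3, 1)]
```

COUNT_STAR counts every tuple in the bag, while COUNT skips tuples whose first field is null; counting every row, as the Counter does, matches COUNT_STAR.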
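-- Attach each member's view count to their outgoing connections.
connections_pv_source = JOIN pv_with_count BY viewee_member_id, member_connections BY source_member_id;
source_pv = FOREACH connections_pv_source GENERATE source_member_id, dest_member_id, pv_count AS source_pv_count;
-- Attach each connection's own view count.
connections_pv_dest = JOIN source_pv BY dest_member_id, pv_with_count BY viewee_member_id;
-- Keep pairs where the connection has more profile views than the member.
connections_with_higher_pv = FILTER connections_pv_dest BY pv_count > source_pv_count;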
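The two joins and the filter amount to: for every connection (source, dest), look up both members' view counts and keep the pair when dest's count is higher. A plain-Python sketch of that logic over hypothetical toy data (an emulation, not the script itself):

```python
# Hypothetical toy inputs shaped like the script's relations.
pv_with_count = {1: 5, 2: 9, 3: 2}                     # viewee_member_id -> pv_count
member_connections = [(1, 2), (1, 3), (2, 3), (4, 1)]  # (source_member_id, dest_member_id)


def connections_with_higher_pv(pv_counts, connections):
    """Keep (source, dest, source_pv, dest_pv) where dest has more views than source."""
    result = []
    for source, dest in connections:
        # Inner joins drop any pair where either member has no pv_with_count row (member 4 here).
        if source not in pv_counts or dest not in pv_counts:
            continue
        if pv_counts[dest] > pv_counts[source]:
            result.append((source, dest, pv_counts[source], pv_counts[dest]))
    return result


print(connections_with_higher_pv(pv_with_count, member_connections))  # [(1, 2, 5, 9)]
```

The source does not say whether member_connections stores each connection once or in both directions; if only one direction is stored, a member is compared only against connections listed with them as source_member_id.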
urls = LOAD 'dataset' AS (url, category, pagerank);
groups = GROUP urls BY category;
-- After GROUP, the bag of grouped tuples is named after the input relation (urls).
bigGroup = FILTER groups BY COUNT(urls) > 100000;
STORE bigGroup INTO 'bigGroupOutput';
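The GROUP/FILTER pattern keeps only the categories whose bag of urls tuples is larger than a threshold. A plain-Python emulation, with the 100000 threshold scaled down to 1 so that the hypothetical toy data produces a result:

```python
from collections import defaultdict


def big_groups(urls, min_size):
    """Group (url, category, pagerank) tuples by category; keep groups with more than min_size tuples."""
    groups = defaultdict(list)
    for row in urls:
        _url, category, _pagerank = row
        groups[category].append(row)
    return {category: bag for category, bag in groups.items() if len(bag) > min_size}


urls = [("a.com", "news", 0.9), ("b.com", "news", 0.4), ("c.com", "sports", 0.7)]
print(sorted(big_groups(urls, 1)))  # ['news']
```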
few_pv_email_data = JOIN connections_with_higher_pv BY viewee_industry_id, pv_avg_by_industry BY viewee_industry_id;
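This final statement is an inner equi-join on viewee_industry_id. The script does not show where connections_with_higher_pv gets that field, so the hypothetical toy rows below simply assume it is present. A plain-Python sketch of inner-join semantics (Pig executes the real join as a distributed job, not as this in-memory loop):

```python
from collections import defaultdict


def inner_join(left, right, key):
    """Inner equi-join of two lists of dicts on `key`; None keys never match, as nulls do not join in Pig."""
    index = defaultdict(list)
    for right_row in right:
        if right_row[key] is not None:
            index[right_row[key]].append(right_row)
    return [
        {**left_row, **right_row}
        for left_row in left
        if left_row[key] is not None
        for right_row in index.get(left_row[key], [])
    ]


# Hypothetical toy rows; viewee_industry_id on the left side is assumed to exist.
connections_with_higher_pv = [
    {"source_member_id": 1, "dest_member_id": 2, "viewee_industry_id": 100},
    {"source_member_id": 4, "dest_member_id": 5, "viewee_industry_id": 300},  # no average for 300
]
pv_avg_by_industry = [{"viewee_industry_id": 100, "average_pv": 2.5}]

few_pv_email_data = inner_join(connections_with_higher_pv, pv_avg_by_industry, "viewee_industry_id")
print(few_pv_email_data)
```

Pig keeps both copies of the join key, prefixed as connections_with_higher_pv::viewee_industry_id and pv_avg_by_industry::viewee_industry_id; the dict merge here keeps one, which is harmless because the two values are equal.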
# Member_Connections Table
+---------------------------------+------------------+
| Field                           | Type             |
+---------------------------------+------------------+
| source_member_id                | int              |
| dest_member_id                  | int              |
+---------------------------------+------------------+
# Profile_View Table
+---------------------------------+------------------+
| Field                           | Type             |
+---------------------------------+------------------+
| viewee_member_id                | int              |
| viewer_member_id                | int              |
| viewee_industry_id              | int              |
| tracking_time                   | timestamp        |
+---------------------------------+------------------+
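Pig's default loader, PigStorage, reads tab-delimited text. Assuming both tables are stored that way, and that tracking_time is written as an ISO 8601 string (an assumption; the source gives only the column type), a Python sketch of parsing one row of each table into the schemas above:

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class MemberConnection:
    source_member_id: int
    dest_member_id: int


@dataclass(frozen=True)
class ProfileView:
    viewee_member_id: int
    viewer_member_id: int
    viewee_industry_id: int
    tracking_time: datetime


def parse_connection(line: str) -> MemberConnection:
    """Parse one tab-delimited member_connections row."""
    source, dest = line.rstrip("\n").split("\t")
    return MemberConnection(int(source), int(dest))


def parse_profile_view(line: str) -> ProfileView:
    """Parse one tab-delimited profile_view row."""
    viewee, viewer, industry, ts = line.rstrip("\n").split("\t")
    # Assumption: tracking_time is stored as an ISO 8601 string.
    return ProfileView(int(viewee), int(viewer), int(industry), datetime.fromisoformat(ts))


print(parse_connection("1\t2\n"))
print(parse_profile_view("1\t10\t100\t2011-11-01T12:00:00\n"))
```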
See "User Engagement Powered by Apache Pig and Hadoop" for more information.