Original Message
Message ID <[email protected]>
Created at: Mon, Feb 6, 2017 at 5:48 PM (Delivered after 7454 seconds)
From: LastPass <[email protected]> Using LastPass.com (www.lastpass.com)
To: "[email protected]" <[email protected]>
Subject: LastPass Verification Email
SPF: PASS with IP 74.84.128.88
DKIM: PASS with domain lastpass.com
DMARC: PASS

PageRank Your Data With Mortar And Pig

One of the most exciting things about Apache Pig is the level of control it gives you over data pipelines. In a pigscript, you can split or recombine parts of your data at any point, which lets you avoid redundant computation or store intermediate results for your job. However, there is a limit to what you can do with a pigscript alone: Pig by itself has no way to express control flow, such as loops or conditionals.
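
To make that limitation concrete, here is a minimal sketch in plain Python (not Pig, and not taken from the original article) of the kind of control flow an iterative algorithm such as PageRank needs: a loop that feeds each iteration's output into the next, and a conditional that stops once the ranks converge. The example graph, the damping factor, the tolerance, and the function names are all illustrative assumptions.

```python
# Illustrative only: a tiny in-memory PageRank showing the loop and the
# convergence check that a pigscript alone cannot express.

def pagerank_step(graph, ranks, damping=0.85):
    """Run one PageRank iteration and return the new rank of every node."""
    n = len(graph)
    new_ranks = {node: (1.0 - damping) / n for node in graph}
    for node, out_links in graph.items():
        if not out_links:
            # Dangling node: spread its rank evenly over all nodes.
            share = damping * ranks[node] / n
            for target in graph:
                new_ranks[target] += share
        else:
            # Split this node's rank evenly among the pages it links to.
            share = damping * ranks[node] / len(out_links)
            for target in out_links:
                new_ranks[target] += share
    return new_ranks


def pagerank(graph, damping=0.85, tolerance=1e-6, max_iterations=100):
    """Iterate pagerank_step until ranks stop changing or the cap is hit."""
    ranks = {node: 1.0 / len(graph) for node in graph}
    for iteration in range(1, max_iterations + 1):
        new_ranks = pagerank_step(graph, ranks, damping)
        # Total absolute change across all nodes since the last iteration.
        change = sum(abs(new_ranks[node] - ranks[node]) for node in graph)
        ranks = new_ranks  # this iteration's output is the next one's input
        if change < tolerance:
            break  # the conditional that plain Pig has no way to write
    return ranks, iteration


# Hypothetical three-page graph: each key links to the pages in its list.
example_graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
final_ranks, iterations_used = pagerank(example_graph)
```

In plain Python the loop and the stopping test are trivial. The difficulty is expressing the same structure when each step is a Pig job rather than a function call.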

To use Pig as part of a control flow, you can write a "control script" in Jython, a Java-based implementation of the Python programming language. The control script calls and configures pigscripts through an API called "Embedded Pig". It can dynamically pass parameters to pigscripts based on algorithm or business logic, and it can call a pigscript repeatedly in a loop, using the output of the previous iteration as the input to the next. This makes it possible to implement a wide range of iterative algorithms which would no