This gist covers a simple Pig eval UDF in Java that mimics Oracle's NVL2 function.
Included:
1. Input data
2. UDF code in Java
3. Pig script to demo the UDF
4. Expected result
5. Command to execute the script
6. Output
package khanolkar.pigUDFs;

import java.io.IOException;

import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

// Custom UDF
// Name:       NVL2
// Parameters: Tuple with three Strings
// Purpose:    Facilitates handling nulls + replacing non-null values.
//             If the first parameter is null, returns the third parameter,
//             otherwise returns the second parameter.
//             E.g. NVL2(null,"Busy bee","Sloth") = "Sloth"
//             E.g. NVL2("Anagha","Busy bee","Sloth") = "Busy bee"
// Returns:    Null if tuple is empty
//             Null if the three input parameters are not in the tuple
//             Otherwise, result of applying NVL2 logic
public class NVL2 extends EvalFunc<String> {

    @Override
    public String exec(Tuple input) throws IOException {
        if (input == null || input.size() == 0)
            return null;
        try {
            if (input.size() == 3) {
                String expr1 = (String) input.get(0);
                String expr2 = (String) input.get(1);
                String expr3 = (String) input.get(2);
                return (expr1 != null ? expr2 : expr3);
            } else {
                return null;
            }
        } catch (Exception e) {
            // Cause task failure
            throw new IOException("Error with UDF, NVL2!", e);
        }
    }
}
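The null and arity rules listed in the UDF's header comment can be tried out without a Pig installation. The following is a minimal sketch, not the gist's code: `Nvl2ContractSketch` and its `exec(List<String>)` method are hypothetical stand-ins in which a `java.util.List` plays the role of Pig's `Tuple`. The exception-wrapping path of the real UDF is left out, since a `List<String>` cannot hold a non-String field.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for the UDF: a List<String> plays the role of Pig's Tuple,
// so the sketch compiles without Pig on the classpath.
class Nvl2ContractSketch {

    static String exec(List<String> input) {
        // Null or empty input yields null, as in the UDF.
        if (input == null || input.isEmpty()) {
            return null;
        }
        // Anything other than exactly three fields also yields null.
        if (input.size() != 3) {
            return null;
        }
        String expr1 = input.get(0);
        String expr2 = input.get(1);
        String expr3 = input.get(2);
        // NVL2 rule: first argument non-null -> second, otherwise third.
        return expr1 != null ? expr2 : expr3;
    }

    public static void main(String[] args) {
        System.out.println(exec(Arrays.asList(null, "Busy bee", "Sloth")));     // prints "Sloth"
        System.out.println(exec(Arrays.asList("Anagha", "Busy bee", "Sloth"))); // prints "Busy bee"
        System.out.println(exec(Arrays.asList("only", "two")));                 // prints "null"
        System.out.println(exec(Collections.<String>emptyList()));              // prints "null"
    }
}
```

Note that a wrong number of arguments returns null silently instead of failing, so a typo in the Pig call can look like missing data in the output.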
-- ------------------------------------------------------------------------------------
-- Pig Script
-- NVL2UDFDemo.pig
-- ------------------------------------------------------------------------------------
register NVL2.jar;
define NVL2 khanolkar.pigUDFs.NVL2;

rawDS = load 'departments' using PigStorage() as (deptNo:chararray, deptName:chararray);
transformedDS = foreach rawDS generate $0, NVL2($1,$1,'Procrastination');

dump transformedDS;
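To make the script's data flow concrete, here is a rough plain-Java approximation of what the `load`, `foreach ... generate`, and `dump` steps do with each row. It is a sketch under assumptions, not how Pig executes the script: `DemoScriptSketch`, `transform`, and the inline sample rows are hypothetical, fields are assumed to be tab-separated (the `PigStorage()` default), and a missing or empty `deptName` is assumed to load as null.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical approximation of the demo script, one input line at a time.
class DemoScriptSketch {

    static String transform(String line) {
        // Assumption: tab is the field delimiter, as with PigStorage() defaults.
        String[] fields = line.split("\t", -1);
        String deptNo = fields[0];
        // Assumption: a missing or empty second field is treated as null.
        String deptName = (fields.length > 1 && !fields[1].isEmpty()) ? fields[1] : null;
        // Equivalent of NVL2($1, $1, 'Procrastination').
        String result = (deptName != null) ? deptName : "Procrastination";
        // Format the pair the way `dump` shows a two-field tuple.
        return "(" + deptNo + "," + result + ")";
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("d007\tSales", "d008", "d009\tCustomer Service");
        for (String row : rows) {
            System.out.println(transform(row));
        }
        // prints:
        // (d007,Sales)
        // (d008,Procrastination)
        // (d009,Customer Service)
    }
}
```

Because the second and third arguments are `$1` and a constant, this particular call behaves like a plain NVL: keep the value when present, substitute a default when it is null.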
#---------------------------
# Input data
# (tab-separated: deptNo, deptName)
#---------------------------
d001	Marketing
d002	Finance
d003	Human Resources
d004	Production
d005	Development
d006	Quality Management
d007	Sales
d008
d009	Customer Service
.................
#---------------------------
# Directory structure
#---------------------------
pigProject
    evalFunc
        NVL2
            departments
            NVL2.jar
            NVL2UDFDemo.pig
#----------------------------------------------------------
# Load script and data to HDFS
#----------------------------------------------------------
$ hadoop fs -mkdir pigProject
$ hadoop fs -mkdir pigProject/evalFunc
$ hadoop fs -put pigProject/evalFunc/* pigProject/evalFunc
#---------------------------
# Command to test
#---------------------------
On the cluster:
$ pig pigProject/evalFunc/NVL2/NVL2UDFDemo.pig

Locally:
$ pig -x local pigProject/evalFunc/NVL2/NVL2UDFDemo.pig
#---------------------------
# Output data
#---------------------------
(d001,Marketing)
(d002,Finance)
(d003,Human Resources)
(d004,Production)
(d005,Development)
(d006,Quality Management)
(d007,Sales)
(d008,Procrastination)
(d009,Customer Service)