Forked from airawat/00-CustomPigEvalUDF-NVL2
Last active August 29, 2015 14:09
This gist covers a simple Pig eval UDF in Java that mimics Oracle's NVL2 function.
1. Input data
2. UDF code in java
3. Pig script to demo the UDF
4. Expected result
5. Command to execute script
6. Output
package khanolkar.pigUDFs;

import java.io.IOException;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;
// Custom UDF
// Name: NVL2
// Parameters: Tuple with three Strings
// Purpose: Facilitates handling nulls + replacing non-null values
// If the first parameter is null, returns the third parameter,
// otherwise returns the second parameter
// E.g. NVL2(null,"Busy bee","Sloth") = "Sloth"
// E.g. NVL2("Anagha","Busy bee","Sloth") = "Busy bee"
// Returns: Null if tuple is empty
// Null if the three input parameters are not in the tuple
// Otherwise, Result of applying NVL2 logic
public class NVL2 extends EvalFunc<String> {
    @Override
    public String exec(Tuple input) throws IOException {
        // Nothing to evaluate
        if (input == null || input.size() == 0)
            return null;
        try {
            if (input.size() == 3) {
                String expr1 = (String) input.get(0);
                String expr2 = (String) input.get(1);
                String expr3 = (String) input.get(2);
                return (expr1 != null ? expr2 : expr3);
            } else {
                // Wrong number of parameters
                return null;
            }
        } catch (Exception e) {
            // Cause task failure
            throw new IOException("Error with UDF, NVL2!", e);
        }
    }
}
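The core rule of the UDF can be tried without a Pig installation. The following is a minimal sketch, not part of the original gist: the class name Nvl2Sketch and the static helper nvl2 are hypothetical, and they reproduce only the null check, not the tuple handling or the EvalFunc wiring.

```java
public class Nvl2Sketch {

    // Hypothetical helper (not in the gist): same rule as the UDF body.
    // Returns expr2 when expr1 is non-null, otherwise expr3.
    static String nvl2(String expr1, String expr2, String expr3) {
        return expr1 != null ? expr2 : expr3;
    }

    public static void main(String[] args) {
        // The two examples from the UDF's header comment
        System.out.println(nvl2(null, "Busy bee", "Sloth"));     // prints Sloth
        System.out.println(nvl2("Anagha", "Busy bee", "Sloth")); // prints Busy bee
    }
}
```

Note that an empty string counts as non-null here, so nvl2("", "a", "b") returns "a".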
# Pig Script
# NVL2UDFDemo.pig
register NVL2.jar;
define NVL2 khanolkar.pigUDFs.NVL2;
rawDS = load 'departments' using PigStorage() as (deptNo:chararray, deptName:chararray);
transformedDS = foreach rawDS generate $0, NVL2($1,$1,'Procrastination');
dump transformedDS;
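What the foreach does to each row can be approximated in plain Java. This is a rough sketch under stated assumptions, not how Pig executes the script: the class Nvl2RowSketch and the methods nvl2 and transform are hypothetical, the input is assumed to be tab-delimited (the PigStorage default), an empty deptName field is assumed to load as null, and the row d999 is invented purely for illustration.

```java
public class Nvl2RowSketch {

    // Hypothetical helper: same rule as the UDF (expr2 if expr1 is non-null, else expr3).
    static String nvl2(String expr1, String expr2, String expr3) {
        return expr1 != null ? expr2 : expr3;
    }

    // Hypothetical stand-in for: foreach rawDS generate $0, NVL2($1,$1,'Procrastination')
    // Assumes tab-delimited input and treats a missing or empty second field as null.
    static String transform(String line) {
        String[] fields = line.split("\t", -1);
        String deptNo = fields[0];
        String deptName = (fields.length > 1 && !fields[1].isEmpty()) ? fields[1] : null;
        return "(" + deptNo + "," + nvl2(deptName, deptName, "Procrastination") + ")";
    }

    public static void main(String[] args) {
        // A row taken from the input data: the name passes through unchanged
        System.out.println(transform("d003\tHuman Resources")); // prints (d003,Human Resources)
        // An invented row with no department name: the default is substituted
        System.out.println(transform("d999\t"));                // prints (d999,Procrastination)
    }
}
```

Passing $1 as both the first and second argument makes NVL2 behave like a plain "default if null" function for the department name.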
# Input data
d001 Marketing
d002 Finance
d003 Human Resources
d004 Production
d005 Development
d006 Quality Management
d007 Sales
d009 Customer Service
# Directory structure
# Load script and data to HDFS
$ hadoop fs -mkdir pigProject
$ hadoop fs -mkdir pigProject/evalFunc
$ hadoop fs -put pigProject/evalFunc/* pigProject/evalFunc
# Command to test
On the cluster
$ pig pigProject/evalFunc/NVL2/NVL2UDFDemo.pig
In local mode
$ pig -x local pigProject/evalFunc/NVL2/NVL2UDFDemo.pig
# Output data
(d003,Human Resources)
(d006,Quality Management)
(d009,Customer Service)