Last active
December 19, 2015 18:59
-
-
Save airawat/6003001 to your computer and use it in GitHub Desktop.
Oozie workflow application with a java main action
The java program parses log files and generates a report.
Sample data, code, workflow components, commands are provided.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This gist includes components of a oozie workflow - scripts/code, sample data | |
and commands; Oozie actions covered: java main action; Oozie controls | |
covered: start, kill, end; The java program uses regex to parse the logs, and | |
also extracts pat of the mapper input directory path and includes in the key | |
emitted. | |
Usecase | |
------- | |
Parse Syslog generated log files to generate reports; | |
Pictorial overview of job: | |
-------------------------- | |
http://hadooped.blogspot.com/2013/06/apache-oozie-part-6-oozie-workflow-with.html | |
Includes: | |
--------- | |
Sample data and structure: 01-SampleDataAndStructure | |
Data and script download: 02-DataAndScriptDownload | |
Data load commands: 03-HdfsLoadCommands | |
Java MR - Mapper code: 04A-MapperJavaCode | |
Java MR - Reducer code: 04B-ReducerJavaCode | |
Java MR - Driver code: 04C-DriverJavaCode | |
Command to test Java MR program: 04D-CommandTestJavaMRProg | |
Oozie job properties file: 05-OozieJobProperties | |
Oozie workflow file: 06-OozieWorkflowXML | |
Oozie commands 07-OozieJobExecutionCommands | |
Output -Report1 08-OutputOfJavaProgram | |
Oozie web console - screenshots 09-OozieWebConsoleScreenshots |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
01a. Sample data | |
----------------- | |
May 3 11:52:54 cdh-dn03 init: tty (/dev/tty6) main process (1208) killed by TERM signal | |
May 3 11:53:31 cdh-dn03 kernel: registered taskstats version 1 | |
May 3 11:53:31 cdh-dn03 kernel: sr0: scsi3-mmc drive: 32x/32x xa/form2 tray | |
May 3 11:53:31 cdh-dn03 kernel: piix4_smbus 0000:00:07.0: SMBus base address uninitialized - upgrade BIOS or use force_addr=0xaddr | |
May 3 11:53:31 cdh-dn03 kernel: nf_conntrack version 0.5.0 (7972 buckets, 31888 max) | |
May 3 11:53:57 cdh-dn03 kernel: hrtimer: interrupt took 11250457 ns | |
May 3 11:53:59 cdh-dn03 ntpd_initres[1705]: host name not found: 0.rhel.pool.ntp.org | |
01b. Structure | |
--------------- | |
Month = May | |
Day = 3 | |
Time = 11:52:54 | |
Node = cdh-dn03 | |
Process = init: | |
Log msg = tty (/dev/tty6) main process (1208) killed by TERM signal |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
02a. Data download | |
------------------- | |
gitHub: | |
https://github.com/airawat/OozieSamples | |
Email me at [email protected] if you encounter any issues. | |
Directory structure applicable for this blog/gist: | |
-------------------------------------------------- | |
oozieProject | |
data | |
airawat-syslog | |
<<Node-Name>> | |
<<Year>> | |
<<Month>> | |
messages | |
workflowJavaMainAction | |
workflow.xml | |
job.properties | |
lib | |
LogEventCount.jar |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
03-Hdfs load commands | |
---------------------- | |
$ hadoop fs -mkdir oozieProject | |
$ hadoop fs -put oozieProject/* oozieProject/ |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
// Source code for Mapper | |
//----------------------------------------------------------- | |
// LogEventCountMapper.java | |
//----------------------------------------------------------- | |
// Java program that parses logs using regex | |
// The program counts the number of processes logged by year. | |
// E.g. Key=2013-ntpd; Value=1; | |
package Airawat.Oozie.Samples; | |
import java.io.IOException; | |
import java.util.regex.Matcher; | |
import java.util.regex.Pattern; | |
import org.apache.hadoop.io.IntWritable; | |
import org.apache.hadoop.io.LongWritable; | |
import org.apache.hadoop.io.Text; | |
import org.apache.hadoop.mapreduce.Mapper; | |
import org.apache.hadoop.mapreduce.lib.input.FileSplit; | |
public class LogEventCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> { | |
String strLogEntryPattern = "(\\w+)\\s+(\\d+)\\s+(\\d+:\\d+:\\d+)\\s+(\\w+\\W*\\w*)\\s+(.*?\\:)\\s+(.*$)"; | |
public static final int NUM_FIELDS = 6; | |
Text strEvent = new Text(""); | |
@Override | |
public void map(LongWritable key, Text value, Context context) | |
throws IOException, InterruptedException { | |
String strLogEntryLine = value.toString(); | |
Pattern objPtrn = Pattern.compile(strLogEntryPattern); | |
Matcher objPatternMatcher = objPtrn.matcher(strLogEntryLine); | |
if (!objPatternMatcher.matches() || NUM_FIELDS != objPatternMatcher.groupCount()) { | |
System.err.println("Bad log entry (or problem with RE?):"); | |
System.err.println(strLogEntryLine); | |
return; | |
} | |
/* | |
System.out.println("Month_Name: " + objPatternMatcher.group(1)); | |
System.out.println("Day: " + objPatternMatcher.group(2)); | |
System.out.println("Time: " + objPatternMatcher.group(3)); | |
System.out.println("Node: " + objPatternMatcher.group(4)); | |
System.out.println("Process: " + objPatternMatcher.group(5)); | |
System.out.println("LogMessage: " + objPatternMatcher.group(6)); | |
*/ | |
//Oh what a pretty chunk of code ;) | |
strEvent.set(((FileSplit)context.getInputSplit()).getPath().toString().substring((((FileSplit)context.getInputSplit()).getPath().toString().length()-16), (((FileSplit)context.getInputSplit()).getPath().toString().length()-12)) + "-" + ((objPatternMatcher.group(5).toString().indexOf("[")) == -1 ? (objPatternMatcher.group(5).toString().substring(0,(objPatternMatcher.group(5).length()-1))) : (objPatternMatcher.group(5).toString().substring(0,(objPatternMatcher.group(5).toString().indexOf("[")))))); | |
context.write(strEvent, new IntWritable(1)); | |
} | |
} |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
// Source code for reducer | |
//-------------------------- | |
// LogEventCountReducer.java | |
//-------------------------- | |
package Airawat.Oozie.Samples; | |
import java.io.IOException; | |
import org.apache.hadoop.io.IntWritable; | |
import org.apache.hadoop.io.Text; | |
import org.apache.hadoop.mapreduce.Reducer; | |
public class LogEventCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> { | |
@Override | |
public void reduce(Text key, Iterable<IntWritable> values, Context context) | |
throws IOException, InterruptedException { | |
int intEventCount = 0; | |
for (IntWritable value : values) { | |
intEventCount += value.get(); | |
} | |
context.write(key, new IntWritable(intEventCount)); | |
} | |
} |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
// Source code for reducer | |
//-------------------------- | |
// LogEventCountReducer.java | |
//-------------------------- | |
package Airawat.Oozie.Samples; | |
import org.apache.hadoop.fs.Path; | |
import org.apache.hadoop.io.IntWritable; | |
import org.apache.hadoop.io.Text; | |
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat; | |
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat; | |
import org.apache.hadoop.mapreduce.Job; | |
public class LogEventCount { | |
public static void main(String[] args) throws Exception { | |
if (args.length != 2) { | |
System.out.printf( | |
"Usage: Airawat.Oozie.Samples.LogEventCount <input dir> <output dir>\n"); | |
System.exit(-1); | |
} | |
//Instantiate a Job object for your job's configuration. | |
Job job = new Job(); | |
//Job jar file | |
job.setJarByClass(LogEventCount.class); | |
//Job name | |
job.setJobName("Syslog Event Rollup"); | |
//Paths | |
FileInputFormat.setInputPaths(job, new Path(args[0])); | |
FileOutputFormat.setOutputPath(job, new Path(args[1])); | |
//Mapper and reducer classes | |
job.setMapperClass(LogEventCountMapper.class); | |
job.setReducerClass(LogEventCountReducer.class); | |
//Job's output key and value classes | |
job.setOutputKeyClass(Text.class); | |
job.setOutputValueClass(IntWritable.class); | |
//Number of reduce tasks | |
job.setNumReduceTasks(3); | |
//Start the MapReduce job, wait for it to finish. | |
boolean success = job.waitForCompletion(true); | |
System.exit(success ? 0 : 1); | |
} | |
} |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Commands to test the java program | |
--------------------------------- | |
a) Command to run the program | |
$ hadoop jar oozieProject/workflowJavaMainAction/lib/LogEventCount.jar Airawat.Oozie.Samples.LogEventCount "oozieProject/data/*/*/*/*/*" "oozieProject/workflowJavaMainAction/myCLIOutput" | |
b) Command to view results | |
$ hadoop fs -cat oozieProject/workflowJavaMainAction/myCLIOutput/part* | sort | |
c) Results | |
2013-NetworkManager 7 | |
22013-console-kit-daemon 7 | |
2013-gnome-session 11 | |
2013-init 166 | |
2013-kernel 810 | |
2013-login 2 | |
2013-NetworkManager 7 | |
2013-nm-dispatcher.action 4 | |
2013-ntpd_initres 4133 | |
2013-polkit-agent-helper-1 8 | |
2013-pulseaudio 18 | |
2013-spice-vdagent 15 | |
2013-sshd 6 | |
2013-sudo 8 | |
2013-udevd 6 |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
#***************************** | |
# job.properties | |
#***************************** | |
nameNode=hdfs://cdh-nn01.chuntikhadoop.com:8020 | |
jobTracker=cdh-jt01:8021 | |
queueName=default | |
oozie.libpath=${nameNode}/user/oozie/share/lib | |
oozie.use.system.libpath=true | |
oozie.wf.rerun.failnodes=true | |
oozieProjectRoot=${nameNode}/user/${user.name}/oozieProject | |
appPath=${oozieProjectRoot}/workflowJavaMainAction | |
oozie.wf.application.path=${appPath} | |
inputDir=${oozieProjectRoot}/data/*/*/*/*/* | |
outputDir=${appPath}/output |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
<!--*************************************************--> | |
<!--*******06-workflow.xml***************************--> | |
<!--*************************************************--> | |
<workflow-app name="WorkflowJavaMainAction" xmlns="uri:oozie:workflow:0.1"> | |
<start to="javaMainAction"/> | |
<action name="javaMainAction"> | |
<java> | |
<job-tracker>${jobTracker}</job-tracker> | |
<name-node>${nameNode}</name-node> | |
<prepare> | |
<delete path="${outputDir}"/> | |
</prepare> | |
<configuration> | |
<property> | |
<name>mapred.job.queue.name</name> | |
<value>${queueName}</value> | |
</property> | |
</configuration> | |
<main-class>Airawat.Oozie.Samples.LogEventCount</main-class> | |
<arg>${inputDir}</arg> | |
<arg>${outputDir}</arg> | |
</java> | |
<ok to="end"/> | |
<error to="killJobAction"/> | |
</action> | |
<kill name="killJobAction"> | |
<message>"Killed job due to error: ${wf:errorMessage(wf:lastErrorNode())}"</message> | |
</kill> | |
<end name="end" /> | |
</workflow-app> |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
07. Oozie commands | |
------------------- | |
Note: Replace oozie server and port, with your cluster-specific. | |
1) Submit job: | |
$ oozie job -oozie http://cdh-dev01:11000/oozie -config oozieProject/workflowJavaMainAction/job.properties -submit | |
job: 0000012-130712212133144-oozie-oozi-W | |
2) Run job: | |
$ oozie job -oozie http://cdh-dev01:11000/oozie -start 0000014-130712212133144-oozie-oozi-W | |
3) Check the status: | |
$ oozie job -oozie http://cdh-dev01:11000/oozie -info 0000014-130712212133144-oozie-oozi-W | |
4) Suspend workflow: | |
$ oozie job -oozie http://cdh-dev01:11000/oozie -suspend 0000014-130712212133144-oozie-oozi-W | |
5) Resume workflow: | |
$ oozie job -oozie http://cdh-dev01:11000/oozie -resume 0000014-130712212133144-oozie-oozi-W | |
6) Re-run workflow: | |
$ oozie job -oozie http://cdh-dev01:11000/oozie -config oozieProject/workflowJavaMainAction/job.properties -rerun 0000014-130712212133144-oozie-oozi-W | |
7) Should you need to kill the job: | |
$ oozie job -oozie http://cdh-dev01:11000/oozie -kill 0000014-130712212133144-oozie-oozi-W | |
8) View server logs: | |
$ oozie job -oozie http://cdh-dev01:11000/oozie -logs 0000014-130712212133144-oozie-oozi-W | |
Logs are available at: | |
/var/log/oozie on the Oozie server. |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
08-Workflow application output | |
------------------------------- | |
$ hadoop fs -ls -R oozieProject/workflowJavaMainAction/output/part* | awk '{print $8}' | |
oozieProject/workflowJavaMainAction/output/part-r-00000 | |
$ hadoop fs -cat oozieProject/workflowJavaMainAction/output/part-r-00000 | |
2013-NetworkManager 7 | |
2013-console-kit-daemon 7 | |
2013-gnome-session 11 | |
2013-init 166 | |
2013-kernel 810 | |
2013-login 2 | |
2013-nm-dispatcher.action 4 | |
2013-ntpd_initres 4133 | |
2013-polkit-agent-helper-1 8 | |
2013-pulseaudio 18 | |
2013-spice-vdagent 15 | |
2013-sshd 6 | |
2013-sudo 8 | |
2013-udevd 6 |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
09 - Oozie web console - screenshots | |
------------------------------------- | |
Available at- | |
http://hadooped.blogspot.com/2013/06/apache-oozie-part-6-oozie-workflow-with.html |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment