Gist airawat/5970026, last active May 21, 2021 16:09
Sample Oozie coordinator job that executes upon availability of a specified dataset.
Includes scripts/code, sample data, commands.
This gist includes the components of an Oozie coordinator job that is triggered by dataset availability:
scripts/code, sample data and commands. Oozie actions covered: sqoop action (MySQL database), email action.
Oozie controls covered: decision.
Use case
--------
Export report data available in HDFS to a MySQL database.
Pictorial overview of job:
--------------------------
http://hadooped.blogspot.com/p/ooziecooridnatorjobdatasetdep-pix.html
Includes:
---------
Data and script download: 02-DataAndScriptDownload
Data load commands: 03-HdfsLoadCommands
MySQL database setup: 04-mysqlDBSetup
Sqoop command - test: 05-SqoopStandaloneTryout
Oozie configuration for email: 06-OozieSMTPconfiguration
Oozie coordinator properties file: 07-OozieCoordinatorProperties
Oozie coordinator conf file: 08-OozieCoordinatorXML
Sqoop workflow conf file: 09-SqoopWorkflowXML
Oozie commands: 10-OozieJobExecutionCommands
Output in MySQL: 11-Output
Oozie web console - screenshots: 12-OozieWebConsoleScreenshots
*********************************
Data download
*********************************
GitHub:
https://github.com/airawat/OozieSamples
Email me at [email protected] if you encounter any issues.
*********************************
Directory structure
*********************************
oozieProject
    sampleCoordinatorJobDatasetDep
        coordinatorConf/
            coordinator.properties
            coordinator.xml
        sqoopWorkflowApp
            workflow.xml
    datasetGeneratorApp
        outputDir
            part-r-00000
            _SUCCESS
            _logs
                history
                    cdh-jt01_1372261353326_job_201306261042_0536_conf.xml
                    job_201306261042_0536_1373407670448_akhanolk_Syslog+Event+Rollup
---------------------------------------------------------------------------------------
sampleCoordinatorJobDatasetDep
------------------------------
Coordinator application: the coordinator configuration (coordinatorConf) and the Sqoop workflow it launches (sqoopWorkflowApp).
datasetGeneratorApp
-------------------
The datasetGeneratorApp directory is what we use to trigger the coordinator job. The logs are not important for the simulation, but the _SUCCESS file is required; without it, the job will not be triggered.
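Why _SUCCESS matters: coordinator.xml declares no <done-flag> for the inputDS dataset, so Oozie falls back to its default done-flag, _SUCCESS, and treats the dataset instance as available only once that file exists. The sketch below mimics that rule on a local directory. The local filesystem standing in for HDFS and the helper name dataset_instance_ready are assumptions for illustration; this is not Oozie code.

```python
import tempfile
from pathlib import Path

DEFAULT_DONE_FLAG = "_SUCCESS"  # Oozie's default when <done-flag> is omitted


def dataset_instance_ready(instance_dir, done_flag=DEFAULT_DONE_FLAG):
    """Hypothetical helper: mimic the coordinator's availability rule locally.

    - default done-flag: the directory must contain a _SUCCESS file
    - empty done-flag: the directory existing is enough
    """
    path = Path(instance_dir)
    if not path.is_dir():
        return False
    if done_flag == "":
        return True
    return (path / done_flag).exists()


# Usage: a local stand-in for oozieProject/datasetGeneratorApp/outputDir
with tempfile.TemporaryDirectory() as tmp:
    out_dir = Path(tmp) / "outputDir"
    print(dataset_instance_ready(out_dir))  # False: directory missing
    out_dir.mkdir()
    (out_dir / "part-r-00000").write_text("2013-kernel\t810\n")
    print(dataset_instance_ready(out_dir))  # False: no _SUCCESS yet
    (out_dir / "_SUCCESS").touch()
    print(dataset_instance_ready(out_dir))  # True
```

This is why uploading only the part file, without _SUCCESS, leaves the coordinator action waiting.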
*********************************
Hdfs data load commands
*********************************
$ hadoop fs -mkdir oozieProject
$ hadoop fs -put oozieProject/* oozieProject/
Run the command below to validate the load against the expected directory structure in section 02-DataAndScriptDownload:
$ hadoop fs -ls -R oozieProject/sampleCoordinatorJobDatasetDep | awk '{print $8}'
Remove the dataset directory; we will load it again when we want to trigger the job:
$ hadoop fs -rm -R oozieProject/datasetGeneratorApp
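The validation step above is a visual comparison. If you want to script it, the sketch below does what awk '{print $8}' does (take the 8th whitespace-separated field, the path) and reports expected entries that are missing. The helper names and the abbreviated listing used in the usage example are made up for illustration; real listings will show your own user, sizes and dates.

```python
EXPECTED_SUFFIXES = [
    "sampleCoordinatorJobDatasetDep/coordinatorConf",
    "sampleCoordinatorJobDatasetDep/coordinatorConf/coordinator.properties",
    "sampleCoordinatorJobDatasetDep/coordinatorConf/coordinator.xml",
    "sampleCoordinatorJobDatasetDep/sqoopWorkflowApp",
    "sampleCoordinatorJobDatasetDep/sqoopWorkflowApp/workflow.xml",
]


def listed_paths(ls_output):
    """Python equivalent of awk '{print $8}': the 8th whitespace-separated field."""
    paths = []
    for line in ls_output.splitlines():
        fields = line.split()
        if len(fields) >= 8:  # skips blank or short summary lines
            paths.append(fields[7])
    return paths


def missing_entries(ls_output, expected=EXPECTED_SUFFIXES):
    """Return expected entries that do not appear at the end of any listed path."""
    paths = listed_paths(ls_output)
    return [e for e in expected
            if not any(p == e or p.endswith("/" + e) for p in paths)]


# Abbreviated, made-up listing: only two of the five entries are present.
sample = """\
drwxr-xr-x   - akhanolk supergroup          0 2013-07-10 10:00 /user/akhanolk/oozieProject/sampleCoordinatorJobDatasetDep/coordinatorConf
-rw-r--r--   3 akhanolk supergroup        812 2013-07-10 10:00 /user/akhanolk/oozieProject/sampleCoordinatorJobDatasetDep/coordinatorConf/coordinator.properties
"""
print(missing_entries(sample))  # coordinator.xml, sqoopWorkflowApp and workflow.xml are reported
```

Matching on path suffixes keeps the check independent of whether the listing shows absolute or relative paths. Like the awk one-liner, it assumes paths contain no spaces.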
*********************************
MySQL database setup tasks
*********************************
a) Create the database:
mysql> create database airawat;
b) Switch to the database you created:
mysql> use airawat;
c) Create the destination table for the Sqoop export from HDFS:
mysql> CREATE TABLE IF NOT EXISTS Logged_Process_Count_By_Year(
    year_and_process varchar(100),
    occurrence INTEGER);
d) Ensure your Sqoop user has access to the database you created (replace myUser and myMachine with your Sqoop user and the host it connects from):
mysql> grant all on airawat.* to myUser@'myMachine';
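Without a --columns option, Sqoop export maps the fields of each input line onto the table's columns in order, so every line of the dataset has to split into a year_and_process value and an integer count. The sketch below pre-checks records against the Logged_Process_Count_By_Year schema. It assumes each line of part-r-00000 is a key and a count separated by a tab, which is what the --fields-terminated-by "\t" option and the two-column table imply; the helper names are hypothetical.

```python
def parse_export_record(line, max_key_len=100):
    """Split one tab-delimited record into (year_and_process, occurrence).

    Assumes positional mapping: field 1 -> year_and_process, field 2 -> occurrence.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 2:
        raise ValueError(f"expected 2 tab-separated fields, got {len(fields)}: {line!r}")
    year_and_process, occurrence = fields
    if len(year_and_process) > max_key_len:  # varchar(100)
        raise ValueError(f"year_and_process exceeds {max_key_len} characters")
    return year_and_process, int(occurrence)  # INTEGER column; int() rejects non-numeric text


def check_part_file(lines):
    """Return (good_records, bad_lines) for an iterable of lines."""
    good, bad = [], []
    for line in lines:
        try:
            good.append(parse_export_record(line))
        except ValueError:
            bad.append(line)
    return good, bad


good, bad = check_part_file(["2013-kernel\t810\n", "2013-sshd,6\n"])
print(good)  # [('2013-kernel', 810)]
print(bad)   # ['2013-sshd,6\n']
```

A comma-delimited line like the second one is exactly the kind of record that would make the export fail.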
*************************************************************
Sqoop command - try out the command to see if it works
*************************************************************
Prerequisites:
1. The dataset to be exported should exist on HDFS. If you removed oozieProject/datasetGeneratorApp in the load step, put it back for this test and remove it again before submitting the coordinator job.
2. The MySQL table that is the destination for the export should exist.
Command:
-- Run on the node that acts as the Sqoop client:
$ sqoop export \
    --connect jdbc:mysql://cdh-dev01/airawat \
    --username devUser \
    --password myPwd \
    --table Logged_Process_Count_By_Year \
    --direct \
    --export-dir "oozieProject/datasetGeneratorApp/outputDir" \
    --fields-terminated-by "\t"
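If you drive the try-out from a script, building the arguments as a list makes the quoting explicit. Inside double quotes, bash passes "\t" through as the two characters backslash and t, and Sqoop decodes that escape to a tab; the sketch keeps the same two characters. sqoop_export_args is a hypothetical helper, the values are the sample ones from the command above, and shlex.join needs Python 3.8 or later.

```python
import shlex


def sqoop_export_args(server, database, user, password, table, export_dir,
                      field_sep="\\t"):
    """Build the argument vector for the standalone sqoop export try-out.

    field_sep defaults to backslash + t, the escape Sqoop decodes to a tab.
    """
    return [
        "sqoop", "export",
        "--connect", f"jdbc:mysql://{server}/{database}",
        "--username", user,
        "--password", password,
        "--table", table,
        "--direct",
        "--export-dir", export_dir,
        "--fields-terminated-by", field_sep,
    ]


args = sqoop_export_args("cdh-dev01", "airawat", "devUser", "myPwd",
                         "Logged_Process_Count_By_Year",
                         "oozieProject/datasetGeneratorApp/outputDir")
# Print a copy-pasteable shell command; subprocess.run(args) would skip the shell entirely.
print(shlex.join(args))
```

The same argument order is what the workflow's <command> element passes to the Sqoop action.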
*********************************
Results in MySQL
*********************************
mysql> select * from Logged_Process_Count_By_Year order by occurrence desc;
+----------------------------+------------+
| year_and_process           | occurrence |
+----------------------------+------------+
| 2013-ntpd_initres          |       4133 |
| 2013-kernel                |        810 |
| 2013-init                  |        166 |
| 2013-pulseaudio            |         18 |
| 2013-spice-vdagent         |         15 |
| 2013-gnome-session         |         11 |
| 2013-sudo                  |          8 |
| 2013-polkit-agent-helper-1 |          8 |
| 2013-console-kit-daemon    |          7 |
| 2013-NetworkManager        |          7 |
| 2013-udevd                 |          6 |
| 2013-sshd                  |          6 |
| 2013-nm-dispatcher.action  |          4 |
| 2013-login                 |          2 |
+----------------------------+------------+
14 rows in set (0.00 sec)
*************************************************
-- Cleanup
*************************************************
mysql> delete from Logged_Process_Count_By_Year;
*************************
Oozie SMTP configuration
*************************
Add the following to oozie-site.xml, after updating the values for your environment, then restart the Oozie server:
<!-- SMTP params-->
<property>
    <name>oozie.email.smtp.host</name>
    <value>cdh-dev01</value>
</property>
<property>
    <name>oozie.email.smtp.port</name>
    <value>25</value>
</property>
<property>
    <name>oozie.email.from.address</name>
    <value>oozie@cdh-dev01</value>
</property>
<property>
    <name>oozie.email.smtp.auth</name>
    <value>false</value>
</property>
<property>
    <name>oozie.email.smtp.username</name>
    <value></value>
</property>
<property>
    <name>oozie.email.smtp.password</name>
    <value></value>
</property>
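After pasting the properties into oozie-site.xml, a quick way to confirm what the server will read is to pull the oozie.email.* entries out of the file. The sketch below does that with the standard library and flags a missing username or password when oozie.email.smtp.auth is true. The helper names and the rule it checks are this sketch's own assumptions, not an Oozie tool.

```python
import xml.etree.ElementTree as ET


def email_settings(config_xml):
    """Collect oozie.email.* properties from an oozie-site.xml <configuration> document."""
    root = ET.fromstring(config_xml)
    settings = {}
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        if name.startswith("oozie.email."):
            settings[name] = (prop.findtext("value") or "").strip()
    return settings


def missing_credentials(settings):
    """If SMTP auth is enabled, username and password must be non-empty."""
    if settings.get("oozie.email.smtp.auth", "false").lower() != "true":
        return []
    return [k for k in ("oozie.email.smtp.username", "oozie.email.smtp.password")
            if not settings.get(k)]


sample = """<configuration>
  <property><name>oozie.email.smtp.host</name><value>cdh-dev01</value></property>
  <property><name>oozie.email.smtp.port</name><value>25</value></property>
  <property><name>oozie.email.smtp.auth</name><value>false</value></property>
  <property><name>oozie.email.smtp.username</name><value></value></property>
</configuration>"""
settings = email_settings(sample)
print(settings["oozie.email.smtp.host"], missing_credentials(settings))  # cdh-dev01 []
```

With auth set to false, as in the configuration above, the empty username and password values are expected.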
***************************************************************************
Oozie coordinator properties file: coordinator.properties
***************************************************************************
nameNode=hdfs://cdh-nn01.hadoop.com:8020
jobTracker=cdh-jt01:8021
queueName=default
oozieProjectRoot=${nameNode}/user/${user.name}/oozieProject
appRoot=${oozieProjectRoot}/sampleCoordinatorJobDatasetDep
oozie.coord.application.path=${appRoot}/coordinatorConf
sqoopWorkflowAppPath=${appRoot}/sqoopWorkflowApp
oozieLibPath=${nameNode}/user/oozie/share/lib
oozie.libpath=${oozieLibPath}
oozie.use.system.libpath=true
oozie.wf.rerun.failnodes=true
triggerDatasetDir=${oozieProjectRoot}/datasetGeneratorApp/outputDir
triggerDataFiles=${triggerDatasetDir}/part*
mysqlServer=cdh-dev01
mysqlServerDB=airawat
mysqlServerDBUID=myUID
mysqlServerDBPwd=myPWD
toEmailAddress=akhanolk@cdh-dev01
startTime=2013-07-11T00:00Z
endTime=2013-07-15T00:00Z
timeZoneDef=UTC
# Note: Oozie does not run shell commands from a properties file; the backtick
# expression below reaches the workflow as a literal string, not a record count.
sqoopInputRecordCount=`cat ${triggerDataFiles} | wc -l`
minRequiredRecordCount=1
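Oozie substitutes ${name} references between these properties, but it is not a shell. The sketch below is a simplified resolver (Oozie's EL is richer) that shows how the application path expands and that sqoopInputRecordCount stays a literal backtick string. The helper names are hypothetical, and user.name is supplied by hand here, whereas Oozie sets it to the submitting user.

```python
import re

PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")


def parse_properties(text):
    """Parse simple key=value lines; skips blanks and # or ! comments (no line continuations)."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def resolve(props, extra=None, max_depth=10):
    """Expand ${name} references; unknown names (e.g. EL functions) are left untouched."""
    env = dict(extra or {})
    env.update(props)

    def expand(value, depth=0):
        if depth > max_depth:
            raise ValueError("reference cycle or nesting too deep")

        def substitute(match):
            name = match.group(1)
            if name not in env:
                return match.group(0)
            return expand(env[name], depth + 1)

        return PLACEHOLDER.sub(substitute, value)

    return {key: expand(value) for key, value in props.items()}


SAMPLE = """\
nameNode=hdfs://cdh-nn01.hadoop.com:8020
oozieProjectRoot=${nameNode}/user/${user.name}/oozieProject
appRoot=${oozieProjectRoot}/sampleCoordinatorJobDatasetDep
oozie.coord.application.path=${appRoot}/coordinatorConf
triggerDatasetDir=${oozieProjectRoot}/datasetGeneratorApp/outputDir
triggerDataFiles=${triggerDatasetDir}/part*
sqoopInputRecordCount=`cat ${triggerDataFiles} | wc -l`
"""

# user.name is supplied by hand here; Oozie sets it to the submitting user.
resolved = resolve(parse_properties(SAMPLE), {"user.name": "akhanolk"})
print(resolved["oozie.coord.application.path"])
print(resolved["sqoopInputRecordCount"])  # still a literal string: nothing runs `cat` or `wc`
```

Because the resolved value is a string, any comparison against it in a workflow decision is a string comparison, not a record count check.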
<!--***************************************************************************
Oozie coordinator xml file: coordinator.xml
*****************************************************************************-->
<coordinator-app name="AirawatCoordJobDataTrig"
                 frequency="${coord:days(1)}"
                 start="${startTime}"
                 end="${endTime}"
                 timezone="${timeZoneDef}"
                 xmlns="uri:oozie:coordinator:0.1"
                 xmlns:sla="uri:oozie:sla:0.1">
    <controls>
        <timeout>20</timeout>
        <concurrency>6</concurrency>
        <execution>FIFO</execution>
    </controls>
    <datasets>
        <dataset name="inputDS" frequency="${coord:days(1)}" initial-instance="${startTime}" timezone="${timeZoneDef}">
            <uri-template>${triggerDatasetDir}</uri-template>
        </dataset>
    </datasets>
    <input-events>
        <data-in name="AirawatCoordTrigDepInput" dataset="inputDS">
            <instance>${startTime}</instance>
        </data-in>
    </input-events>
    <action>
        <workflow>
            <app-path>${sqoopWorkflowAppPath}</app-path>
        </workflow>
    </action>
</coordinator-app>
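With frequency ${coord:days(1)}, start 2013-07-11T00:00Z and end 2013-07-15T00:00Z, the coordinator materializes one action per day. The sketch below lists those nominal times. It assumes end is exclusive and treats a day as a fixed 24 hours, which holds in UTC (the timeZoneDef used here); nominal_times is a hypothetical helper. Because the uri-template has no date variables and the instance is fixed at ${startTime}, every one of these actions waits on the same ${triggerDatasetDir} path.

```python
from datetime import datetime, timedelta, timezone

OOZIE_UTC = "%Y-%m-%dT%H:%MZ"  # e.g. 2013-07-11T00:00Z, as in coordinator.properties


def nominal_times(start, end, frequency):
    """Nominal times start, start + frequency, ... strictly before end (end treated as exclusive)."""
    t0 = datetime.strptime(start, OOZIE_UTC).replace(tzinfo=timezone.utc)
    t_end = datetime.strptime(end, OOZIE_UTC).replace(tzinfo=timezone.utc)
    times = []
    t = t0
    while t < t_end:
        times.append(t.strftime(OOZIE_UTC))
        t += frequency
    return times


# startTime/endTime from coordinator.properties, frequency ${coord:days(1)}
for nominal in nominal_times("2013-07-11T00:00Z", "2013-07-15T00:00Z", timedelta(days=1)):
    print(nominal)
```

Four actions fit well within the <concurrency> limit of 6, so all of them can be waiting for the dataset at the same time.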
<!--***************************************************************************
Oozie workflow xml file: workflow.xml
*****************************************************************************-->
<workflow-app name="AirawatSampleCoordJobDSDep" xmlns="uri:oozie:workflow:0.1">
    <start to="inputAvailableCheckDecision"/>
    <decision name="inputAvailableCheckDecision">
        <switch>
            <case to="sqoopAction">
                ${fs:dirSize(triggerDatasetDir) gt 0}
            </case>
            <default to="end"/>
        </switch>
    </decision>
    <action name="sqoopAction">
        <sqoop xmlns="uri:oozie:sqoop-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>oozie.libpath</name>
                    <value>${oozieLibPath}</value>
                </property>
            </configuration>
            <command>export --connect jdbc:mysql://${mysqlServer}/${mysqlServerDB} --username ${mysqlServerDBUID} --password ${mysqlServerDBPwd} --table Logged_Process_Count_By_Year --direct --export-dir ${triggerDatasetDir} --fields-terminated-by "\t"</command>
        </sqoop>
        <ok to="end"/>
        <error to="sendErrorEmail"/>
    </action>
    <action name="sendErrorEmail">
        <email xmlns="uri:oozie:email-action:0.1">
            <to>${toEmailAddress}</to>
            <subject>Status of workflow ${wf:id()}</subject>
            <body>The workflow ${wf:name()} with id ${wf:id()} had issues and will be killed. The error logged is: ${wf:errorMessage(wf:lastErrorNode())}</body>
        </email>
        <ok to="killJobAction"/>
        <error to="killJobAction"/>
    </action>
    <kill name="killJobAction">
        <message>"Killed job due to error: ${wf:errorMessage(wf:lastErrorNode())}"</message>
    </kill>
    <end name="end" />
</workflow-app>
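Every start, case, default, ok and error transition in the workflow must name an existing node (inputAvailableCheckDecision, sqoopAction, sendErrorEmail, killJobAction, end). Oozie validates the definition itself when it parses the workflow; the sketch below is only a quick local pre-check before uploading to HDFS. undefined_transitions is a hypothetical helper, and the WORKFLOW string is a skeleton of the workflow above with the action bodies dropped.

```python
import xml.etree.ElementTree as ET


def undefined_transitions(workflow_xml):
    """Return 'to' targets that do not name a top-level node of the workflow.

    Namespace-agnostic: only the name/to attributes are inspected.
    """
    root = ET.fromstring(workflow_xml)
    node_names = {child.get("name") for child in root if child.get("name")}
    targets = [el.get("to") for el in root.iter() if el.get("to")]
    return sorted({t for t in targets if t not in node_names})


WORKFLOW = """<workflow-app name="demo" xmlns="uri:oozie:workflow:0.1">
  <start to="inputAvailableCheckDecision"/>
  <decision name="inputAvailableCheckDecision">
    <switch>
      <case to="sqoopAction">${true}</case>
      <default to="end"/>
    </switch>
  </decision>
  <action name="sqoopAction">
    <ok to="end"/>
    <error to="sendErrorEmail"/>
  </action>
  <action name="sendErrorEmail">
    <ok to="killJobAction"/>
    <error to="killJobAction"/>
  </action>
  <kill name="killJobAction"><message>failed</message></kill>
  <end name="end"/>
</workflow-app>"""

print(undefined_transitions(WORKFLOW))  # []
print(undefined_transitions(WORKFLOW.replace('to="end"', 'to="finish"')))  # ['finish']
```

Reading attributes rather than tag names keeps the check working regardless of the workflow and action XML namespaces.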
****************************************
Oozie job commands
****************************************
a) Prep
Modify the start and end times of the job in the coordinator properties file as needed.
b) Submit job
Run the following command:
$ oozie job -oozie http://cdh-dev01:11000/oozie -config oozieProject/sampleCoordinatorJobDatasetDep/coordinatorConf/coordinator.properties -run
A job ID is displayed.
The job should not trigger until the dataset is loaded.
It should be in the waiting state; see the Oozie web console screenshots in the last section.
To kill the job:
$ oozie job -oozie http://cdh-dev01:11000/oozie -kill <<Job ID>>
c) Publish trigger
In practice, this dataset would be produced by an upstream MapReduce job.
For simplicity, I have provided the output of one of the jobs from one of my blogs/gists.
$ hadoop fs -put oozieProject/datasetGeneratorApp/ oozieProject/
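If you generate a trigger dataset yourself instead of uploading the provided one, a safe ordering is to write the data files first and the _SUCCESS flag last, since the flag is what the coordinator keys on; a MapReduce job's output committer follows the same pattern. The sketch below shows that ordering on a local directory standing in for HDFS. publish_local_dataset is a hypothetical helper, and the records are taken from the sample output.

```python
import tempfile
from pathlib import Path


def publish_local_dataset(out_dir, records, done_flag="_SUCCESS"):
    """Hypothetical helper: write tab-delimited records, then create the done-flag last.

    Writing the flag only after the data is complete means a watcher keyed on
    the flag never sees a half-written dataset.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    part = out / "part-r-00000"
    part.write_text("".join(f"{key}\t{count}\n" for key, count in records))
    (out / done_flag).touch()
    return part


with tempfile.TemporaryDirectory() as tmp:
    out_dir = Path(tmp) / "datasetGeneratorApp" / "outputDir"
    publish_local_dataset(out_dir, [("2013-kernel", 810), ("2013-login", 2)])
    print(sorted(p.name for p in out_dir.iterdir()))  # ['_SUCCESS', 'part-r-00000']
```

On HDFS the equivalent would be to upload the part file before creating the flag, rather than relying on the order in which a directory upload copies its files.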
****************************************
Output - data export from HDFS to MySQL
****************************************
mysql> select * from Logged_Process_Count_By_Year order by occurrence desc;
+----------------------------+------------+
| year_and_process           | occurrence |
+----------------------------+------------+
| 2013-ntpd_initres          |       4133 |
| 2013-kernel                |        810 |
| 2013-init                  |        166 |
| 2013-pulseaudio            |         18 |
| 2013-spice-vdagent         |         15 |
| 2013-gnome-session         |         11 |
| 2013-sudo                  |          8 |
| 2013-polkit-agent-helper-1 |          8 |
| 2013-console-kit-daemon    |          7 |
| 2013-NetworkManager        |          7 |
| 2013-udevd                 |          6 |
| 2013-sshd                  |          6 |
| 2013-nm-dispatcher.action  |          4 |
| 2013-login                 |          2 |
+----------------------------+------------+
14 rows in set (0.00 sec)
Oozie web console screenshots:
http://hadooped.blogspot.com/p/ooziecooridnatorjobtimedep-pix_10.html