Last active
January 20, 2024 07:08
-
-
Save airawat/6001165 to your computer and use it in GitHub Desktop.
Oozie workflow application with sqoop action
Pipes data from Hive table to mysql database table
Oozie 3.3.0; Sqoop (1.4.2) with Mysql (5.1.69 )
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This gist includes components of a simple workflow application (oozie 3.3.0) that | |
pipes data in a Hive table to mysql; | |
The sample application includes: | |
-------------------------------- | |
1. Oozie actions: sqoop action | |
2. Oozie workflow controls: start, end, and kill. | |
3. Workflow components: job.properties and workflow.xml | |
4. Sample data | |
5. Prep tasks in Hive | |
6. Commands to deploy workflow, submit and run workflow | |
7. Oozie web console - screenshots from sample program execution | |
Pictorial overview of workflow: | |
------------------------------- | |
Available at: | |
http://hadooped.blogspot.com/2013/06/apache-sqoop-part-5-scheduling-sqoop.html | |
Includes: | |
--------- | |
01-WorkflowComponents | |
02a-DataAndCodeDownload | |
02b-DataSampleAndStructure | |
03-HdfsLoadCommands | |
04a-HiveSetupTasks | |
04b-HiveReport-SourceForSqoop | |
05-mySqlSetUp | |
06-SqoopTaskStandAloneTest | |
07-JobPropertiesFile | |
08-WorkflowXMLFile | |
09-Oozie commands | |
10-OutputInMysql | |
11-Oozie web console screenshots |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
01. Workflow Components: | |
------------------------ | |
1. job.properties | |
File containing: | |
a) parameter and value declarations that are referenced in the workflows, and | |
b) environment information referenced by Oozie to run the workflow including name node, job tracker, workflow application path etc | |
2. workflow.xml | |
Workflow definition file |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
02a.1. Download location: | |
------------------------- | |
Github- | |
https://github.com/airawat/OozieSamples | |
Email me at [email protected] to contact me if you have access issues. | |
02a.2. Directory structure applicable for this gist: | |
---------------------------------------------------- | |
oozieProject | |
data | |
airawat-syslog | |
<<node>> | |
<<year>> | |
<<month>> | |
messages | |
workflowSqoopAction | |
job.prperties | |
workflow.xml | |
hive-site.xml | |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
02b.1. Sample data | |
-------------------- | |
May 3 11:52:54 cdh-dn03 init: tty (/dev/tty6) main process (1208) killed by TERM signal | |
May 3 11:53:31 cdh-dn03 kernel: registered taskstats version 1 | |
May 3 11:53:31 cdh-dn03 kernel: sr0: scsi3-mmc drive: 32x/32x xa/form2 tray | |
May 3 11:53:31 cdh-dn03 kernel: piix4_smbus 0000:00:07.0: SMBus base address uninitialized - upgrade BIOS or use force_addr=0xaddr | |
May 3 11:53:31 cdh-dn03 kernel: nf_conntrack version 0.5.0 (7972 buckets, 31888 max) | |
May 3 11:53:57 cdh-dn03 kernel: hrtimer: interrupt took 11250457 ns | |
May 3 11:53:59 cdh-dn03 ntpd_initres[1705]: host name not found: 0.rhel.pool.ntp.org | |
02b.2. Structure | |
----------------- | |
Month = May | |
Day = 3 | |
Time = 11:52:54 | |
Node = cdh-dn03 | |
Process = init: | |
Log msg = tty (/dev/tty6) main process (1208) killed by TERM signal |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
03. Hdfs commands | |
------------------- | |
1) Load the data and workflow application to hadoop | |
$ hadoop fs -mkdir oozieProject | |
$ hadoop fs -put oozieProject/* oozieProject/ | |
2) Validate load | |
$ hadoop fs -ls -R oozieProject |awk '{print $8}' | |
..should match directory listing in section 2, above. |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
************************ | |
**04a. Hive setup tasks | |
************************ | |
a) Create table: | |
hive> | |
CREATE EXTERNAL TABLE SysLogEvents( | |
month_name STRING, | |
day STRING, | |
time STRING, | |
host STRING, | |
event STRING, | |
log STRING) | |
PARTITIONED BY(node string,year int, month int) | |
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe' | |
WITH SERDEPROPERTIES ( | |
"input.regex" = "(\\w+)\\s+(\\d+)\\s+(\\d+:\\d+:\\d+)\\s+(\\w+\\W*\\w*)\\s+(.*?\\:)\\s+(.*$)" | |
) | |
stored as textfile; | |
b) Create and load partitions: | |
Note: Replace my user ID "akhanolk" with yours | |
hive > | |
Alter table SysLogEvents Add IF NOT EXISTS partition(node="cdh-dev01",year=2013, month=04) | |
location '/user/akhanolk/oozieProject/data/airawat-syslog/cdh-dev01/2013/04/'; | |
Alter table SysLogEvents Add IF NOT EXISTS partition(node="cdh-dev01",year=2013, month=05) | |
location '/user/akhanolk/oozieProject/data/airawat-syslog/cdh-dev01/2013/05/'; | |
Alter table SysLogEvents Add IF NOT EXISTS partition(node="cdh-dn01",year=2013, month=05) | |
location '/user/akhanolk/oozieProject/data/airawat-syslog/cdh-dn01/2013/05/'; | |
Alter table SysLogEvents Add IF NOT EXISTS partition(node="cdh-dn02",year=2013, month=04) | |
location '/user/akhanolk/oozieProject/data/airawat-syslog/cdh-dn02/2013/04/'; | |
Alter table SysLogEvents Add IF NOT EXISTS partition(node="cdh-dn02",year=2013, month=05) | |
location '/user/akhanolk/oozieProject/data/airawat-syslog/cdh-dn02/2013/05/'; | |
Alter table SysLogEvents Add IF NOT EXISTS partition(node="cdh-dn03",year=2013, month=04) | |
location '/user/akhanolk/oozieProject/data/airawat-syslog/cdh-dn03/2013/04/'; | |
Alter table SysLogEvents Add IF NOT EXISTS partition(node="cdh-dn03",year=2013, month=05) | |
location '/user/akhanolk/oozieProject/data/airawat-syslog/cdh-dn03/2013/05/'; | |
Alter table SysLogEvents Add IF NOT EXISTS partition(node="cdh-jt01",year=2013, month=04) | |
location '/user/akhanolk/oozieProject/data/airawat-syslog/cdh-jt01/2013/04/'; | |
Alter table SysLogEvents Add IF NOT EXISTS partition(node="cdh-jt01",year=2013, month=05) | |
location '/user/akhanolk/oozieProject/data/airawat-syslog/cdh-jt01/2013/05/'; | |
Alter table SysLogEvents Add IF NOT EXISTS partition(node="cdh-nn01",year=2013, month=05) | |
location '/user/akhanolk/oozieProject/data/airawat-syslog/cdh-nn01/2013/05/'; | |
Alter table SysLogEvents Add IF NOT EXISTS partition(node="cdh-vms",year=2013, month=05) | |
location '/user/akhanolk/oozieProject/data/airawat-syslog/cdh-vms/2013/05/'; | |
c) Hive ql to test data loaded: | |
hive> | |
--Print headers | |
set hive.cli.print.header=true; | |
--Need to add this jar for MR to work..your env may not need it | |
add jar hadoop-lib/hive-contrib-0.10.0-cdh4.2.0.jar; | |
--Sample query | |
hive> select * from SysLogEvents limit 2; | |
OK | |
month_name day time host event log node year month | |
Apr 23 16:14:10 cdh-dev01 spice-vdagent[5657]: Missing virtio device '/dev/virtio-ports/com.redhat.spice.0': No such file or directory cdh-dev01 2013 04 | |
Apr 23 16:14:12 cdh-dev01 pulseaudio[5705]: pid.c: Daemon already running. cdh-dev01 2013 04 | |
Time taken: 13.241 |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
********************************* | |
**04b-HiveReport-SourceForSqoop | |
********************************* | |
--Run the following in Hive; It creates and populates the source table for the sqoop action; | |
use default; | |
drop table if exists eventsgranularreport; | |
CREATE TABLE IF NOT EXISTS eventsgranularreport( | |
year int, | |
month int, | |
day int, | |
event STRING, | |
occurrence int) | |
ROW FORMAT DELIMITED | |
FIELDS TERMINATED by ',' | |
LINES TERMINATED by '\n'; | |
INSERT OVERWRITE TABLE eventsgranularreport | |
select Year,Month,Day,Event,Count(*) Occurrence from SysLogEvents group by year,month,day,event order by event asc,year,month,day desc; |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
**************************** | |
**Prep tasks - mysql | |
**************************** | |
mysql> create database airawat; | |
mysql> use airawat; | |
mysql> CREATE TABLE IF NOT EXISTS eventsgranularreport( | |
year INTEGER, | |
month INTEGER, | |
day INTEGER, | |
event VARCHAR(100), | |
occurrence INTEGER); | |
mysql> create user 'devUser'@'machineName' identified by 'myPwd'; | |
GRANT ALL PRIVILEGES ON airawat.* TO 'devUser'@'machineName' WITH GRANT OPTION; | |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
************************************** | |
**Sqoop command - standlone test | |
************************************** | |
a) Command run on Sqoop client: | |
sqoop export \ | |
--connect jdbc:mysql://cdh-dev01/airawat \ | |
--username devUser \ | |
--password myPwd \ | |
--table eventsgranularreport \ | |
--direct \ | |
--enclosed-by '\"' \ | |
--export-dir /user/hive/warehouse/eventsgranularreport | |
b) Output in mysql: | |
mysql> | |
select * from eventsgranularreport; | |
mysql> select * from eventsgranularreport; | |
+------+-------+------+------------------------------+------------+ | |
| year | month | day | event | occurrence | | |
+------+-------+------+------------------------------+------------+ | |
| 2013 | 5 | NULL | NULL | 25 | | |
| 2013 | 5 | 3 | NetworkManager[1232]: | 1 | | |
| 2013 | 5 | 7 | NetworkManager[1243]: | 1 | | |
| 2013 | 5 | 7 | NetworkManager[1284]: | 1 | | |
..... | |
| 2013 | 5 | 7 | pulseaudio[2074]: | 1 | | |
| 2013 | 5 | 7 | pulseaudio[2076]: | 1 | | |
| 2013 | 5 | 7 | pulseaudio[2106]: | 1 | | |
| 2013 | 5 | 7 | pulseaudio[2116]: | 1 | | |
| 2013 | 5 | 7 | pulseaudio[2135]: | 1 | | |
+------+-------+------+------------------------------+------------+ | |
104 rows in set (0.08 sec) | |
mysql> delete from eventsgranularreport; | |
Query OK, 104 rows affected (0.02 sec) | |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
#***************************** | |
# 07: job.properties | |
#***************************** | |
nameNode=hdfs://cdh-nn01.chuntikhadoop.com:8020 | |
jobTracker=cdh-jt01:8021 | |
queueName=default | |
oozie.libpath=${nameNode}/user/oozie/share/lib | |
oozie.use.system.libpath=true | |
oozie.wf.rerun.failnodes=true | |
oozieProjectRoot=${nameNode}/user/${user.name}/oozieProject | |
appPath=${oozieProjectRoot}/workflowSqoopAction | |
oozie.wf.application.path=${appPath} | |
emailToAddress=akhanolk@cdh-dev01 | |
#**********end**************************** | |
- Note: The way the name node and job tracker information is specified in job.properties should match the oozie-site.xml properties for oozie.service.HadoopAccessorService.nameNode.whitelist and oozie.service.HadoopAccessorService.jobTracker.whitelist respectively. | |
Read up on oozie share lib and install if you dont already have it. | |
http://blog.cloudera.com/blog/2012/12/how-to-use-the-sharelib-in-apache-oozie/ | |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
<!--******************************************--> | |
<!--08. workflow.xml --> | |
<!--******************************************--> | |
<workflow-app name="WorkflowWithSqoopAction" xmlns="uri:oozie:workflow:0.1"> | |
<start to="sqoopAction"/> | |
<action name="sqoopAction"> | |
<sqoop xmlns="uri:oozie:sqoop-action:0.2"> | |
<job-tracker>${jobTracker}</job-tracker> | |
<name-node>${nameNode}</name-node> | |
<command>export --connect jdbc:mysql://cdh-dev01/airawat --username devUser --password myPwd --table eventsgranularreport --direct --enclosed-by " --export-dir /user/hive/warehouse/eventsgranularreport</command> | |
</sqoop> | |
<ok to="end"/> | |
<error to="killJob"/> | |
</action> | |
<kill name="killJob"> | |
<message>"Killed job due to error: ${wf:errorMessage(wf:lastErrorNode())}"</message> | |
</kill> | |
<end name="end" /> | |
</workflow-app> |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
09. Oozie commands | |
------------------- | |
Note: Replace oozie server and port, with your cluster-specific. | |
1) Submit job: | |
$ oozie job -oozie http://cdh-dev01:11000/oozie -config oozieProject/workflowSqoopAction/job.properties -submit | |
job: 0000012-130712212133144-oozie-oozi-W | |
2) Run job: | |
$ oozie job -oozie http://cdh-dev01:11000/oozie -start 0000012-130712212133144-oozie-oozi-W | |
3) Check the status: | |
$ oozie job -oozie http://cdh-dev01:11000/oozie -info 0000012-130712212133144-oozie-oozi-W | |
4) Suspend workflow: | |
$ oozie job -oozie http://cdh-dev01:11000/oozie -suspend 0000012-130712212133144-oozie-oozi-W | |
5) Resume workflow: | |
$ oozie job -oozie http://cdh-dev01:11000/oozie -resume 0000012-130712212133144-oozie-oozi-W | |
6) Re-run workflow: | |
$ oozie job -oozie http://cdh-dev01:11000/oozie -config oozieProject/workflowSqoopAction/job.properties -rerun 0000012-130712212133144-oozie-oozi-W | |
7) Should you need to kill the job: | |
$ oozie job -oozie http://cdh-dev01:11000/oozie -kill 0000012-130712212133144-oozie-oozi-W | |
8) View server logs: | |
$ oozie job -oozie http://cdh-dev01:11000/oozie -logs 0000012-130712212133144-oozie-oozi-W | |
Logs are available at: | |
/var/log/oozie on the Oozie server. |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
******************* | |
***Output in mysql | |
******************* | |
mysql> | |
select * from eventsgranularreport; | |
mysql> select * from eventsgranularreport; | |
+------+-------+------+------------------------------+------------+ | |
| year | month | day | event | occurrence | | |
+------+-------+------+------------------------------+------------+ | |
| 2013 | 5 | NULL | NULL | 25 | | |
| 2013 | 5 | 3 | NetworkManager[1232]: | 1 | | |
| 2013 | 5 | 7 | NetworkManager[1243]: | 1 | | |
| 2013 | 5 | 7 | NetworkManager[1284]: | 1 | | |
..... | |
| 2013 | 5 | 7 | pulseaudio[2074]: | 1 | | |
| 2013 | 5 | 7 | pulseaudio[2076]: | 1 | | |
| 2013 | 5 | 7 | pulseaudio[2106]: | 1 | | |
| 2013 | 5 | 7 | pulseaudio[2116]: | 1 | | |
| 2013 | 5 | 7 | pulseaudio[2135]: | 1 | | |
+------+-------+------+------------------------------+------------+ | |
104 rows in set (0.08 sec) |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
11. Available at: | |
----------------- | |
http://hadooped.blogspot.com/2013/06/apache-sqoop-part-5-scheduling-sqoop.html |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment