This gist includes components of an Oozie workflow: scripts/code, sample data,
and commands. Oozie actions covered: java mapreduce action. Oozie controls
covered: start, kill, end. The Java program uses regex to parse the logs; it
also extracts the path of the mapper's input directory and includes it in the
emitted key.
Note: The reducer can also be specified as the combiner.
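The gist's program is Java; what follows is only a minimal Python sketch of the same mapper and reducer logic, not the author's code. The syslog regex, the assumed directory layout (the input file's parent directory identifies the source, e.g. a node), and the function names `map_line` and `reduce_counts` are all hypothetical. In a Java mapper the input path is commonly obtained by casting the input split to `FileSplit` and calling `getPath()`; here it is simply passed in.

```python
import posixpath
import re
from collections import defaultdict

# Hypothetical pattern for classic syslog lines, e.g.
# "May  3 11:52:54 cdh-dn03 init: tty (/dev/tty6) main process (1208) killed by TERM signal"
SYSLOG_RE = re.compile(
    r"^(?P<month>\w{3})\s+(?P<day>\d{1,2})\s+(?P<time>\d{2}:\d{2}:\d{2})\s+"
    r"(?P<host>\S+)\s+(?P<process>[^:\[\s]+)(?:\[\d+\])?:\s*(?P<message>.*)$"
)


def map_line(input_path, line):
    """Mapper step: return (key, 1), or None if the regex does not match.

    The key prefixes month and process with the name of the input file's
    parent directory (assumed layout: .../<source>/<file>).
    """
    match = SYSLOG_RE.match(line)
    if match is None:
        return None
    dir_name = posixpath.basename(posixpath.dirname(input_path))
    return f"{dir_name}-{match['month']}-{match['process']}", 1


def reduce_counts(pairs):
    """Reducer step: sum the counts for each key.

    Summation is associative and commutative, so the same function can run
    as a combiner on partial map output without changing the final totals.
    """
    totals = defaultdict(int)
    for key, count in pairs:
        totals[key] += count
    return dict(totals)


if __name__ == "__main__":
    path = "/data/syslog/node01/messages"  # hypothetical input file
    sample = [
        "May  3 11:52:54 cdh-dn03 init: tty (/dev/tty6) main process (1208) killed by TERM signal",
        "May  3 11:53:10 cdh-dn03 sshd[2201]: Accepted password for root",
        "May  3 11:54:00 cdh-dn03 init: tty (/dev/tty5) main process (1207) killed by TERM signal",
    ]
    mapped = [kv for kv in (map_line(path, line) for line in sample) if kv]
    print(reduce_counts(mapped))  # {'node01-May-init': 2, 'node01-May-sshd': 1}
```

Because `reduce_counts` only adds counts, running it on partial map output and then again on the merged partial sums yields the same totals, which is why the reducer can double as the combiner.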
Usecase
-------
This gist includes components of a simple workflow application (Oozie 3.3.0) that
pipes data in a Hive table to MySQL.
The sample application includes:
--------------------------------
1. Oozie actions: sqoop action
2. Oozie workflow controls: start, end, and kill
3. Workflow components: job.properties and workflow.xml
4. Sample data
5. Prep tasks in Hive
This gist includes components of a simple workflow application that creates a directory
and moves files within HDFS to this directory.
Emails are sent to notify designated users of the success or failure of the workflow. A
prepare section allows the action to be re-run: it essentially undoes the move done by a
potential prior run of the action. Sample data is also included.
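The re-run pattern can be illustrated with a local-filesystem analogue. This is a minimal Python sketch, not Oozie's hdfs action; the function names `prepare` and `run_action` and the exact undo behavior (move any previously moved files back, then delete the target directory) are assumptions based on the description above.

```python
import os
import shutil


def prepare(src_dir, dest_dir):
    """Undo a potential prior run: move files back out of dest_dir, then remove it."""
    if os.path.isdir(dest_dir):
        for name in os.listdir(dest_dir):
            shutil.move(os.path.join(dest_dir, name), os.path.join(src_dir, name))
        os.rmdir(dest_dir)  # empty now, so rmdir succeeds


def run_action(src_dir, dest_dir, names):
    """Create dest_dir and move the named files into it; safe to call repeatedly."""
    prepare(src_dir, dest_dir)
    os.mkdir(dest_dir)
    for name in names:
        shutil.move(os.path.join(src_dir, name), os.path.join(dest_dir, name))
```

Without the `prepare` call, a second run would fail at `os.mkdir` because the directory already exists, and the source files would already be gone.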
The sample application includes:
--------------------------------
1. Oozie actions: hdfs action and email action
2. Oozie workflow controls: start, end, and kill. |
This gist includes components of an Oozie coordinator job initiated by dataset availability:
scripts/code, sample data, and commands. Oozie actions covered: hdfs action, email action,
sqoop action (MySQL database). Oozie controls covered: decision.
Usecase
-------
Pipe report data available in HDFS to a MySQL database.
Pictorial overview of job:
--------------------------
This gist includes components of an Oozie coordinator job (trigger-file initiated):
scripts/code, sample data, and commands. Oozie actions covered: hdfs action, email action,
java main action, hive action. Oozie controls covered: decision, fork-join. The workflow
includes a sub-workflow that runs two hive actions concurrently. The Hive table is
partitioned. Parsing uses the Hive regex SerDe and Java regex. Also, the Java mapper gets
the input directory path and includes part of it in the key.
Usecase
-------
Parse Syslog-generated log files to generate reports.
This gist includes components of an Oozie coordinator application (time initiated): scripts/code,
sample data, and commands. Oozie actions covered: hdfs action, email action, java main action,
hive action. Oozie controls covered: decision, fork-join. The workflow includes a
sub-workflow that runs two hive actions concurrently. The Hive table is partitioned.
Parsing uses the Hive regex SerDe and Java regex. Also, the Java mapper gets the input
directory path and includes part of it in the key.
Usecase: Parse Syslog-generated log files to generate reports.
Pictorial overview of job:
This gist includes Oozie workflow components (streaming map reduce action) to execute
Python mapper and reducer scripts that parse Syslog-generated log files using regex.
Usecase: Count the number of occurrences of processes that got logged, by month and process.
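For orientation, here is a minimal sketch of what a Hadoop streaming mapper and reducer pair for this use case could look like; it is not the gist's scripts. The regex, the `month-process` key format, and the `map`/`reduce` command-line argument are assumptions. Streaming passes records through stdin/stdout as tab-separated key/value lines and sorts the mapper output by key before the reducer reads it.

```python
import re
import sys
from itertools import groupby

# Hypothetical syslog pattern capturing the month and the process name,
# e.g. "May  3 11:52:54 cdh-dn03 init: ..." -> ("May", "init").
LINE_RE = re.compile(
    r"^(\w{3})\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}\s+\S+\s+([^:\[\s]+)(?:\[\d+\])?:"
)


def mapper(lines):
    """Emit 'month-process<TAB>1' for every line the regex matches."""
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            month, process = match.groups()
            yield f"{month}-{process}\t1"


def reducer(lines):
    """Sum counts per key.

    Streaming delivers mapper output sorted by key, so all lines for one key
    arrive consecutively and groupby can batch them.
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{key}\t{sum(int(count) for _, count in group)}"


if __name__ == "__main__" and sys.argv[1:2] in (["map"], ["reduce"]):
    step = mapper if sys.argv[1] == "map" else reducer
    for out in step(sys.stdin):
        print(out)
```

Outside Hadoop, the same flow can be approximated with a shell pipeline such as `cat messages | python counts.py map | sort | python counts.py reduce`, where `counts.py` is a hypothetical file name for the script above.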
Pictorial overview of workflow:
--------------------------------
http://hadooped.blogspot.com/2013/07/apache-oozie-part-5-oozie-workflow-with.html
Includes:
---------
This gist includes Oozie workflow components to run a Pig Latin script that parses
(Syslog-generated) log files using regex.
Usecase: Count the number of occurrences of processes that got logged, by month,
day, and process.
Pictorial overview of workflow:
-------------------------------
http://hadooped.blogspot.com/2013/07/apache-oozie-part-7-oozie-workflow-with_3.html
Includes: |
This gist includes a Pig Latin script that parses Syslog-generated log files through a
Java MapReduce program that uses regex.
Usecase: Count the number of occurrences of processes that got logged, by month,
day, and process.
Related gist that covers the Java code: https://gist.github.com/airawat/5915374
Pig version: 0.10.0
This gist includes a Pig Latin script that parses Syslog-generated log files using regex.
Usecase: Count the number of occurrences of processes that got logged, by month,
day, and process.
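The use case boils down to grouping parsed records by (month, day, process) and counting each group. A minimal Python sketch of that aggregation follows; the regex and the function name `count_by_month_day_process` are hypothetical, and the actual Pig script may parse and group differently.

```python
import re
from collections import Counter

# Hypothetical pattern capturing month, day, and process from a syslog line.
LINE_RE = re.compile(
    r"^(\w{3})\s+(\d{1,2})\s+\d{2}:\d{2}:\d{2}\s+\S+\s+([^:\[\s]+)(?:\[\d+\])?:"
)


def count_by_month_day_process(lines):
    """Group parsed lines by (month, day, process) and count each group.

    Lines the regex does not match are skipped rather than counted.
    """
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            month, day, process = match.groups()
            counts[(month, int(day), process)] += 1
    return counts
```

Converting the day to an integer keeps "3" and "03" in the same group and makes `sorted(counts.items())` order days numerically.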
Includes:
---------
Sample data and structure: 01-SampleDataAndStructure
Data and script download: 02-DataAndScriptDownload
Data load commands: 03-HdfsLoadCommands
Pig script: 04-PigLatinScript