| Introduction | |
| ------------- | |
| This gist includes sample data, application components, and the components needed to execute a bundle application. | |
| The sample bundle application is time triggered; its start time is defined in the bundle job.properties | |
| file. The bundle application starts two coordinator applications, as defined in the bundle definition file, | |
| bundleConfirguration.xml. | |
| The first coordinator job is time triggered. The start time is defined in the bundle job.properties file. | |
| It runs a workflow that includes a Java main action. The Java program parses some log files and generates |
| This gist includes components of an Oozie workflow: scripts/code, sample data, | |
| and commands. Oozie actions covered: java main action. Oozie controls | |
| covered: start, kill, end. The Java program uses regex to parse the logs, and | |
| also extracts part of the mapper input directory path and includes it in the | |
| emitted key. | |
| Usecase | |
| ------- | |
| Parse Syslog generated log files to generate reports; | |
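The regex parsing and key construction described above can be sketched as follows. This is a minimal illustration in Python, not the gist's Java program; the syslog layout, the `build_key` helper, and the idea that the last component of the input directory is a year are all assumptions made for the example.

```python
import re

# Assumed syslog layout: "<Mon> <day> <hh:mm:ss> <host> <process>[pid]: <message>"
SYSLOG_PATTERN = re.compile(
    r"^(\w{3})\s+(\d{1,2})\s+(\d{2}:\d{2}:\d{2})\s+(\S+)\s+"
    r"([^\s\[:]+)(?:\[\d+\])?:\s+(.*)$"
)


def build_key(line, input_dir):
    """Return 'year-month-process' for a matching line, or None otherwise.

    build_key is a hypothetical helper; the key format is an assumption.
    """
    match = SYSLOG_PATTERN.match(line)
    if match is None:
        return None
    month, process = match.group(1), match.group(5)
    # Assumption: the last component of the input directory names the year.
    year = input_dir.rstrip("/").split("/")[-1]
    return f"{year}-{month}-{process}"
```

Lines that do not fit the assumed layout yield `None`, so a caller can skip or count them separately.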
| This gist includes components of an Oozie workflow: scripts/code, sample data, | |
| and commands. Oozie actions covered: java mapreduce action. Oozie controls | |
| covered: start, kill, end. The Java program uses regex to parse the logs, and | |
| also extracts the mapper input directory path and includes it in the | |
| emitted key. | |
| Note: The reducer can be specified as a combiner as well. | |
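A reducer can double as a combiner when its operation is associative and commutative, so that applying it to partial groups first does not change the final result. The sketch below assumes the reducer simply sums counts per key (the gist excerpt does not state this); the keys and the `sum_counts` helper are hypothetical.

```python
from collections import defaultdict


def sum_counts(pairs):
    """Sum the counts for each key; stands in for both reducer and combiner."""
    totals = defaultdict(int)
    for key, count in pairs:
        totals[key] += count
    return dict(totals)


# Hypothetical map output from two input splits.
split_a = [("May-init", 1), ("May-sudo", 1), ("May-init", 1)]
split_b = [("May-init", 1), ("Apr-sudo", 1)]

# Reducer only: every raw pair is sent to the reducer.
without_combiner = sum_counts(split_a + split_b)

# Combiner per split first, then the reducer over the partial sums.
partial = list(sum_counts(split_a).items()) + list(sum_counts(split_b).items())
with_combiner = sum_counts(partial)
```

Both paths give the same totals, while the combiner path sends fewer pairs to the reducer.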
| Usecase | |
| ------- |
| This gist includes components of a simple workflow application (Oozie 3.3.0) that | |
| pipes data from a Hive table to MySQL. | |
| The sample application includes: | |
| -------------------------------- | |
| 1. Oozie actions: sqoop action | |
| 2. Oozie workflow controls: start, end, and kill. | |
| 3. Workflow components: job.properties and workflow.xml | |
| 4. Sample data | |
| 5. Prep tasks in Hive |
| This gist includes components of a simple workflow application that creates a directory and moves files within | |
| HDFS to this directory. | |
| Emails are sent out to notify designated users of the success or failure of the workflow. There is a prepare section | |
| to allow a re-run of the action; the prepare essentially negates the move done by a potential prior run | |
| of the action. Sample data is also included. | |
| The sample application includes: | |
| -------------------------------- | |
| 1. Oozie actions: hdfs action and email action | |
| 2. Oozie workflow controls: start, end, and kill. |
| This gist includes components of an Oozie coordinator job initiated by dataset availability: | |
| scripts/code, sample data, and commands. Oozie actions covered: hdfs action, email action, | |
| sqoop action (MySQL database). Oozie controls covered: decision. | |
| Usecase | |
| ------- | |
| Pipe report data available in HDFS to a MySQL database. | |
| Pictorial overview of job: | |
| -------------------------- |
| This gist includes components of an Oozie coordinator job (trigger file initiated): | |
| scripts/code, sample data, and commands. Oozie actions covered: hdfs action, email action, | |
| java main action, hive action. Oozie controls covered: decision, fork-join. The workflow | |
| includes a sub-workflow that runs two hive actions concurrently. The Hive table is | |
| partitioned. Parsing uses the Hive regex SerDe and Java regex. Also, the Java mapper gets | |
| the input directory path and includes part of it in the key. | |
| Usecase | |
| ------- | |
| Parse Syslog generated log files to generate reports; |
| This gist includes components of an Oozie coordinator application (time initiated): scripts/code, sample data, | |
| and commands. Oozie actions covered: hdfs action, email action, java main action, | |
| hive action. Oozie controls covered: decision, fork-join. The workflow includes a | |
| sub-workflow that runs two hive actions concurrently. The Hive table is partitioned. | |
| Parsing uses the Hive regex SerDe and Java regex. Also, the Java mapper gets the input | |
| directory path and includes part of it in the key. | |
| Usecase: Parse Syslog generated log files to generate reports; | |
| Pictorial overview of job: |
| This gist includes Oozie workflow components (streaming map reduce action) to execute | |
| Python mapper and reducer scripts that parse Syslog-generated log files using regex. | |
| Usecase: Count the number of occurrences of logged processes, by month and process. | |
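As a rough illustration of the mapper and reducer roles in this use case, the sketch below counts processes by month and process using tab-separated key/value lines, as Hadoop Streaming scripts conventionally exchange them. It is not the gist's actual scripts: the regex, the `Mon-process` key format, and the function names are assumptions, and both functions take iterables instead of reading `sys.stdin` so they can be tried locally.

```python
import re
from itertools import groupby

# Assumed syslog layout: "<Mon> <day> <hh:mm:ss> <host> <process>[pid]: ..."
PATTERN = re.compile(
    r"^(\w{3})\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}\s+\S+\s+([^\s\[:]+)(?:\[\d+\])?:"
)


def mapper(lines):
    """Emit 'Mon-process<TAB>1' for every line that matches the pattern."""
    for line in lines:
        match = PATTERN.match(line)
        if match:
            yield f"{match.group(1)}-{match.group(2)}\t1"


def reducer(sorted_lines):
    """Sum the counts of consecutive lines sharing a key (input must be sorted)."""
    pairs = (line.split("\t") for line in sorted_lines)
    for key, group in groupby(pairs, key=lambda pair: pair[0]):
        yield f"{key}\t{sum(int(count) for _, count in group)}"
```

Locally, `sorted(mapper(lines))` can stand in for the shuffle-and-sort step that runs between the two scripts on a cluster.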
| Pictorial overview of workflow: | |
| -------------------------------- | |
| http://hadooped.blogspot.com/2013/07/apache-oozie-part-5-oozie-workflow-with.html | |
| Includes: | |
| --------- |
| About this gist: | |
| ================ | |
| This gist is part of a series of log parsers in Java MapReduce, Pig, Hive, Python... | |
| This one covers a log parser in Cascading. | |
| It reads syslogs in HDFS and: | |
| a) Parses them based on a regex pattern and writes the parsed files to HDFS | |
| b) Writes records that don't match the pattern to HDFS | |
| c) Writes a report to HDFS that contains the count of distinct processes logged. | |
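The three outputs listed above can be sketched in plain Python. This does not use the Cascading API and is only meant to show the routing logic; the regex, the `split_and_report` helper, and the reading of the report as a per-process record count are assumptions.

```python
import re
from collections import Counter

# Assumed syslog layout: "<Mon> <day> <hh:mm:ss> <host> <process>[pid]: <message>"
PATTERN = re.compile(
    r"^(\w{3})\s+(\d{1,2})\s+(\d{2}:\d{2}:\d{2})\s+(\S+)\s+"
    r"([^\s\[:]+)(?:\[\d+\])?:\s+(.*)$"
)


def split_and_report(lines):
    """Route lines into parsed and rejected records and count records per process."""
    parsed, rejected = [], []
    for line in lines:
        match = PATTERN.match(line)
        if match:
            parsed.append(match.groups())  # (a) parsed fields
        else:
            rejected.append(line)          # (b) records that do not match
    # (c) report: records per process (field index 4 is the process name)
    report = Counter(fields[4] for fields in parsed)
    return parsed, rejected, report
```

In the gist itself each of the three results is written to HDFS; here they are simply returned as in-memory collections.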
| Other gists/blogs: |