This gist includes a mapper and reducer in Python that parse log files using
regex. Use case: Count the number of occurrences of processes that got logged, by month.
Includes:
---------
Sample data
Review of log data structure
Sample data and scripts for download
Mapper
Reducer
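A minimal sketch of how such a mapper and reducer could work, not the gist's actual scripts. It assumes the common syslog layout "Mon DD HH:MM:SS host process[pid]: message"; the names LOG_PATTERN, map_line and reduce_pairs, the "month-process" key format, and the sample lines are all hypothetical. The sort inside reduce_pairs stands in for the shuffle phase that Hadoop would perform between the two steps.

```python
import re
from itertools import groupby

# Assumed syslog layout: "Mon DD HH:MM:SS host process[pid]: message".
# Group 1 captures the month, group 2 the process name (pid is optional).
LOG_PATTERN = re.compile(
    r"^(\w{3})\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}\s+\S+\s+([^\s:\[]+)(?:\[\d+\])?:"
)


def map_line(line):
    """Return a ("month-process", 1) pair, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    month, process = match.groups()
    return (f"{month}-{process}", 1)


def reduce_pairs(pairs):
    """Sum the counts per key; sorting here stands in for the shuffle phase."""
    ordered = sorted(pairs, key=lambda kv: kv[0])
    return {
        key: sum(count for _, count in group)
        for key, group in groupby(ordered, key=lambda kv: kv[0])
    }


if __name__ == "__main__":
    # Hypothetical sample lines, for illustration only.
    sample = [
        "May  3 11:52:54 host01 init: tty (/dev/tty6) main process killed",
        "May  3 11:53:31 host01 kernel: registered taskstats version 1",
        "Jun  1 00:00:01 host01 sshd[1234]: Accepted publickey for user",
        "this line is not a syslog record",
    ]
    pairs = [pair for pair in map(map_line, sample) if pair is not None]
    for key, total in reduce_pairs(pairs).items():
        print(f"{key}\t{total}")
```

With Hadoop streaming, the two functions would normally live in separate scripts that read from standard input and write tab-separated key/value lines to standard output; they are combined here only to keep the sketch self-contained.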
This gist includes a mapper, reducer and driver in Java that parse log files using
regex. The combiner code is the same as the reducer.
Use case: Count the number of occurrences of processes that got logged, inception to date.
Includes:
---------
Sample data and scripts for download: 01-ScriptAndDataDownload
Sample data and structure: 02-SampleDataAndStructure
Mapper: 03-LogEventCountMapper.java
Reducer: 04-LogEventCountReducer.java
This gist includes Hive QL scripts to create an external partitioned table for
Syslog-generated log files using the regex SerDe.
Use case: Count the number of occurrences of processes that got logged, by year, month,
day and process.
Includes:
---------
Sample data and structure: 01-SampleDataAndStructure
Data download: 02-DataDownload
Data load commands: 03-DataLoadCommands
This gist includes a Pig Latin script to parse Syslog-generated log files using regex.
Use case: Count the number of occurrences of processes that got logged, by month,
day and process.
Includes:
---------
Sample data and structure: 01-SampleDataAndStructure
Data and script download: 02-DataAndScriptDownload
Data load commands: 03-HdfsLoadCommands
Pig script: 04-PigLatinScript
This gist includes a Pig Latin script to parse Syslog-generated log files through a
Java MapReduce program that uses regex.
Use case: Count the number of occurrences of processes that got logged, by month,
day and process.
Related gist that covers the Java code: https://gist.github.com/airawat/5915374
Pig version: 0.10.0
This gist includes Oozie workflow components to run a Pig Latin script that parses
Syslog-generated log files using regex.
Use case: Count the number of occurrences of processes that got logged, by month,
day and process.
Pictorial overview of workflow:
-------------------------------
http://hadooped.blogspot.com/2013/07/apache-oozie-part-7-oozie-workflow-with_3.html
Includes:
This gist includes Oozie workflow components (streaming MapReduce action) to execute
Python mapper and reducer scripts that parse Syslog-generated log files using regex.
Use case: Count the number of occurrences of processes that got logged, by month and process.
Pictorial overview of workflow:
-------------------------------
http://hadooped.blogspot.com/2013/07/apache-oozie-part-5-oozie-workflow-with.html
Includes:
---------
This gist includes components of an Oozie (time-initiated) coordinator application: scripts/code, sample data
and commands. Oozie actions covered: hdfs action, email action, java main action,
hive action. Oozie controls covered: decision, fork-join. The workflow includes a
sub-workflow that runs two hive actions concurrently. The Hive table is partitioned.
Parsing uses the Hive regex SerDe and Java regex. Also, the Java mapper gets the input
directory path and includes part of it in the key.
Use case: Parse Syslog-generated log files to generate reports.
Pictorial overview of job:
This gist includes components of an Oozie (trigger-file-initiated) coordinator job:
scripts/code, sample data and commands. Oozie actions covered: hdfs action, email action,
java main action, hive action. Oozie controls covered: decision, fork-join. The workflow
includes a sub-workflow that runs two hive actions concurrently. The Hive table is
partitioned. Parsing uses the Hive regex SerDe and Java regex. Also, the Java mapper gets
the input directory path and includes part of it in the key.
Use case
--------
Parse Syslog-generated log files to generate reports.
This gist includes components of an Oozie (dataset-availability-initiated) coordinator job:
scripts/code, sample data and commands. Oozie actions covered: hdfs action, email action,
sqoop action (MySQL database). Oozie controls covered: decision.
Use case
--------
Pipe report data available in HDFS to a MySQL database.
Pictorial overview of job:
--------------------------