| This gist includes a Pig Latin script that parses Syslog-generated log files through a | |
| Java MapReduce program that uses regex. | |
| Use case: count the number of occurrences of logged processes, by month, | |
| day and process. | |
| Related gist that covers the java code - https://gist.github.com/airawat/5915374 | |
| Pig version: 0.10.0 | |
| Includes: | |
| --------- | |
| Sample data and structure: 01-SampleDataAndStructure | |
| Data and script download: 02-DataAndScriptDownload | |
| Data load commands: 03-HdfsLoadCommands | |
| Pig script: 04-PigLatinScript | |
| Pig script execution command: 05-PigLatinScriptExecution | |
| Output: 06-Output |
| Sample data | |
| ------------ | |
| May 3 11:52:54 cdh-dn03 init: tty (/dev/tty6) main process (1208) killed by TERM signal | |
| May 3 11:53:31 cdh-dn03 kernel: registered taskstats version 1 | |
| May 3 11:53:31 cdh-dn03 kernel: sr0: scsi3-mmc drive: 32x/32x xa/form2 tray | |
| May 3 11:53:31 cdh-dn03 kernel: piix4_smbus 0000:00:07.0: SMBus base address uninitialized - upgrade BIOS or use force_addr=0xaddr | |
| May 3 11:53:31 cdh-dn03 kernel: nf_conntrack version 0.5.0 (7972 buckets, 31888 max) | |
| May 3 11:53:57 cdh-dn03 kernel: hrtimer: interrupt took 11250457 ns | |
| May 3 11:53:59 cdh-dn03 ntpd_initres[1705]: host name not found: 0.rhel.pool.ntp.org | |
| Structure | |
| ---------- | |
| Month = May | |
| Day = 3 | |
| Time = 11:52:54 | |
| Node = cdh-dn03 | |
| Process = init: | |
| Log msg = tty (/dev/tty6) main process (1208) killed by TERM signal |
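The field breakdown above can be sketched as a single regular expression. This is a minimal illustration only: the class name `SyslogLineParser` and the pattern itself are assumptions made for this example and are not taken from `LogEventCount.jar` or the related Java gist. It assumes the process token is the first whitespace-delimited token after the node name and that it ends with a colon, as in the sample data.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of the original gist.
class SyslogLineParser {
    // Groups: 1=month, 2=day, 3=time, 4=node, 5=process (with trailing colon), 6=log message.
    static final Pattern LINE = Pattern.compile(
        "^(\\w{3})\\s+(\\d{1,2})\\s+(\\d{2}:\\d{2}:\\d{2})\\s+(\\S+)\\s+(\\S+:)\\s+(.*)$");

    /** Returns {month, day, time, node, process, message}, or null if the line does not match. */
    static String[] parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        String[] fields = new String[6];
        for (int i = 0; i < fields.length; i++) {
            fields[i] = m.group(i + 1);
        }
        return fields;
    }

    public static void main(String[] args) {
        String[] f = parse(
            "May 3 11:52:54 cdh-dn03 init: tty (/dev/tty6) main process (1208) killed by TERM signal");
        // Print the six fields separated by " | ".
        System.out.println(String.join(" | ", f));
    }
}
```

Lines that do not follow this shape (for example, a line with no colon-terminated process token) return `null`, so a caller can skip them rather than fail.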
| Data download | |
| ------------- | |
| https://groups.google.com/forum/?hl=en#!topic/hadooped/DMQVIwBUQOo | |
| Directory structure | |
| ------------------- | |
| LogParserSamplePigMR | |
| Data | |
| airawat-syslog | |
| 2013 | |
| 04 | |
| messages | |
| 2013 | |
| 05 | |
| messages | |
| lib | |
| LogEventCount.jar | |
| SysLog-PigMR-Report.pig |
| Commands to load to HDFS [03-HdfsLoadCommands] | |
| ---------------------------------------------- | |
| $ hadoop fs -put LogParserSamplePigMR | |
| $ hadoop fs -ls -R LogParserSamplePigMR | awk '{print $8}' | |
| LogParserSamplePigMR/Data | |
| LogParserSamplePigMR/Data/airawat-syslog | |
| LogParserSamplePigMR/Data/airawat-syslog/2013 | |
| LogParserSamplePigMR/Data/airawat-syslog/2013/04 | |
| LogParserSamplePigMR/Data/airawat-syslog/2013/04/messages | |
| LogParserSamplePigMR/Data/airawat-syslog/2013/05 | |
| LogParserSamplePigMR/Data/airawat-syslog/2013/05/messages | |
| LogParserSamplePigMR/SysLog-PigMR-Report.pig | |
| LogParserSamplePigMR/lib | |
| LogParserSamplePigMR/lib/LogEventCount.jar | |
| ParserSamplePigMR/reportDir/_logs/history/job_201306261042_0054_1372873417824_akhanolk_PigLatin%3ASysLog-PigMR-Report.pig | |
| LogParserSamplePigMR/reportDir/part-m-00000 | |
| /*----------------------------------------*/ | |
| /*PigLatinScript - SysLog-PigMR-Report.pig*/ | |
| /*----------------------------------------*/ | |
| rmf LogParserSamplePigMR/outputDir | |
| rmf LogParserSamplePigMR/inputDir | |
| rmf LogParserSamplePigMR/reportDir | |
| raw_log_DS = | |
| LOAD 'LogParserSamplePigMR/Data/airawat-syslog/*/*/*' as line; | |
| report_DS = MAPREDUCE 'lib/LogEventCount.jar' | |
|     STORE raw_log_DS INTO 'LogParserSamplePigMR/inputDir' | |
|     LOAD 'LogParserSamplePigMR/outputDir' AS (process:chararray, count:int) | |
|     `Airawat.Oozie.Samples.LogEventCount LogParserSamplePigMR/inputDir LogParserSamplePigMR/outputDir`; | |
| store report_DS INTO 'LogParserSamplePigMR/reportDir'; |
| Command to run the pig script | |
| ------------------------------ | |
| Run these after the data, script and jar are loaded into HDFS (see section 03-HdfsLoadCommands). | |
| $ cd LogParserSamplePigMR | |
| $ pig SysLog-PigMR-Report.pig | |
| Command to view output | |
| ----------------------- | |
| $ hadoop fs -cat LogParserSamplePigMR/reportDir/part* |
| Output | |
| ------- | |
| init: 23 | |
| kernel: 58 | |
| ntpd_initres[1705]: 792 | |
| sudo: 2 | |
| udevd[361]: 1 |
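The per-process counts in the report can be approximated in memory with a sketch like the following. It is a plain Java stand-in for the aggregation step, not the MapReduce job shipped in `LogEventCount.jar`; the class name `ProcessCountSketch` is hypothetical, and the code assumes the process is the fifth whitespace-separated token and ends with a colon, as in the sample data.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for the counting step; not the gist's MapReduce job.
class ProcessCountSketch {
    /** Counts lines per process token (the fifth whitespace-separated field). */
    static Map<String, Integer> countByProcess(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>(); // TreeMap keeps keys in sorted order
        for (String line : lines) {
            // Fields: month, day, time, node, process, rest of message.
            String[] parts = line.trim().split("\\s+", 6);
            if (parts.length < 5 || !parts[4].endsWith(":")) {
                continue; // skip lines that do not follow the expected layout
            }
            counts.merge(parts[4], 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
            "May 3 11:52:54 cdh-dn03 init: tty (/dev/tty6) main process (1208) killed by TERM signal",
            "May 3 11:53:31 cdh-dn03 kernel: registered taskstats version 1",
            "May 3 11:53:57 cdh-dn03 kernel: hrtimer: interrupt took 11250457 ns");
        // Print one "process<TAB>count" line per process.
        countByProcess(sample).forEach((process, count) ->
            System.out.println(process + "\t" + count));
    }
}
```

For the three sample lines in `main`, the map holds one entry for `init:` and two for `kernel:`.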