Revisions

  1. airawat revised this gist Nov 22, 2013. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion 02-DataAndCodeDownload
    @@ -1,6 +1,6 @@
    Download location:
    ------------------
    <<GitHub - to be added>>
    GitHub - https://github.com/airawat/OozieSamples

    Email me at airawat.blog@gmail.com if you have access issues.

  2. airawat revised this gist Nov 22, 2013. 1 changed file with 4 additions and 10 deletions.
    14 changes: 4 additions & 10 deletions 02-DataAndCodeDownload
    @@ -1,12 +1,11 @@
    Download location:
    ------------------
    Google group-
    https://groups.google.com/forum/?hl=en#!topic/hadooped/HhaBiOmW078
    <<GitHub - to be added>>

    Email me at airawat.blog@gmail.com if you have access issues.

    Directory structure:
    --------------------
    Directory structure applicable for this post/gist/blog:
    -------------------------------------------------------

    oozieProject
    logs
    @@ -15,12 +14,7 @@ oozieProject
    <<year>>
    <<month>>
    messages
    data
    airawat-syslog
    <<node>>
    <<year>>
    <<month>>
    messages

    workflowHdfsAndEmailActions
    job.properties
    workflow.xml
  3. airawat revised this gist Nov 22, 2013. 4 changed files with 52 additions and 42 deletions.
    8 changes: 7 additions & 1 deletion 02-DataAndCodeDownload
    @@ -9,7 +9,13 @@ Directory structure:
    --------------------

    oozieProject
    data
    logs
    airawat-syslog
    <<node>>
    <<year>>
    <<month>>
    messages
    data
    airawat-syslog
    <<node>>
    <<year>>
    4 changes: 2 additions & 2 deletions 04-JobPropertiesFile
    @@ -14,8 +14,8 @@ oozieProjectRoot=${nameNode}/user/${user.name}/oozieProject
    oozie.wf.application.path=${oozieProjectRoot}/workflowHdfsAndEmailActions

    dataInputDirectoryAbsPath=${oozieProjectRoot}/logs/airawat-syslog
    makeDirectoryAbsPath=${oozieProjectRoot}/data
    dataDestinationDirectoryRelativePath=oozieProject/data
    makeDirectoryAbsPath=${oozieProjectRoot}/dataDump
    dataDestinationDirectoryRelativePath=oozieProject/dataDump

    emailToAddress=akhanolk@cdh-dev01

    6 changes: 5 additions & 1 deletion 06-HdfsLoadCommands
    @@ -13,4 +13,8 @@ You should see...

    oozieProject/logs/airawat-syslog/<<node>>/<<year>>/<<month>>/messages
    oozieProject/workflowHdfsAndEmailActions/job.properties
    oozieProject/workflowHdfsAndEmailActions/workflow.xml
    oozieProject/workflowHdfsAndEmailActions/workflow.xml



    $ hadoop fs -rm -R oozieProject/data
    76 changes: 38 additions & 38 deletions 08-Program output
    @@ -2,49 +2,49 @@ Program output:
    ---------------

    Expected result:
    1) The data in the logs directory should be moved to the directory named data under the oozieProject directory.
    1) The data in the logs directory should be moved to the directory named dataDump under the oozieProject directory.
    2) The directory 'logs' should be deleted.
    3) An email should be sent indicating the success/failure of the application.

    1)
    $ hadoop fs -ls -R oozieProject | awk '{print $8}'

    oozieProject/data/airawat-syslog
    oozieProject/data/airawat-syslog/cdh-dev01
    oozieProject/data/airawat-syslog/cdh-dev01/2013
    oozieProject/data/airawat-syslog/cdh-dev01/2013/04
    oozieProject/data/airawat-syslog/cdh-dev01/2013/04/messages
    oozieProject/data/airawat-syslog/cdh-dev01/2013/05
    oozieProject/data/airawat-syslog/cdh-dev01/2013/05/messages
    oozieProject/data/airawat-syslog/cdh-dn01
    oozieProject/data/airawat-syslog/cdh-dn01/2013
    oozieProject/data/airawat-syslog/cdh-dn01/2013/05
    oozieProject/data/airawat-syslog/cdh-dn01/2013/05/messages
    oozieProject/data/airawat-syslog/cdh-dn02
    oozieProject/data/airawat-syslog/cdh-dn02/2013
    oozieProject/data/airawat-syslog/cdh-dn02/2013/04
    oozieProject/data/airawat-syslog/cdh-dn02/2013/04/messages
    oozieProject/data/airawat-syslog/cdh-dn02/2013/05
    oozieProject/data/airawat-syslog/cdh-dn02/2013/05/messages
    oozieProject/data/airawat-syslog/cdh-dn03
    oozieProject/data/airawat-syslog/cdh-dn03/2013
    oozieProject/data/airawat-syslog/cdh-dn03/2013/04
    oozieProject/data/airawat-syslog/cdh-dn03/2013/04/messages
    oozieProject/data/airawat-syslog/cdh-dn03/2013/05
    oozieProject/data/airawat-syslog/cdh-dn03/2013/05/messages
    oozieProject/data/airawat-syslog/cdh-jt01
    oozieProject/data/airawat-syslog/cdh-jt01/2013
    oozieProject/data/airawat-syslog/cdh-jt01/2013/04
    oozieProject/data/airawat-syslog/cdh-jt01/2013/04/messages
    oozieProject/data/airawat-syslog/cdh-jt01/2013/05
    oozieProject/data/airawat-syslog/cdh-jt01/2013/05/messages
    oozieProject/data/airawat-syslog/cdh-nn01
    oozieProject/data/airawat-syslog/cdh-nn01/2013
    oozieProject/data/airawat-syslog/cdh-nn01/2013/05
    oozieProject/data/airawat-syslog/cdh-nn01/2013/05/messages
    oozieProject/data/airawat-syslog/cdh-vms
    oozieProject/data/airawat-syslog/cdh-vms/2013
    oozieProject/data/airawat-syslog/cdh-vms/2013/05
    oozieProject/data/airawat-syslog/cdh-vms/2013/05/messages
    oozieProject/dataDump/airawat-syslog
    oozieProject/dataDump/airawat-syslog/cdh-dev01
    oozieProject/dataDump/airawat-syslog/cdh-dev01/2013
    oozieProject/dataDump/airawat-syslog/cdh-dev01/2013/04
    oozieProject/dataDump/airawat-syslog/cdh-dev01/2013/04/messages
    oozieProject/dataDump/airawat-syslog/cdh-dev01/2013/05
    oozieProject/dataDump/airawat-syslog/cdh-dev01/2013/05/messages
    oozieProject/dataDump/airawat-syslog/cdh-dn01
    oozieProject/dataDump/airawat-syslog/cdh-dn01/2013
    oozieProject/dataDump/airawat-syslog/cdh-dn01/2013/05
    oozieProject/dataDump/airawat-syslog/cdh-dn01/2013/05/messages
    oozieProject/dataDump/airawat-syslog/cdh-dn02
    oozieProject/dataDump/airawat-syslog/cdh-dn02/2013
    oozieProject/dataDump/airawat-syslog/cdh-dn02/2013/04
    oozieProject/dataDump/airawat-syslog/cdh-dn02/2013/04/messages
    oozieProject/dataDump/airawat-syslog/cdh-dn02/2013/05
    oozieProject/dataDump/airawat-syslog/cdh-dn02/2013/05/messages
    oozieProject/dataDump/airawat-syslog/cdh-dn03
    oozieProject/dataDump/airawat-syslog/cdh-dn03/2013
    oozieProject/dataDump/airawat-syslog/cdh-dn03/2013/04
    oozieProject/dataDump/airawat-syslog/cdh-dn03/2013/04/messages
    oozieProject/dataDump/airawat-syslog/cdh-dn03/2013/05
    oozieProject/dataDump/airawat-syslog/cdh-dn03/2013/05/messages
    oozieProject/dataDump/airawat-syslog/cdh-jt01
    oozieProject/dataDump/airawat-syslog/cdh-jt01/2013
    oozieProject/dataDump/airawat-syslog/cdh-jt01/2013/04
    oozieProject/dataDump/airawat-syslog/cdh-jt01/2013/04/messages
    oozieProject/dataDump/airawat-syslog/cdh-jt01/2013/05
    oozieProject/dataDump/airawat-syslog/cdh-jt01/2013/05/messages
    oozieProject/dataDump/airawat-syslog/cdh-nn01
    oozieProject/dataDump/airawat-syslog/cdh-nn01/2013
    oozieProject/dataDump/airawat-syslog/cdh-nn01/2013/05
    oozieProject/dataDump/airawat-syslog/cdh-nn01/2013/05/messages
    oozieProject/dataDump/airawat-syslog/cdh-vms
    oozieProject/dataDump/airawat-syslog/cdh-vms/2013
    oozieProject/dataDump/airawat-syslog/cdh-vms/2013/05
    oozieProject/dataDump/airawat-syslog/cdh-vms/2013/05/messages
    oozieProject/workflowHdfsAndEmailActions/job.properties
    oozieProject/workflowHdfsAndEmailActions/workflow.xml
  4. airawat revised this gist Oct 22, 2013. 1 changed file with 0 additions and 13 deletions.
    13 changes: 0 additions & 13 deletions 00-OozieWorkflowHdfsAndEmailActions
    @@ -17,16 +17,3 @@ Pictorial overview of workflow:
    -------------------------------
    Available at:
    http://hadooped.blogspot.com/2013/06/apache-oozie-part-1-workflow-with-hdfs.html

    Includes:
    ---------
    01-WorkflowComponents
    02-DataAndCodeDownload
    03-Oozie configuration for SMTP
    04-JobPropertiesFile
    05-WorkflowXMLFile
    06-HdfsLoadCommands
    07-Oozie commands
    08-Program output
    09-Sample email from application
    10-Oozie web console screenshots
  5. airawat revised this gist Aug 18, 2013. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion 06-HdfsLoadCommands
    @@ -1,7 +1,7 @@
    Commands to load data
    ----------------------

    a) Load da
    a) Load data
    $ hadoop fs -mkdir oozieProject
    $ hadoop fs -put oozieProject/* oozieProject/

  6. airawat revised this gist Aug 18, 2013. 1 changed file with 2 additions and 1 deletion.
    3 changes: 2 additions & 1 deletion 00-OozieWorkflowHdfsAndEmailActions
    @@ -1,4 +1,5 @@
    This gist includes components of a simple workflow application that moves files within HDFS and deletes a directory.
    This gist includes components of a simple workflow application that creates a directory and moves files within
    HDFS to this directory.
    Emails are sent out to notify designated users of the success/failure of the workflow. There is a prepare section
    to allow re-runs of the action; the prepare essentially negates the move done by a potential prior run
    of the action. Sample data is also included.
  7. airawat renamed this gist Jul 15, 2013. 1 changed file with 0 additions and 0 deletions.
  8. airawat revised this gist Jul 15, 2013. 1 changed file with 4 additions and 1 deletion.
    5 changes: 4 additions & 1 deletion 08-Program output
    @@ -2,8 +2,11 @@ Program output:
    ---------------

    Expected result:
    The data in the logs directory should be moved to the directory named data under the oozieProject directory.
    1) The data in the logs directory should be moved to the directory named data under the oozieProject directory.
    2) The directory 'logs' should be deleted.
    3) An email should be sent indicating the success/failure of the application.

    1)
    $ hadoop fs -ls -R oozieProject | awk '{print $8}'

    oozieProject/data/airawat-syslog
  9. airawat revised this gist Jul 15, 2013. 1 changed file with 3 additions and 1 deletion.
    4 changes: 3 additions & 1 deletion 02-DataAndCodeDownload
    @@ -1,7 +1,9 @@
    Download location:
    ------------------
    <<To be added>>
    Google group-
    https://groups.google.com/forum/?hl=en#!topic/hadooped/HhaBiOmW078

    Email me at airawat.blog@gmail.com if you have access issues.

    Directory structure:
    --------------------
  10. airawat revised this gist Jul 15, 2013. 4 changed files with 21 additions and 4 deletions.
    6 changes: 4 additions & 2 deletions 00-SampleWorkflowHdfsAndEmailActions
    @@ -14,7 +14,8 @@ The sample application includes:

    Pictorial overview of workflow:
    -------------------------------
    <<To be added>>
    Available at:
    http://hadooped.blogspot.com/2013/06/apache-oozie-part-1-workflow-with-hdfs.html

    Includes:
    ---------
    @@ -26,4 +27,5 @@ Includes:
    06-HdfsLoadCommands
    07-Oozie commands
    08-Program output
    09-Oozie web console screenshots
    09-Sample email from application
    10-Oozie web console screenshots
    2 changes: 0 additions & 2 deletions 09-Oozie web console screenshots
    @@ -1,2 +0,0 @@
    Screenshots of the Oozie web console are available at:
    <<Link>>
    14 changes: 14 additions & 0 deletions 09-SampleEmail
    @@ -0,0 +1,14 @@
    Email from the program
    -----------------------
    From akhanolk@cdh-dev01.localdomain Sun Jul 14 23:08:46 2013
    Return-Path: <akhanolk@cdh-dev01.localdomain>
    X-Original-To: akhanolk@cdh-dev01
    Delivered-To: akhanolk@cdh-dev01.localdomain
    From: akhanolk@cdh-dev01.localdomain
    To: akhanolk@cdh-dev01.localdomain
    Subject: Status of workflow 0000006-130712212133144-oozie-oozi-W
    Content-Type: text/plain; charset=us-ascii
    Date: Sun, 14 Jul 2013 23:08:46 -0500 (CDT)
    Status: R

    The workflow 0000006-130712212133144-oozie-oozi-W completed successfully
    3 changes: 3 additions & 0 deletions 10-Oozie web console screenshots
    @@ -0,0 +1,3 @@
    Screenshots of the Oozie web console are available at:
    ------------------------------------------------------
    http://hadooped.blogspot.com/2013/06/apache-oozie-part-1-workflow-with-hdfs.html
  11. airawat revised this gist Jul 15, 2013. 5 changed files with 107 additions and 22 deletions.
    6 changes: 3 additions & 3 deletions 04-JobPropertiesFile
    @@ -10,10 +10,10 @@ oozie.libpath=${nameNode}/user/oozie/share/lib
    oozie.use.system.libpath=true
    oozie.wf.rerun.failnodes=true

    oozieProjectRoot=${nameNode}/user/{user.name}/oozieProject/
    oozieProjectRoot=${nameNode}/user/${user.name}/oozieProject
    oozie.wf.application.path=${oozieProjectRoot}/workflowHdfsAndEmailActions

    dataInputDirectoryAbsPath=${oozieProjectRoot}/logs
    dataInputDirectoryAbsPath=${oozieProjectRoot}/logs/airawat-syslog
    makeDirectoryAbsPath=${oozieProjectRoot}/data
    dataDestinationDirectoryRelativePath=oozieProject/data

    @@ -22,4 +22,4 @@ emailToAddress=akhanolk@cdh-dev01

    #*******End************************

    Note: The last line is needed if you want to re-run the workflow. There is another property that can be used instead, which specifies the nodes to skip; review the Apache Oozie documentation for details.
    Note: The line "oozie.wf.rerun.failnodes=true" is needed if you want to re-run the workflow. Alternatively, the property oozie.wf.rerun.skip.nodes can be used to specify which nodes to skip on re-run; review the Apache Oozie documentation for details.
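To see how the ${...} references in job.properties expand, the Python sketch below performs a simplified substitution. It is only an illustration, not how Oozie resolves properties internally. The user name "akhanolk" and the brokenRoot key are hypothetical; brokenRoot uses the old {user.name} form (no leading $) to show that such a reference is left as literal text, which is what the oozieProjectRoot fix in this revision corrects.

```python
import re

# Matches ${name} where name may contain letters, digits, '.', '_' or '-'
VAR = re.compile(r"\$\{([A-Za-z0-9_.\-]+)\}")


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def resolve(props, extra=None, max_depth=20):
    """Expand ${var} references using props plus extra values such as user.name."""
    env = dict(extra or {})
    env.update(props)

    def expand(value, depth=0):
        if depth > max_depth:
            raise ValueError("reference nesting too deep (cycle?) in: " + value)

        def replace(match):
            name = match.group(1)
            if name not in env:
                return match.group(0)  # unknown reference: leave it as written
            return expand(env[name], depth + 1)

        return VAR.sub(replace, value)

    return {key: expand(value) for key, value in props.items()}


# Lines from job.properties, plus the hypothetical brokenRoot key.
sample = """
nameNode=hdfs://cdh-nn01.hadoop.com:8020
oozieProjectRoot=${nameNode}/user/${user.name}/oozieProject
oozie.wf.application.path=${oozieProjectRoot}/workflowHdfsAndEmailActions
brokenRoot=${nameNode}/user/{user.name}/oozieProject/
"""

# "akhanolk" is an assumed submitting user, used only for illustration.
resolved = resolve(parse_properties(sample), extra={"user.name": "akhanolk"})
for key in ("oozie.wf.application.path", "brokenRoot"):
    print(key, "=", resolved[key])
```

Unknown references are left untouched rather than raising an error, so a typo in a variable name shows up as a literal ${...} in the resolved value.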
    24 changes: 8 additions & 16 deletions 05-WorkflowXMLFile
    @@ -2,27 +2,19 @@
    <!--workflow.xml -->
    <!--******************************************-->

    <workflow-app name="SampleWorkFlowForHDFSAndEmail" xmlns="uri:oozie:workflow:0.1">
    <start to="hdfsPrepareCommands"/>
    <action name="hdfsPrepareCommands">
    <fs>
    <move source='${nameNode}/user/airawat/oozieProject/logs/airawat-syslog' target='oozieProject/data/'/>
    <delete path="${nameNode}/user/airawat/oozieProject/logs"/>
    </fs>
    <ok to="hdfsCommands"/>
    <error to="hdfsCommands"/>
    </action>
    <workflow-app name="WorkFlowForHDFSAndEmailActions" xmlns="uri:oozie:workflow:0.1">
    <start to="hdfsCommands"/>
    <action name="hdfsCommands">
    <fs>
    <mkdir path='${nameNode}/user/{user.name}/oozieProject/logs'/>
    <move source='${nameNode}/user/{user.name}/oozieProject/Data/airawat-syslog' target='oozieProject/logs/'/>
    <mkdir path='${makeDirectoryAbsPath}'/>
    <move source='${dataInputDirectoryAbsPath}' target='${dataDestinationDirectoryRelativePath}'/>
    </fs>
    <ok to="sendEmailSuccess"/>
    <error to="sendEmailKill"/>
    </action>
    <action name="sendEmailSuccess">
    <email xmlns="uri:oozie:email-action:0.1">
    <to>airawat@cdh-dev01</to>
    <to>${emailToAddress}</to>
    <subject>Status of workflow ${wf:id()}</subject>
    <body>The workflow ${wf:id()} completed successfully</body>
    </email>
    @@ -31,7 +23,7 @@
    </action>
    <action name="sendEmailKill">
    <email xmlns="uri:oozie:email-action:0.1">
    <to>airawat@cdh-dev01</to>
    <to>${emailToAddress}</to>
    <subject>Status of workflow ${wf:id()}</subject>
    <body>The workflow ${wf:id()} had issues and was killed. The error message is: ${wf:errorMessage(wf:lastErrorNode())}</body>
    </email>
    @@ -41,5 +33,5 @@
    <kill name="killJobFSAction">
    <message>"Killed job due to error in FS Action"</message>
    </kill>
    <end name="end" />
    </<workflow-app>
    <end name="end"/>
    </workflow-app>
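Problems like the malformed </<workflow-app> closing tag fixed in this revision, or an ok/error transition that points to a node that does not exist, can be caught by parsing the file before deploying it. The Python sketch below uses only the standard library; it checks well-formedness and transition targets, and is not a substitute for the schema validation Oozie performs when the job is submitted. SKELETON is an abbreviated copy of the final workflow.xml with the email subjects and bodies trimmed, and check_transitions is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

WF = "{uri:oozie:workflow:0.1}"  # namespace declared on <workflow-app>

# Abbreviated copy of the final workflow.xml (email details trimmed).
SKELETON = """<workflow-app name="WorkFlowForHDFSAndEmailActions" xmlns="uri:oozie:workflow:0.1">
  <start to="hdfsCommands"/>
  <action name="hdfsCommands">
    <fs><mkdir path='${makeDirectoryAbsPath}'/></fs>
    <ok to="sendEmailSuccess"/>
    <error to="sendEmailKill"/>
  </action>
  <action name="sendEmailSuccess">
    <email xmlns="uri:oozie:email-action:0.1"><to>${emailToAddress}</to></email>
    <ok to="end"/>
    <error to="end"/>
  </action>
  <action name="sendEmailKill">
    <email xmlns="uri:oozie:email-action:0.1"><to>${emailToAddress}</to></email>
    <ok to="killJobFSAction"/>
    <error to="killJobFSAction"/>
  </action>
  <kill name="killJobFSAction"><message>Killed job due to error in FS Action</message></kill>
  <end name="end"/>
</workflow-app>"""


def check_transitions(xml_text):
    """Return a sorted list of transition targets that name no node."""
    root = ET.fromstring(xml_text)  # raises ET.ParseError on malformed XML
    # Nodes that a transition may point to.
    names = {el.get("name") for el in root
             if el.tag in (WF + "action", WF + "kill", WF + "end")}
    targets = set()
    start = root.find(WF + "start")
    if start is not None:
        targets.add(start.get("to"))
    for action in root.findall(WF + "action"):
        for kind in ("ok", "error"):
            el = action.find(WF + kind)
            if el is not None:
                targets.add(el.get("to"))
    return sorted(t for t in targets - names if t is not None)


print(check_transitions(SKELETON))
```

An empty list means every start, ok and error transition lands on a declared action, kill or end node.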
    17 changes: 16 additions & 1 deletion 06-HdfsLoadCommands
    @@ -1 +1,16 @@
    .
    Commands to load data
    ----------------------

    a) Load da
    $ hadoop fs -mkdir oozieProject
    $ hadoop fs -put oozieProject/* oozieProject/

    b) Validate load

    $ hadoop fs -ls -R oozieProject | awk '{print $8}'

    You should see...

    oozieProject/logs/airawat-syslog/<<node>>/<<year>>/<<month>>/messages
    oozieProject/workflowHdfsAndEmailActions/job.properties
    oozieProject/workflowHdfsAndEmailActions/workflow.xml
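The validation step pipes hadoop fs -ls -R through awk '{print $8}' to keep only the eighth whitespace-separated column, which is the path. If you post-process the listing in a script instead, the Python sketch below does the same job. The sample lines are illustrative and the eight-column layout is assumed from typical Hadoop 2 output; paths_from_ls is a hypothetical helper name. Unlike the awk one-liner, it keeps paths that contain spaces intact and skips lines with fewer than eight columns.

```python
# Illustrative 'hadoop fs -ls -R' lines; the column layout (permissions,
# replication, owner, group, size, date, time, path) is an assumption.
SAMPLE_LS = """drwxr-xr-x   - akhanolk supergroup          0 2013-07-14 23:08 oozieProject/data
-rw-r--r--   3 akhanolk supergroup       1024 2013-07-14 23:08 oozieProject/data/airawat-syslog/cdh-dev01/2013/04/messages"""


def paths_from_ls(output):
    """Return the path column (8th field onward) of each listing line."""
    paths = []
    for line in output.splitlines():
        # maxsplit=7 keeps everything after the 7th separator together,
        # so a path containing spaces stays in one piece.
        parts = line.split(None, 7)
        if len(parts) == 8:  # skips short lines such as "Found 2 items"
            paths.append(parts[7].rstrip())
    return paths


for path in paths_from_ls(SAMPLE_LS):
    print(path)
```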
    34 changes: 33 additions & 1 deletion 07-Oozie commands
    @@ -1 +1,33 @@
    .
    Oozie commands
    --------------
    Note: Replace the Oozie server and port with the values specific to your cluster.

    1) Submit job:
    $ oozie job -oozie http://cdh-dev01:11000/oozie -config oozieProject/workflowHdfsAndEmailActions/job.properties -submit
    job: 0000001-130712212133144-oozie-oozi-W

    2) Run job:
    $ oozie job -oozie http://cdh-dev01:11000/oozie -start 0000001-130712212133144-oozie-oozi-W

    3) Check the status:
    $ oozie job -oozie http://cdh-dev01:11000/oozie -info 0000001-130712212133144-oozie-oozi-W

    4) Suspend workflow:
    $ oozie job -oozie http://cdh-dev01:11000/oozie -suspend 0000001-130712212133144-oozie-oozi-W

    5) Resume workflow:
    $ oozie job -oozie http://cdh-dev01:11000/oozie -resume 0000001-130712212133144-oozie-oozi-W

    6) Re-run workflow:
    $ oozie job -oozie http://cdh-dev01:11000/oozie -config oozieProject/workflowHdfsAndEmailActions/job.properties -rerun 0000001-130712212133144-oozie-oozi-W

    7) Should you need to kill the job:
    $ oozie job -oozie http://cdh-dev01:11000/oozie -kill 0000001-130712212133144-oozie-oozi-W

    8) View server logs:
    $ oozie job -oozie http://cdh-dev01:11000/oozie -log 0000001-130712212133144-oozie-oozi-W

    Logs are available at:
    /var/log/oozie on the Oozie server.
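The same job operations can also be issued over HTTP. The Python sketch below only builds the requests; it is based on the v1 endpoints described in the Oozie Web Services API documentation (PUT /v1/job/<id>?action=..., GET /v1/job/<id>?show=...), the helper names are hypothetical, and the endpoint details should be checked against the documentation for your Oozie version.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Server URL taken from the commands above; replace with your own.
OOZIE_URL = "http://cdh-dev01:11000/oozie"


def job_action_request(job_id, action, base=OOZIE_URL):
    """Build (but do not send) a PUT request like 'oozie job -<action> <id>'."""
    if action not in ("start", "suspend", "resume", "kill"):
        raise ValueError("unsupported action: " + action)
    query = urlencode({"action": action})
    return Request(f"{base}/v1/job/{job_id}?{query}", method="PUT")


def job_show_request(job_id, show, base=OOZIE_URL):
    """Build a GET request like 'oozie job -info <id>' ("info") or '-log <id>' ("log")."""
    if show not in ("info", "log"):
        raise ValueError("unsupported show value: " + show)
    query = urlencode({"show": show})
    return Request(f"{base}/v1/job/{job_id}?{query}", method="GET")


job_id = "0000001-130712212133144-oozie-oozi-W"  # job id from the example above
req = job_action_request(job_id, "start")
print(req.get_method(), req.full_url)
```

To actually send a request, pass it to urllib.request.urlopen(). Submitting and re-running are left out because they need an XML configuration body, not just a query string.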


    48 changes: 47 additions & 1 deletion 08-Program output
    @@ -1 +1,47 @@
    ..
    Program output:
    ---------------

    Expected result:
    The data in the logs directory should be moved to the directory named data under the oozieProject directory.

    $ hadoop fs -ls -R oozieProject | awk '{print $8}'

    oozieProject/data/airawat-syslog
    oozieProject/data/airawat-syslog/cdh-dev01
    oozieProject/data/airawat-syslog/cdh-dev01/2013
    oozieProject/data/airawat-syslog/cdh-dev01/2013/04
    oozieProject/data/airawat-syslog/cdh-dev01/2013/04/messages
    oozieProject/data/airawat-syslog/cdh-dev01/2013/05
    oozieProject/data/airawat-syslog/cdh-dev01/2013/05/messages
    oozieProject/data/airawat-syslog/cdh-dn01
    oozieProject/data/airawat-syslog/cdh-dn01/2013
    oozieProject/data/airawat-syslog/cdh-dn01/2013/05
    oozieProject/data/airawat-syslog/cdh-dn01/2013/05/messages
    oozieProject/data/airawat-syslog/cdh-dn02
    oozieProject/data/airawat-syslog/cdh-dn02/2013
    oozieProject/data/airawat-syslog/cdh-dn02/2013/04
    oozieProject/data/airawat-syslog/cdh-dn02/2013/04/messages
    oozieProject/data/airawat-syslog/cdh-dn02/2013/05
    oozieProject/data/airawat-syslog/cdh-dn02/2013/05/messages
    oozieProject/data/airawat-syslog/cdh-dn03
    oozieProject/data/airawat-syslog/cdh-dn03/2013
    oozieProject/data/airawat-syslog/cdh-dn03/2013/04
    oozieProject/data/airawat-syslog/cdh-dn03/2013/04/messages
    oozieProject/data/airawat-syslog/cdh-dn03/2013/05
    oozieProject/data/airawat-syslog/cdh-dn03/2013/05/messages
    oozieProject/data/airawat-syslog/cdh-jt01
    oozieProject/data/airawat-syslog/cdh-jt01/2013
    oozieProject/data/airawat-syslog/cdh-jt01/2013/04
    oozieProject/data/airawat-syslog/cdh-jt01/2013/04/messages
    oozieProject/data/airawat-syslog/cdh-jt01/2013/05
    oozieProject/data/airawat-syslog/cdh-jt01/2013/05/messages
    oozieProject/data/airawat-syslog/cdh-nn01
    oozieProject/data/airawat-syslog/cdh-nn01/2013
    oozieProject/data/airawat-syslog/cdh-nn01/2013/05
    oozieProject/data/airawat-syslog/cdh-nn01/2013/05/messages
    oozieProject/data/airawat-syslog/cdh-vms
    oozieProject/data/airawat-syslog/cdh-vms/2013
    oozieProject/data/airawat-syslog/cdh-vms/2013/05
    oozieProject/data/airawat-syslog/cdh-vms/2013/05/messages
    oozieProject/workflowHdfsAndEmailActions/job.properties
    oozieProject/workflowHdfsAndEmailActions/workflow.xml
  12. airawat revised this gist Jul 15, 2013. 7 changed files with 30 additions and 6 deletions.
    14 changes: 13 additions & 1 deletion 00-SampleWorkflowHdfsAndEmailActions
    @@ -1,4 +1,16 @@
    This gist includes components of a simple workflow that moves files within HDFS and deletes a directory. Emails are sent out to notify designated users of the success/failure of the workflow. There is a prepare section to allow re-runs of the action; the prepare essentially negates the move done by a potential prior run of the action. Sample data is also included.
    This gist includes components of a simple workflow application that moves files within HDFS and deletes a directory.
    Emails are sent out to notify designated users of the success/failure of the workflow. There is a prepare section
    to allow re-runs of the action; the prepare essentially negates the move done by a potential prior run
    of the action. Sample data is also included.

    The sample application includes:
    --------------------------------
    1. Oozie actions: HDFS (fs) action and email action
    2. Oozie workflow controls: start, end, and kill.
    3. Workflow components: job.properties and workflow.xml
    4. Sample data
    5. Commands to deploy workflow, submit and run workflow
    6. Oozie web console - screenshots from sample program execution

    Pictorial overview of workflow:
    -------------------------------
    3 changes: 1 addition & 2 deletions 02-DataAndCodeDownload
    @@ -1,7 +1,6 @@
    Download location:
    ------------------


    <<To be added>>


    Directory structure:
    14 changes: 11 additions & 3 deletions 04-JobPropertiesFile
    @@ -6,12 +6,20 @@ nameNode=hdfs://cdh-nn01.hadoop.com:8020
    jobTracker=cdh-jt01:8021
    queueName=default

    oozieProjectRoot=${nameNode}/user/{user.name}/oozieProject/
    oozie.wf.application.path=${oozieProjectRoot}/workflowHdfsAndEmailActions

    oozie.libpath=${nameNode}/user/oozie/share/lib
    oozie.use.system.libpath=true
    oozie.wf.rerun.failnodes=true

    oozieProjectRoot=${nameNode}/user/{user.name}/oozieProject/
    oozie.wf.application.path=${oozieProjectRoot}/workflowHdfsAndEmailActions

    dataInputDirectoryAbsPath=${oozieProjectRoot}/logs
    makeDirectoryAbsPath=${oozieProjectRoot}/data
    dataDestinationDirectoryRelativePath=oozieProject/data

    emailToAddress=akhanolk@cdh-dev01


    #*******End************************

    Note: The last line is needed if you want to re-run the workflow. There is another property that can be used instead, which specifies the nodes to skip; review the Apache Oozie documentation for details.
    1 change: 1 addition & 0 deletions 06-HdfsLoadCommands
    @@ -0,0 +1 @@
    .
    1 change: 1 addition & 0 deletions 07-Oozie commands
    @@ -0,0 +1 @@
    .
    1 change: 1 addition & 0 deletions 08-Program output
    @@ -0,0 +1 @@
    ..
    2 changes: 2 additions & 0 deletions 09-Oozie web console screenshots
    @@ -0,0 +1,2 @@
    Screenshots of the Oozie web console are available at:
    <<Link>>
  13. airawat revised this gist Jul 13, 2013. 4 changed files with 96 additions and 4 deletions.
    8 changes: 4 additions & 4 deletions 00-SampleWorkflowHdfsAndEmailActions
    @@ -8,10 +8,10 @@ Includes:
    ---------
    01-WorkflowComponents
    02-DataAndCodeDownload
    03-HdfsLoadCommands
    04-Oozie configuration for SMTP
    05-JobPropertiesFile
    06-WorkflowXMLFile
    03-Oozie configuration for SMTP
    04-JobPropertiesFile
    05-WorkflowXMLFile
    06-HdfsLoadCommands
    07-Oozie commands
    08-Program output
    09-Oozie web console screenshots
    30 changes: 30 additions & 0 deletions 03-Oozie configuration for SMTP
    @@ -0,0 +1,30 @@
    Oozie SMTP configuration
    ------------------------
    Add the following to oozie-site.xml and restart Oozie.
    Replace the values with those specific to your environment.

    <!-- SMTP params-->
    <property>
    <name>oozie.email.smtp.host</name>
    <value>cdh-dev01</value>
    </property>
    <property>
    <name>oozie.email.smtp.port</name>
    <value>25</value>
    </property>
    <property>
    <name>oozie.email.from.address</name>
    <value>oozie@cdh-dev01</value>
    </property>
    <property>
    <name>oozie.email.smtp.auth</name>
    <value>false</value>
    </property>
    <property>
    <name>oozie.email.smtp.username</name>
    <value></value>
    </property>
    <property>
    <name>oozie.email.smtp.password</name>
    <value></value>
    </property>
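Before restarting Oozie, it can help to confirm that these properties sit inside the configuration element of oozie-site.xml and hold the values you expect. The Python sketch below reads the oozie.email.* properties from a trimmed, hypothetical oozie-site.xml; smtp_settings is a hypothetical helper name, and the sketch does not contact the mail server.

```python
import xml.etree.ElementTree as ET

# Trimmed, hypothetical oozie-site.xml holding some of the SMTP properties above.
SITE = """<configuration>
  <property><name>oozie.email.smtp.host</name><value>cdh-dev01</value></property>
  <property><name>oozie.email.smtp.port</name><value>25</value></property>
  <property><name>oozie.email.from.address</name><value>oozie@cdh-dev01</value></property>
  <property><name>oozie.email.smtp.auth</name><value>false</value></property>
  <property><name>oozie.email.smtp.username</name><value></value></property>
</configuration>"""


def smtp_settings(site_xml):
    """Return {name: value} for every oozie.email.* property in the document."""
    root = ET.fromstring(site_xml)
    settings = {}
    for prop in root.iter("property"):
        name = prop.findtext("name", default="").strip()
        if name.startswith("oozie.email."):
            # findtext returns "" for an empty <value></value>
            settings[name] = prop.findtext("value", default="").strip()
    return settings


settings = smtp_settings(SITE)
print(settings["oozie.email.smtp.host"], settings["oozie.email.smtp.port"])
```

To check that the SMTP host is reachable from the Oozie server, the host and port could then be passed to smtplib.SMTP(host, int(port), timeout=10).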
    17 changes: 17 additions & 0 deletions 04-JobPropertiesFile
    @@ -0,0 +1,17 @@
    #*****************************
    # job.properties
    #*****************************

    nameNode=hdfs://cdh-nn01.hadoop.com:8020
    jobTracker=cdh-jt01:8021
    queueName=default

    oozieProjectRoot=${nameNode}/user/{user.name}/oozieProject/
    oozie.wf.application.path=${oozieProjectRoot}/workflowHdfsAndEmailActions

    oozie.libpath=${nameNode}/user/oozie/share/lib
    oozie.use.system.libpath=true
    oozie.wf.rerun.failnodes=true


    Note: The last line is needed if you want to re-run the workflow. There is another property that can be used instead, which specifies the nodes to skip; review the Apache Oozie documentation for details.
    45 changes: 45 additions & 0 deletions 05-WorkflowXMLFile
    @@ -0,0 +1,45 @@
    <!--******************************************-->
    <!--workflow.xml -->
    <!--******************************************-->

    <workflow-app name="SampleWorkFlowForHDFSAndEmail" xmlns="uri:oozie:workflow:0.1">
    <start to="hdfsPrepareCommands"/>
    <action name="hdfsPrepareCommands">
    <fs>
    <move source='${nameNode}/user/airawat/oozieProject/logs/airawat-syslog' target='oozieProject/data/'/>
    <delete path="${nameNode}/user/airawat/oozieProject/logs"/>
    </fs>
    <ok to="hdfsCommands"/>
    <error to="hdfsCommands"/>
    </action>
    <action name="hdfsCommands">
    <fs>
    <mkdir path='${nameNode}/user/{user.name}/oozieProject/logs'/>
    <move source='${nameNode}/user/{user.name}/oozieProject/Data/airawat-syslog' target='oozieProject/logs/'/>
    </fs>
    <ok to="sendEmailSuccess"/>
    <error to="sendEmailKill"/>
    </action>
    <action name="sendEmailSuccess">
    <email xmlns="uri:oozie:email-action:0.1">
    <to>airawat@cdh-dev01</to>
    <subject>Status of workflow ${wf:id()}</subject>
    <body>The workflow ${wf:id()} completed successfully</body>
    </email>
    <ok to="end"/>
    <error to="end"/>
    </action>
    <action name="sendEmailKill">
    <email xmlns="uri:oozie:email-action:0.1">
    <to>airawat@cdh-dev01</to>
    <subject>Status of workflow ${wf:id()}</subject>
    <body>The workflow ${wf:id()} had issues and was killed. The error message is: ${wf:errorMessage(wf:lastErrorNode())}</body>
    </email>
    <ok to="killJobFSAction"/>
    <error to="killJobFSAction"/>
    </action>
    <kill name="killJobFSAction">
    <message>"Killed job due to error in FS Action"</message>
    </kill>
    <end name="end" />
    </<workflow-app>
  14. airawat revised this gist Jul 13, 2013. 3 changed files with 49 additions and 1 deletion.
    18 changes: 17 additions & 1 deletion 00-SampleWorkflowHdfsAndEmailActions
    @@ -1 +1,17 @@
    This gist includes components of a simple workflow that moves files within HDFS and deletes a directory. Emails are sent out to notify designated users of the success/failure of the workflow. There is a prepare section to allow re-runs of the action; the prepare essentially negates the move done by a potential prior run of the action.
    This gist includes components of a simple workflow that moves files within HDFS and deletes a directory. Emails are sent out to notify designated users of the success/failure of the workflow. There is a prepare section to allow re-runs of the action; the prepare essentially negates the move done by a potential prior run of the action. Sample data is also included.

    Pictorial overview of workflow:
    -------------------------------
    <<To be added>>

    Includes:
    ---------
    01-WorkflowComponents
    02-DataAndCodeDownload
    03-HdfsLoadCommands
    04-Oozie configuration for SMTP
    05-JobPropertiesFile
    06-WorkflowXMLFile
    07-Oozie commands
    08-Program output
    09-Oozie web console screenshots
    9 changes: 9 additions & 0 deletions 01-WorkflowComponents
    @@ -0,0 +1,9 @@
    Workflow Components:
    --------------------
    1. job.properties
    File containing:
    a) parameter and value declarations that are referenced in the workflows, and
    b) environment information referenced by Oozie to run the workflow, including the name node, job tracker, workflow application path, etc.

    2. workflow.xml
    Workflow definition file
    23 changes: 23 additions & 0 deletions 02-DataAndCodeDownload
    @@ -0,0 +1,23 @@
    Download location:
    ------------------




    Directory structure:
    --------------------

    oozieProject
    data
    airawat-syslog
    <<node>>
    <<year>>
    <<month>>
    messages
    workflowHdfsAndEmailActions
    job.properties
    workflow.xml




  15. airawat revised this gist Jul 13, 2013. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion 00-SampleWorkflowHdfsAndEmailActions
    @@ -1 +1 @@
    This gist includes components of a simple workflow that moves files within HDFS and deletes a directory. I have also included a prepare section to allow re-runs of the action; the prepare essentially negates the move done by a potential prior run of the action.
    This gist includes components of a simple workflow that moves files within HDFS and deletes a directory. Emails are sent out to notify designated users of the success/failure of the workflow. There is a prepare section to allow re-runs of the action; the prepare essentially negates the move done by a potential prior run of the action.
  16. airawat revised this gist Jul 13, 2013. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion 00-SampleWorkflowHdfsAndEmailActions
    @@ -1 +1 @@
    .
    This gist includes components of a simple workflow that moves files within HDFS and deletes a directory. I have also included a prepare section to allow re-runs of the action; the prepare essentially negates the move done by a potential prior run of the action.
  17. airawat created this gist Jul 13, 2013.
    1 change: 1 addition & 0 deletions 00-SampleWorkflowHdfsAndEmailActions
    @@ -0,0 +1 @@
    .