This gist is part of a series of gists on map-side joins in Java MapReduce.
In the gist at https://gist.github.com/airawat/6597557, we added the reference data available
in HDFS to the distributed cache from the driver code.
This gist demonstrates adding a local file to the distributed cache via the command line.
Refer to the gist at https://gist.github.com/airawat/6597557 for:
1. Data samples and structure
2. Expected results
3. Commands to load data to HDFS
This gist demonstrates how to do a map-side join, joining a MapFile from the DistributedCache
with a larger dataset in HDFS.
Includes:
---------
1. Input data and script download
2. Dataset structure review
3. Expected results
4. Mapper code
5. Driver code
This gist demonstrates how to do a map-side join, loading one small dataset from the DistributedCache into a HashMap
in memory, and joining it with a larger dataset.
Includes:
---------
1. Input data and script download
2. Dataset structure review
3. Expected results
4. Mapper code
5. Driver code
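
The in-memory join described above can be sketched in plain Java, without the Hadoop classes. This is a minimal illustration under stated assumptions, not the gist's mapper: the class name, the tab-delimited layout, the sample department and employee records, and the "NOT-FOUND" placeholder are all hypothetical. In an actual mapper, the lookup would typically be built once from the cached file before any records are processed, and the join would run once per input record.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a map-side join: small dataset in a HashMap,
// large dataset streamed record by record.
class HashMapJoinSketch {

    // Build the lookup from the small dataset; each line is assumed to be "key<TAB>value".
    static Map<String, String> loadLookup(List<String> smallDataset) {
        Map<String, String> lookup = new HashMap<>();
        for (String line : smallDataset) {
            String[] parts = line.split("\t", 2);
            if (parts.length == 2) {
                lookup.put(parts[0], parts[1]);
            }
        }
        return lookup;
    }

    // Join one tab-delimited record of the large dataset on the field at keyIndex.
    // Records with no match get the placeholder "NOT-FOUND" appended instead.
    static String joinRecord(String record, int keyIndex, Map<String, String> lookup) {
        String[] fields = record.split("\t");
        String joined = keyIndex < fields.length
                ? lookup.getOrDefault(fields[keyIndex], "NOT-FOUND")
                : "NOT-FOUND";
        return record + "\t" + joined;
    }

    public static void main(String[] args) {
        // Made-up sample data, for illustration only.
        List<String> departments = Arrays.asList("d001\tMarketing", "d002\tFinance");
        List<String> employees = Arrays.asList("10001\tAlice\td001", "10002\tBob\td009");

        Map<String, String> lookup = loadLookup(departments);
        for (String employee : employees) {
            System.out.println(joinRecord(employee, 2, lookup));
        }
    }
}
```

The approach only works while the small dataset fits comfortably in each mapper's memory; no shuffle or reduce phase is needed because every mapper holds the full lookup.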
This gist demonstrates how to create a sequence file (compressed and uncompressed) from a text file.
Includes:
---------
1. Input data and script download
2. Input data review
3. Data load commands
4. Mapper code
5. Driver code to create the sequence file out of a text file in HDFS
6. Command to run Java program
This gist demonstrates how to create a map file from a text file.
Includes:
---------
1. Input data and script download
2. Input data review
3. Data load commands
4. Java program to create the map file out of a text file in HDFS
5. Command to run Java program
6. Results of the program run to create map file
This gist includes components of an Oozie workflow - scripts/code, sample data
and commands; Oozie actions covered: shell action, email action
Action 1: The shell action executes a shell script that does a line count for files matching
a provided glob, and writes the line count to standard output
Action 2: The email action emails the output of action 1
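
As a rough illustration of what Action 1 computes, here is a plain-Java stand-in that counts lines across files matching a glob and prints the total to standard output. It is not the gist's shell script: it works on the local filesystem rather than HDFS, and the class name, method name, and command-line arguments are assumptions made for this sketch.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

// Hypothetical stand-in for the line-count step, on the local filesystem.
class GlobLineCountSketch {

    // Sum the line counts of all regular files in dir whose names match the glob.
    static long countLines(Path dir, String glob) {
        long total = 0;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir, glob)) {
            for (Path file : files) {
                if (Files.isRegularFile(file)) {
                    try (Stream<String> lines = Files.lines(file)) {
                        total += lines.count();
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }

    public static void main(String[] args) {
        if (args.length != 2) {
            System.err.println("usage: GlobLineCountSketch <directory> <glob>");
            return;
        }
        // Writing the count to standard output mirrors how the shell action
        // makes its result available to a later step.
        System.out.println(countLines(Paths.get(args[0]), args[1]));
    }
}
```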
Pictorial overview of job:
--------------------------
import java.util.Properties;

import org.apache.oozie.client.OozieClient;
import org.apache.oozie.client.WorkflowJob;

public class myOozieWorkflowJavaAPICall {

    public static void main(String[] args) {

        OozieClient wc = new OozieClient("http://cdh-dev01:11000/oozie");
This gist includes components of an Oozie workflow application - scripts/code, sample data
and commands; Oozie actions covered: sub-workflow, email action, java main action,
sqoop action (to MySQL); Oozie controls covered: decision;
Pictorial overview:
--------------------
http://hadooped.blogspot.com/2013/07/apache-oozie-part-8-subworkflow.html
Usecase:
--------
Introduction
-------------
This gist includes sample data, application components, and components to execute a bundle application.
The sample bundle application is time triggered. The start time is defined in the bundle job.properties
file. The bundle application starts two coordinator applications, as defined in the bundle definition file,
bundleConfirguration.xml.
The first coordinator job is time triggered. The start time is defined in the bundle job.properties file.
It runs a workflow that includes a java main action. The java program parses some log files and generates
This gist includes components of an Oozie workflow - scripts/code, sample data
and commands; Oozie actions covered: java main action; Oozie controls
covered: start, kill, end; The java program uses regex to parse the logs, and
also extracts part of the mapper input directory path and includes it in the key
emitted.
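
The regex parsing and key construction described above might look roughly like the following. This is a sketch under assumptions, not the gist's program: the classic "Mon dd HH:mm:ss host process[pid]: message" syslog layout, the choice of the immediate parent directory as the path component, the key format, and all names and sample values are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: parse a syslog line with a regex and build a key that
// also carries part of the input file's directory path.
class SyslogKeySketch {

    // Assumed layout: "Mon dd HH:mm:ss host process[pid]: message" (pid optional).
    // Groups: 1 = month, 2 = day, 3 = time, 4 = host, 5 = process, 6 = message.
    private static final Pattern SYSLOG = Pattern.compile(
            "^(\\w{3})\\s+(\\d{1,2})\\s+(\\d{2}:\\d{2}:\\d{2})\\s+(\\S+)\\s+"
            + "([^\\s:\\[]+)(?:\\[\\d+\\])?:\\s+(.*)$");

    // Name of the directory that directly contains the input file,
    // e.g. "2013" for "/logs/2013/messages" (assumed directory layout).
    static String parentDirName(String inputPath) {
        String[] parts = inputPath.split("/");
        return parts.length >= 2 ? parts[parts.length - 2] : "";
    }

    // Build "<dir>-<month>-<process>", or return null when the line does not parse.
    static String buildKey(String inputPath, String line) {
        Matcher m = SYSLOG.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return parentDirName(inputPath) + "-" + m.group(1) + "-" + m.group(5);
    }

    public static void main(String[] args) {
        // Made-up sample path and log line, for illustration only.
        String path = "/logs/2013/messages";
        String line = "May  3 11:52:54 cdh-dn03 init: tty (/dev/tty6) main process (1208) killed by TERM signal";
        System.out.println(buildKey(path, line));
    }
}
```

Returning null for unparseable lines lets the caller decide whether to skip them or count them as malformed input.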
Usecase
-------
Parse Syslog generated log files to generate reports;