v5tech’s gists

v5tech / 00-MultipleOutputs

Created April 23, 2014 07:52 — forked from airawat/00-MultipleOutputs

	********************************
	Gist
	********************************

	Motivation
	-----------
	A typical MapReduce job writes output files named with the prefix "part-", followed by "m" or "r"
	depending on whether the file is map or reduce output, and then the part number. There are scenarios
	where we may want to create separate files based on criteria such as data keys and/or values. Enter the
	"MultipleOutputs" functionality.
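
The naming and routing idea can be sketched in plain Java. This is a simulation only and does not use the Hadoop API: it groups records by a key-derived base name and builds file names in the "<base>-<m|r>-<5-digit part number>" pattern described above. The class name `OutputRoutingSketch`, the "dept,name" record layout, and the sample records are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class OutputRoutingSketch {

    // Builds a name such as "part-r-00000": base name, task type ('m' or 'r'),
    // and a zero-padded five-digit part number.
    static String outputFileName(String baseName, char taskType, int partNumber) {
        return String.format("%s-%c-%05d", baseName, taskType, partNumber);
    }

    // Groups hypothetical "dept,name" records by the file they would be written to,
    // using the department (the text before the first comma) as the base name.
    static Map<String, List<String>> route(List<String> records, int partNumber) {
        Map<String, List<String>> files = new TreeMap<>();
        for (String record : records) {
            String dept = record.split(",", 2)[0];
            String fileName = outputFileName(dept, 'r', partNumber);
            files.computeIfAbsent(fileName, k -> new ArrayList<>()).add(record);
        }
        return files;
    }

    public static void main(String[] args) {
        // Sample records are made up for this sketch.
        List<String> records = Arrays.asList("sales,Smith", "hr,Jones", "sales,Lee");
        route(records, 0).forEach((file, lines) -> System.out.println(file + " <- " + lines));
    }
}
```

In a real job the routing decision would be made inside the mapper or reducer; here it is reduced to a map from file name to records so the grouping is easy to inspect.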

v5tech / 00-SecondarySortJavaMapReduce

Created April 23, 2014 07:52 — forked from airawat/00-SecondarySortJavaMapReduce

	Secondary sort in MapReduce

	In the MapReduce framework, the keys are sorted but the values associated with each key
	are not. For the values to be sorted, we need to write code to perform what is
	referred to as a secondary sort. The sample code in this gist demonstrates such a sort.

	The input to the program is a bunch of employee attributes.
	The output required is department number (deptNo) in ascending order, and the employee last name,
	first name and employee ID in descending order.
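
The required ordering can be sketched in plain Java as a comparator over a composite key. This is not the gist's code and it does not use Hadoop: in an actual secondary sort the same comparison would typically live in a composite key type, paired with a partitioner and grouping comparator that look only at deptNo. The class `SecondarySortSketch`, the `Employee` fields, and the sample rows are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class SecondarySortSketch {

    // Hypothetical composite key: the natural key (deptNo) plus the fields to sort values by.
    static final class Employee {
        final int deptNo;
        final String lastName;
        final String firstName;
        final int empId;

        Employee(int deptNo, String lastName, String firstName, int empId) {
            this.deptNo = deptNo;
            this.lastName = lastName;
            this.firstName = firstName;
            this.empId = empId;
        }

        @Override
        public String toString() {
            return deptNo + "\t" + lastName + "\t" + firstName + "\t" + empId;
        }
    }

    // deptNo ascending, then last name, first name, and employee ID descending.
    static final Comparator<Employee> ORDER =
        Comparator.comparingInt((Employee e) -> e.deptNo)
            .thenComparing((Employee e) -> e.lastName, Comparator.reverseOrder())
            .thenComparing((Employee e) -> e.firstName, Comparator.reverseOrder())
            .thenComparing(Comparator.comparingInt((Employee e) -> e.empId).reversed());

    // Returns a sorted copy so the input list is left untouched.
    static List<Employee> sorted(List<Employee> input) {
        List<Employee> out = new ArrayList<>(input);
        out.sort(ORDER);
        return out;
    }

    public static void main(String[] args) {
        // Sample rows are made up for this sketch.
        List<Employee> employees = Arrays.asList(
            new Employee(20, "Adams", "Zed", 3),
            new Employee(10, "Baker", "Al", 1),
            new Employee(10, "Young", "Bo", 2),
            new Employee(10, "Baker", "Cy", 4));
        sorted(employees).forEach(System.out::println);
    }
}
```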

v5tech / 00-MapSideJoinDistCacheThruGenericOptionsParser

Created April 23, 2014 07:52 — forked from airawat/00-MapSideJoinDistCacheThruGenericOptionsParser

	This gist is part of a series of gists related to map-side joins in Java MapReduce.
	In the gist at https://gist.github.com/airawat/6597557, we added the reference data available
	in HDFS to the distributed cache from the driver code.

	This gist demonstrates adding a local file to the distributed cache via the command line.
	Refer to the gist at https://gist.github.com/airawat/6597557 for:
	1. Data samples and structure
	2. Expected results
	3. Commands to load data to HDFS

v5tech / 00-MapSideJoinDistCacheMapFile

Created April 23, 2014 07:53 — forked from airawat/00-MapSideJoinDistCacheMapFile

	This gist demonstrates how to do a map-side join, joining a MapFile from DistributedCache
	with a larger dataset in HDFS.

	Includes:
	---------
	1. Input data and script download
	2. Dataset structure review
	3. Expected results
	4. Mapper code
	5. Driver code

v5tech / 00-MapSideJoinDistCacheTextFile

Created April 23, 2014 07:54 — forked from airawat/00-MapSideJoinDistCacheTextFile

	This gist demonstrates how to do a map-side join, loading one small dataset from DistributedCache into a HashMap
	in memory, and joining with a larger dataset.

	Includes:
	---------
	1. Input data and script download
	2. Dataset structure review
	3. Expected results
	4. Mapper code
	5. Driver code
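
The join itself can be sketched in plain Java without Hadoop: the small dataset is parsed into a HashMap once, and each record of the larger dataset is enriched by a lookup. In a mapper, the load step would normally run once in `setup()` and the lookup once per input record. The class `MapSideJoinSketch`, the tab-separated record layouts, the "NOT-FOUND" placeholder, and the sample values are assumptions, not the gist's actual data structure.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MapSideJoinSketch {

    // Parses hypothetical "deptNo<TAB>deptName" lines into an in-memory lookup table.
    static Map<String, String> loadLookup(List<String> smallDataset) {
        Map<String, String> lookup = new HashMap<>();
        for (String line : smallDataset) {
            String[] fields = line.split("\t");
            if (fields.length >= 2) {
                lookup.put(fields[0], fields[1]);
            }
        }
        return lookup;
    }

    // Joins one hypothetical "empId<TAB>lastName<TAB>deptNo" record with the lookup table,
    // appending the department name, or a placeholder when the key is absent.
    static String joinRecord(String record, Map<String, String> lookup) {
        String[] fields = record.split("\t");
        String deptName = lookup.getOrDefault(fields[2], "NOT-FOUND");
        return record + "\t" + deptName;
    }

    public static void main(String[] args) {
        // Sample values are made up for this sketch.
        Map<String, String> lookup = loadLookup(Arrays.asList("d001\tMarketing", "d002\tFinance"));
        List<String> employees = Arrays.asList("10001\tFacello\td002", "10002\tSimmel\td009");
        for (String employee : employees) {
            System.out.println(joinRecord(employee, lookup));
        }
    }
}
```

Because the lookup table lives entirely in memory, this approach only fits when the smaller dataset is small enough for each task's heap.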

v5tech / 00-CreatingSequenceFile

Created April 23, 2014 07:54 — forked from airawat/00-CreatingSequenceFile

	This gist demonstrates how to create a sequence file (compressed and uncompressed) from a text file.

	Includes:
	---------
	1. Input data and script download
	2. Input data review
	3. Data load commands
	4. Mapper code
	5. Driver code to create the sequence file out of a text file in HDFS
	6. Command to run Java program

v5tech / 00-CreatingMapFile

Created April 23, 2014 07:55 — forked from airawat/00-CreatingMapFile

	This gist demonstrates how to create a map file from a text file.

	Includes:
	---------
	1. Input data and script download
	2. Input data review
	3. Data load commands
	4. Java program to create the map file out of a text file in HDFS
	5. Command to run Java program
	6. Results of the program run to create map file

v5tech / 00-OozieWorkflowShellAction

Created April 23, 2014 07:55 — forked from airawat/00-OozieWorkflowShellAction

	This gist includes components of an Oozie workflow: scripts/code, sample data,
	and commands. Oozie actions covered: shell action, email action.

	Action 1: The shell action executes a shell script that counts the lines in the files matching
	a provided glob and writes the line count to standard output.
	Action 2: The email action emails the output of action 1.


	Pictorial overview of job:
	--------------------------

v5tech / 00-OozieWorkflowCallWithJavaAPI

Created April 23, 2014 07:55 — forked from airawat/00-OozieWorkflowCallWithJavaAPI

	import java.util.Properties;

	import org.apache.oozie.client.OozieClient;
	import org.apache.oozie.client.WorkflowJob;

	public class myOozieWorkflowJavaAPICall {

	    public static void main(String[] args) {
	        OozieClient wc = new OozieClient("http://cdh-dev01:11000/oozie");

v5tech / 00-OozieWorkflowWithSubworkflow

Created April 23, 2014 07:56 — forked from airawat/00-OozieWorkflowWithSubworkflow

	This gist includes components of an Oozie workflow application: scripts/code, sample data,
	and commands. Oozie actions covered: sub-workflow, email, java main action,
	sqoop action (to mysql). Oozie controls covered: decision.

	Pictorial overview:
	--------------------
	http://hadooped.blogspot.com/2013/07/apache-oozie-part-8-subworkflow.html

	Usecase:
	--------