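import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;
Scan scan = new Scan();
scan.setFilter(new MyFilter(appId)); // get only rows for the app with appId
HTable table = new HTable(config, Bytes.toBytes(tableName)); // for this table
ResultScanner results = table.getScanner(scan); // apply the scan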
Scan scan = new Scan();
scan.setFilter(new ProxyFilter(new MyFilter(appId)));
public class ExampleRowKey
{
    long userId;
    String applicationId;
    public byte[] getBytes() throws IOException
    {
        ByteArrayOutputStream byteOutput = new ByteArrayOutputStream();
        DataOutputStream data = new DataOutputStream(byteOutput);
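The ExampleRowKey preview above is cut off before getBytes() finishes. The sketch below is a hedged guess at how such a composite key could be serialized with DataOutputStream and then checked against an appId, the kind of test a custom filter like MyFilter might perform. The field order (userId first, then applicationId via writeUTF) and the names RowKeySketch, fromBytes and matchesApp are assumptions for illustration, not the gist's code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical composite row key: an 8-byte userId followed by a
// length-prefixed (modified UTF-8) applicationId. The layout is an assumption.
class RowKeySketch {
    final long userId;
    final String applicationId;

    RowKeySketch(long userId, String applicationId) {
        this.userId = userId;
        this.applicationId = applicationId;
    }

    // Writes the key fields, in order, into a byte array usable as a row key
    byte[] getBytes() throws IOException {
        ByteArrayOutputStream byteOutput = new ByteArrayOutputStream();
        DataOutputStream data = new DataOutputStream(byteOutput);
        data.writeLong(userId);
        data.writeUTF(applicationId);
        data.flush();
        return byteOutput.toByteArray();
    }

    // Reverses getBytes(): reads the fields back in the order they were written
    static RowKeySketch fromBytes(byte[] key) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(key));
        long userId = in.readLong();
        String applicationId = in.readUTF();
        return new RowKeySketch(userId, applicationId);
    }

    // Hypothetical per-row check, similar in spirit to what MyFilter(appId) might do
    static boolean matchesApp(byte[] rowKey, String appId) throws IOException {
        return fromBytes(rowKey).applicationId.equals(appId);
    }
}
```

Because userId comes first in this assumed layout, rows for one app are not contiguous in HBase's sorted key order, so a filter like this still has to examine every row; leading the key with applicationId would let a start/stop-row scan narrow the range instead.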
********************************
Gist
********************************
Motivation
-----------
The typical MapReduce job creates output files named with the prefix "part-", followed by "m" or "r"
depending on whether the file is map or reduce output, and then the part number (for example,
part-r-00000). There are scenarios where we may want to create separate files based on other
criteria, such as the data keys and/or values. Enter the "MultipleOutputs" functionality.
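As a plain-Java illustration (not Hadoop's API), the sketch below mimics the naming convention described above and routes records into separate files named after each record's key, which is the effect MultipleOutputs provides. The class MultipleOutputsSketch and its methods are hypothetical; only the "<base>-<m|r>-<5-digit part>" naming pattern is taken from Hadoop's convention.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class MultipleOutputsSketch {
    // Builds a Hadoop-style file name: <base>-<m|r>-<zero-padded 5-digit part number>
    static String fileName(String base, boolean isMap, int partNumber) {
        return String.format("%s-%s-%05d", base, isMap ? "m" : "r", partNumber);
    }

    // Groups (key, value) records into per-file lists, using the key as the base name
    static Map<String, List<String>> route(List<String[]> records, boolean isMap, int partNumber) {
        Map<String, List<String>> files = new TreeMap<>();
        for (String[] record : records) {
            String name = fileName(record[0], isMap, partNumber);
            files.computeIfAbsent(name, k -> new ArrayList<>()).add(record[1]);
        }
        return files;
    }
}
```

In a real reducer, the routing step corresponds to calling write(key, value, baseOutputPath) on an org.apache.hadoop.mapreduce.lib.output.MultipleOutputs instance, which produces files such as <baseOutputPath>-r-00000 alongside (or instead of) the default part files.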
**********************
Gist
**********************
This gist details how to inner join two large datasets on the map side, leveraging the join capability
in MapReduce. Such a join makes sense when both input datasets are too large to be distributed
through the DistributedCache, and it can be implemented when both datasets can be joined on the join key
and both are sorted in the same order by that key.
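The reason both inputs must be sorted by the join key is that the join then reduces to a single forward merge over each input. The plain-Java sketch below illustrates that sort-merge inner join, assuming records are (key, value) string pairs sorted by String's natural order; it is not Hadoop's API, and in Hadoop the join package (for example CompositeInputFormat) performs this merge over matching input partitions for you. The class name SortedMergeJoin is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class SortedMergeJoin {
    // Inner-joins two lists of {key, value} pairs that are both sorted by key.
    // Emits "key<TAB>leftValue<TAB>rightValue" for every matching pair.
    static List<String> innerJoin(List<String[]> left, List<String[]> right) {
        List<String> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < left.size() && j < right.size()) {
            int cmp = left.get(i)[0].compareTo(right.get(j)[0]);
            if (cmp < 0) {
                i++; // left key is smaller: it has no match on the right
            } else if (cmp > 0) {
                j++; // right key is smaller: it has no match on the left
            } else {
                // Find the run of equal keys on each side, then emit their cross product
                String key = left.get(i)[0];
                int iEnd = i, jEnd = j;
                while (iEnd < left.size() && left.get(iEnd)[0].equals(key)) iEnd++;
                while (jEnd < right.size() && right.get(jEnd)[0].equals(key)) jEnd++;
                for (int a = i; a < iEnd; a++) {
                    for (int b = j; b < jEnd; b++) {
                        out.add(key + "\t" + left.get(a)[1] + "\t" + right.get(b)[1]);
                    }
                }
                i = iEnd;
                j = jEnd;
            }
        }
        return out;
    }
}
```

Neither input is ever held in memory as a whole or rescanned, which is why this works for datasets too large for the DistributedCache.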
There are two critical pieces to engaging the join behavior:
*************************
Gist
*************************
One more gist related to controlling the number of mappers in a MapReduce job.
Background on InputSplits
--------------------------
An InputSplit is a chunk of the input data allocated to a map task for processing. FileInputFormat
generates InputSplits (and divides them into records): one InputSplit for each file, unless the …
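The split arithmetic that governs how many mappers run can be sketched in plain Java. As we understand Hadoop 2.x FileInputFormat, the split size for a splittable file is the block size clamped between the configured minimum and maximum split sizes (mapreduce.input.fileinputformat.split.minsize and mapreduce.input.fileinputformat.split.maxsize), and a 10% "slop" lets the last split run slightly over. The class and method names below are illustrative, and the 128 MB block size in the test is an assumption.

```java
class SplitSizeSketch {
    // Mirrors FileInputFormat's slop factor: the final split may be up to 10% larger than splitSize
    static final double SPLIT_SLOP = 1.1;

    // Split size is the block size, clamped between the configured min and max split sizes
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Number of splits (and hence map tasks) for one splittable, non-empty file
    static int countSplits(long fileLength, long splitSize) {
        int splits = 0;
        long remaining = fileLength;
        while (((double) remaining) / splitSize > SPLIT_SLOP) {
            splits++;
            remaining -= splitSize;
        }
        if (remaining != 0) {
            splits++; // leftover bytes form the final, possibly smaller, split
        }
        return splits;
    }
}
```

Under these assumptions, raising the minimum split size above the block size yields fewer, larger splits (fewer mappers), while lowering the maximum below the block size yields more mappers.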
My blog has an introduction to reduce-side joins in Java MapReduce:
http://hadooped.blogspot.com/2013/09/reduce-side-join-options-in-java-map.html
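For contrast with the map-side join, a reduce-side join needs no sorted or co-partitioned inputs: mappers tag each record with its source dataset, the shuffle groups records by join key, and the reducer combines the tagged groups. The plain-Java simulation below sketches those three phases for an inner join; the "L"/"R" tags and the class name ReduceSideJoinSketch are illustrative assumptions, not code from the linked post.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class ReduceSideJoinSketch {
    // Map phase: tag each (key, value) record with the dataset it came from.
    // Shuffle phase: group tagged values by join key (TreeMap keeps keys sorted, like the sort step).
    static Map<String, List<String[]>> mapAndShuffle(List<String[]> left, List<String[]> right) {
        Map<String, List<String[]>> grouped = new TreeMap<>();
        for (String[] r : left) {
            grouped.computeIfAbsent(r[0], k -> new ArrayList<>()).add(new String[]{"L", r[1]});
        }
        for (String[] r : right) {
            grouped.computeIfAbsent(r[0], k -> new ArrayList<>()).add(new String[]{"R", r[1]});
        }
        return grouped;
    }

    // Reduce phase: per key, separate values by tag and emit their cross product (inner join)
    static List<String> reduce(Map<String, List<String[]>> grouped) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<String[]>> e : grouped.entrySet()) {
            List<String> lefts = new ArrayList<>();
            List<String> rights = new ArrayList<>();
            for (String[] tagged : e.getValue()) {
                if (tagged[0].equals("L")) {
                    lefts.add(tagged[1]);
                } else {
                    rights.add(tagged[1]);
                }
            }
            for (String l : lefts) {
                for (String r : rights) {
                    out.add(e.getKey() + "\t" + l + "\t" + r);
                }
            }
        }
        return out;
    }
}
```

Keys present on only one side produce no output, which is what makes this an inner join; the cost is that every record of both datasets crosses the network during the shuffle.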
This gist demonstrates how to create a sequence file (compressed and uncompressed) from a text file.
Includes:
---------
1. Input data and script download
2. Input data review
3. Data load commands
4. Mapper code
5. Driver code to create the sequence file out of a text file in HDFS
6. Command to run the Java program
This gist includes HiveQL scripts to create an external partitioned table for Syslog-generated
log files using the regex SerDe.
Use case: Count the number of occurrences of processes that got logged, by year, month,
day and process.
Includes:
---------
Sample data and structure: 01-SampleDataAndStructure
Data download: 02-DataDownload
Data load commands: 03-DataLoadCommands
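Hive's RegexSerDe maps each capturing group of its input.regex property to one table column, in order, so the number of groups must equal the number of columns. The plain-Java sketch below shows that group-to-column mapping for a syslog-style line; the pattern, the six assumed columns (month, day, time, host, process, message) and the sample line are illustrations, not the regex used in this gist's scripts.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexSerDeSketch {
    // Assumed syslog layout: "<Mon> <day> <hh:mm:ss> <host> <process>[<pid>]: <message>"
    static final Pattern SYSLOG = Pattern.compile(
        "^(\\w{3})\\s+(\\d{1,2})\\s+(\\d{2}:\\d{2}:\\d{2})\\s+(\\S+)\\s+([^\\[:]+)(?:\\[\\d+\\])?:\\s*(.*)$");

    // Returns one string per capturing group (one per column), or null when
    // the whole line does not match the pattern
    static List<String> deserialize(String line) {
        Matcher m = SYSLOG.matcher(line);
        if (!m.matches()) {
            return null;
        }
        List<String> columns = new ArrayList<>();
        for (int g = 1; g <= m.groupCount(); g++) {
            columns.add(m.group(g));
        }
        return columns;
    }
}
```

In Hive, an equivalent pattern would go into the table's SERDEPROPERTIES as "input.regex", with the table declaring the same number of string columns in the same order.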
This gist includes a mapper, reducer and driver in Java that parse log files using
regex; the code for the combiner is the same as the reducer.
Use case: Count the number of occurrences of processes that got logged, inception to date.
Includes:
---------
Sample data and scripts for download: 01-ScriptAndDataDownload
Sample data and structure: 02-SampleDataAndStructure
Mapper: 03-LogEventCountMapper.java
Reducer: 04-LogEventCountReducer.java
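Reusing the reducer as the combiner works because summing counts is associative and commutative: pre-summing each map task's output does not change the final totals. The plain-Java simulation below illustrates this with a map step that extracts the process name by regex and emits (process, 1), and a single sum function used for both combine and reduce. The pattern and the names LogEventCountSketch, map and reduce are assumptions, not the gist's LogEventCountMapper/LogEventCountReducer code.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LogEventCountSketch {
    // Assumed: the process name follows "<hh:mm:ss> <host> " and ends before "[pid]", ":" or whitespace
    static final Pattern PROCESS = Pattern.compile("\\d{2}:\\d{2}:\\d{2}\\s+\\S+\\s+([^\\[:\\s]+)");

    // Map: emit (process, 1) for every line whose process name can be extracted
    static List<Map.Entry<String, Integer>> map(List<String> lines) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String line : lines) {
            Matcher m = PROCESS.matcher(line);
            if (m.find()) {
                out.add(new AbstractMap.SimpleEntry<>(m.group(1), 1));
            }
        }
        return out;
    }

    // Reduce, also used as the combiner: sum the counts for each process
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            counts.merge(p.getKey(), p.getValue(), Integer::sum);
        }
        return counts;
    }
}
```

Running reduce once per simulated map task and then again over the combined outputs gives the same totals as a single reduce over all (process, 1) pairs.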