@sajithdilshan
Created November 21, 2017 08:13
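import java.io.IOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Nested static class: it lives inside the job's driver class (not shown in the gist).
public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    // Matches a double-quoted string; group(1) is the text between the quotes.
    private static final Pattern pattern = Pattern.compile("\"([^\"]*)\"");
    private static final IntWritable one = new IntWritable(1);
    private Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
        Matcher matcher = pattern.matcher(value.toString());
        // Only the first quoted string on each input line is emitted.
        if (matcher.find()) {
            // group(0) is the whole match, surrounding quotes included.
            word.set(matcher.group(0));
            context.write(word, one);
        }
    }
}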
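To check which keys the mapper emits without a Hadoop cluster, the sketch below applies the same regex to a few hypothetical input lines and sums the counts in a `HashMap`. The map stands in for the shuffle plus a summing reducer, which is an assumption: the gist shows only the mapper. The class name `QuoteCountSketch` and the method `countFirstQuoted` are hypothetical, and `List.of` needs Java 9 or later.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical local sketch: mimics TokenizerMapper plus an assumed summing reducer,
// without Hadoop, so the emitted keys and their totals can be inspected.
public class QuoteCountSketch {
    // Same pattern as the mapper: a double-quoted string.
    private static final Pattern PATTERN = Pattern.compile("\"([^\"]*)\"");

    // For each line, take the first quoted string (quotes included, as with group(0))
    // and add 1 to its count, which is what shuffle + a summing reducer would produce.
    public static Map<String, Integer> countFirstQuoted(List<String> lines) {
        Map<String, Integer> counts = new HashMap<>();
        for (String line : lines) {
            Matcher matcher = PATTERN.matcher(line);
            if (matcher.find()) {
                counts.merge(matcher.group(0), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical input lines; only the first quoted string per line counts.
        List<String> lines = List.of(
            "id=1 \"alpha\" \"beta\"",
            "\"alpha\"",
            "no quotes here");
        // Prints {"alpha"=2}
        System.out.println(countFirstQuoted(lines));
    }
}
```

Because the mapper uses `if` rather than `while`, `"beta"` never reaches the output; looping on `matcher.find()` would emit every quoted string on a line instead.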