
@ethen8181
Last active May 21, 2017 02:35
  • Install Maven on a Mac with brew install mvnvm.
  • Create an Eclipse Maven project on your local machine (File -> New -> Project -> Maven Project). During setup, click Next until you are prompted for the group id and artifact id. Set group id = com.javamakeuse.hadoop.poc (it turns out any name works) and artifact id = Homeworkx (again, any name works, e.g. Homework1).
  • Copy the pom.xml from wolf and use it to replace the local pom.xml (you'll see it in the project tree on the left side of Eclipse).
  • Go to src/main/java and create a new class (e.g. Exercise1) for your code.
  • When you are done coding, navigate to the directory where the Maven project is stored (e.g. mine is /Users/ethen/Documents/workspace/Homework1) and run mvn package to build the jar file.
  • Copy mr-app-1.0-SNAPSHOT.jar from the target folder to wolf.
  • Then ssh to wolf and run the job with hadoop jar <name of jar file> <name of class with main()> <input files> <output directory>. For the wordcount example, I had a folder called wordcount on HDFS and wanted the output folder to be called output, so I ran hadoop jar mr-app-1.0-SNAPSHOT.jar com.javamakeuse.hadoop.poc.Homework1.Exercise1 wordcount output. The output directory must not already exist, otherwise Hadoop refuses to run the job. For the class name, remember to use the fully qualified name (package plus class), which you can copy from Eclipse (look at the highlighted section in the screenshot below).
  • Afterwards, run hdfs dfs -cat <output directory>/* to look at the result, or use hdfs dfs -getmerge <output directory>/ output.txt to merge the result into a single local file called output.txt (again, name it whatever you want).
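To sanity-check the cluster output on a small input, one option is to recompute the word counts locally in plain Java and compare them with the merged output.txt. This is a minimal sketch under stated assumptions: the job is a plain word count whose mapper splits lines on whitespace (no lowercasing or punctuation stripping), the input is a single local file, and the job writes Hadoop's default TextOutputFormat layout of one key<TAB>value pair per line. The class and method names (LocalWordCountCheck, countWords, parseOutput) are hypothetical and not part of the homework setup.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: recomputes word counts locally and compares them with
// the merged job output (e.g. output.txt from hdfs dfs -getmerge).
public class LocalWordCountCheck {

    // Assumption: mimics a mapper that splits on whitespace plus a summing reducer.
    static Map<String, Integer> countWords(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            for (String token : line.trim().split("\\s+")) {
                if (!token.isEmpty()) {
                    counts.merge(token, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    // Assumption: each output line is "key<TAB>value" (default TextOutputFormat separator).
    static Map<String, Integer> parseOutput(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            if (line.isEmpty()) {
                continue;
            }
            String[] parts = line.split("\t", 2);
            if (parts.length < 2) {
                throw new IllegalArgumentException("expected key<TAB>value, got: " + line);
            }
            counts.put(parts[0], Integer.parseInt(parts[1].trim()));
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java LocalWordCountCheck <local input file> <merged output file>");
            System.exit(1);
        }
        Map<String, Integer> expected =
                countWords(Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8));
        Map<String, Integer> actual =
                parseOutput(Files.readAllLines(Paths.get(args[1]), StandardCharsets.UTF_8));
        System.out.println(expected.equals(actual) ? "match" : "mismatch");
    }
}
```

For example, after hdfs dfs -getmerge output/ output.txt, running java LocalWordCountCheck <local copy of the input> output.txt should print match if the job tokenized the text the same way; a mismatch may simply mean your mapper normalizes words differently.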

[Screenshot: mapreduce]


ethen8181 commented Apr 26, 2017

To ssh into an AWS EMR cluster, we first need to configure its security settings correctly (this only has to be done once).

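We can do this through EC2 -> Security Groups: select the security group of the master instance and add an inbound SSH rule (TCP port 22) that allows your IP address. We will probably also need to restrict the permissions of our key pair file (a .pem file) with chmod 600 <name of the .pem file>, since ssh refuses to use a private key that other users can read.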
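For reference, "600" is octal notation for read and write by the owner and no access for the group or others (rw-------). A minimal sketch of the same change from Java, using the java.nio POSIX file attribute API, is shown below. It assumes a POSIX file system such as macOS or Linux; the class name RestrictKeyPermissions and its helper methods are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.Set;

// Hypothetical utility: the Java equivalent of `chmod 600 <key>.pem`.
// Only works on POSIX file systems (macOS, Linux).
public class RestrictKeyPermissions {

    // Octal 600: read + write for the owner, nothing for group or others.
    static final Set<PosixFilePermission> OWNER_READ_WRITE =
            PosixFilePermissions.fromString("rw-------");

    static void restrict(Path keyFile) throws IOException {
        Files.setPosixFilePermissions(keyFile, OWNER_READ_WRITE);
    }

    // Converts a permission set back to the three-digit octal form chmod uses.
    static String toOctal(Set<PosixFilePermission> perms) {
        int owner = 0, group = 0, others = 0;
        if (perms.contains(PosixFilePermission.OWNER_READ)) owner += 4;
        if (perms.contains(PosixFilePermission.OWNER_WRITE)) owner += 2;
        if (perms.contains(PosixFilePermission.OWNER_EXECUTE)) owner += 1;
        if (perms.contains(PosixFilePermission.GROUP_READ)) group += 4;
        if (perms.contains(PosixFilePermission.GROUP_WRITE)) group += 2;
        if (perms.contains(PosixFilePermission.GROUP_EXECUTE)) group += 1;
        if (perms.contains(PosixFilePermission.OTHERS_READ)) others += 4;
        if (perms.contains(PosixFilePermission.OTHERS_WRITE)) others += 2;
        if (perms.contains(PosixFilePermission.OTHERS_EXECUTE)) others += 1;
        return "" + owner + group + others;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java RestrictKeyPermissions <path to .pem file>");
            System.exit(1);
        }
        Path key = Paths.get(args[0]);
        restrict(key);
        System.out.println(key + " -> " + toOctal(Files.getPosixFilePermissions(key)));
    }
}
```

Running it on your key file should report 600, matching what chmod 600 would set.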
