I started by looking at this five-year-old but still quite useful file in this repo: https://github.com/alexmilowski/emr/tree/master/spark
Spark-word-count-on-aws-emr
ravsau commented on Sep 19, 2019:
Code:
aws emr create-cluster --ami-version 3.2.1 \
  --instance-groups InstanceGroupType=MASTER,InstanceCount=1,InstanceType=m1.medium InstanceGroupType=CORE,InstanceCount=2,InstanceType=m1.medium \
  --name SparkCluster --enable-debugging --tags Name=emr \
  --bootstrap-actions Path=s3://support.elasticmapreduce/spark/install-spark \
  --ec2-attributes KeyName=your-key --log-uri s3://your-bucket

This command launches the EMR cluster. Replace your-key with the name of your EC2 key pair and your-bucket with your S3 log bucket.
EMR then launches the cluster, and the instances show up in the EC2 console.
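You can also watch the cluster from the command line instead of the console. A minimal sketch: the j-XXXXXXXXXXXXX cluster ID below is a placeholder (create-cluster prints the real ID, and aws emr list-clusters shows it too), and ~/your-key.pem stands in for your key pair file.

# List clusters that are starting, running, or waiting
aws emr list-clusters --active

# Block until the cluster is up, then print its state
aws emr wait cluster-running --cluster-id j-XXXXXXXXXXXXX
aws emr describe-cluster --cluster-id j-XXXXXXXXXXXXX --query 'Cluster.Status.State' --output text

# SSH to the master node (uses the key pair passed to create-cluster)
aws emr ssh --cluster-id j-XXXXXXXXXXXXX --key-pair-file ~/your-key.pem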
awk -F: '{print $1, $2}' output.txt | sort -nk2

This splits each line of the word count output on the colon and sorts the result numerically by the second column (the count).
One catch: the job output files may be Snappy-compressed, in which case they can't be read as plain text directly; see the sketch below.
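One way around Snappy-compressed part files is to let Hadoop decompress them when reading: hadoop fs -text picks the codec from the file extension, and the Snappy libraries are typically available on EMR nodes. This is only a sketch; s3://your-bucket/output/ is a placeholder for wherever your job actually wrote its results.

# Run on the master node; adjust the S3 path to match your job's output location
hadoop fs -text 's3://your-bucket/output/part-*' > output.txt

# The pipeline above then works on the plain-text copy
awk -F: '{print $1, $2}' output.txt | sort -nk2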