How to get Spark 1.6.0 with Hadoop 2.6 working with S3
First, put your S3A credentials in Hadoop's core-site.xml (omit both keys if you rely on role-based authentication):
<configuration>
  <property>
    <name>fs.s3a.access.key</name>
    <description>AWS access key ID. Omit for Role-based authentication.</description>
    <value>YOUR_ACCESS_KEY</value>
  </property>
  <property>
    <name>fs.s3a.secret.key</name>
    <description>AWS secret key. Omit for Role-based authentication.</description>
    <value>YOUR_SECRET_KEY</value>
  </property>
</configuration>
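Alternatively, the same keys can be set at runtime on the SparkContext's Hadoop configuration. A minimal sketch (the credentials are placeholders, and _jsc is a PySpark internal, so prefer core-site.xml for anything permanent):

from pyspark import SparkContext

sc = SparkContext(appName="s3a-credentials-at-runtime")  # master comes from spark-submit

# Same properties as core-site.xml, applied to the live Hadoop configuration.
hadoop_conf = sc._jsc.hadoopConfiguration()
hadoop_conf.set("fs.s3a.access.key", "YOUR_ACCESS_KEY")
hadoop_conf.set("fs.s3a.secret.key", "YOUR_SECRET_KEY")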
Next, spark-env.sh wires Spark to the local Hadoop install and the ZooKeeper-backed HA masters:
#!/usr/bin/env bash
export DEFAULT_HADOOP_HOME=/usr/lib/hadoop
export SPARK_HOME=/usr/lib/spark
export PYTHONPATH=/usr/lib/spark/python/:/usr/lib/spark/python/lib/py4j-0.9-src.zip
export SPARK_DIST_CLASSPATH=/usr/lib/spark/conf/*:/usr/lib/hadoop/lib/*:/usr/lib/hadoop/client/*:/usr/lib/spark/lib/*
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=zoo1:2181,zoo2:2181,zoo3:2181"
export STANDALONE_SPARK_MASTER_HOST="spark-master1,spark-master2"
export SPARK_MASTER_WEBUI_PORT=8080
export SPARK_WORKER_MEMORY=4g
export SPARK_DRIVER_MEMORY=3g
export SPARK_DAEMON_MEMORY=1g
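With SPARK_HOME and PYTHONPATH exported as above, pyspark (and the bundled py4j) can be imported from a plain Python shell. A quick sanity check, assuming the paths above match your install:

# Run from a regular `python` session after sourcing spark-env.sh.
import pyspark
print(pyspark.__file__)  # should resolve under /usr/lib/spark/python/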
Then spark-defaults.conf registers the S3A filesystem and puts the AWS jars on both the driver and executor classpaths:
spark.hadoop.fs.s3a.impl        org.apache.hadoop.fs.s3a.S3AFileSystem
spark.executor.extraClassPath   /var/lib/hadoop/lib/aws-java-sdk-1.7.4.jar:/var/lib/hadoop/lib/hadoop-aws-2.7.1.jar
spark.driver.extraClassPath     /var/lib/hadoop/lib/aws-java-sdk-1.7.4.jar:/var/lib/hadoop/lib/hadoop-aws-2.7.1.jar
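The same settings can also be supplied per job through SparkConf instead of globally; a minimal sketch (app name is a placeholder, jar paths copied from above). Note that spark.driver.extraClassPath must be in place before the driver JVM starts, so it still belongs in spark-defaults.conf or on the spark-submit command line via --driver-class-path.

from pyspark import SparkConf, SparkContext

# Jar paths copied from spark-defaults.conf above.
aws_jars = ("/var/lib/hadoop/lib/aws-java-sdk-1.7.4.jar:"
            "/var/lib/hadoop/lib/hadoop-aws-2.7.1.jar")

conf = (SparkConf()
        .setAppName("s3a-conf-example")
        .set("spark.hadoop.fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
        .set("spark.executor.extraClassPath", aws_jars))

sc = SparkContext(conf=conf)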
With that in place, pyspark-s3a-example.py reads a CSV straight from S3 over s3a://:
## Launch this with ./bin/spark-submit pyspark-s3a-example.py from /usr/lib/spark as root
## Or even better, point spark-submit at the HA masters explicitly:
# ./bin/spark-submit --master spark://spark-master-1:7077,spark-master-2:7077 pyspark-s3a-example.py
from pyspark import SparkContext

sc = SparkContext('spark://spark-master-1:7077,spark-master-2:7077')

dataFile = "s3a://dabucket/sample.csv"
data = sc.textFile(dataFile)       # 'input' would shadow the Python builtin, so rename
header = data.take(1)[0]           # first line is the CSV header
rows = data.filter(lambda line: line != header)
lines = rows.map(lambda line: int(line.split(',')[2])).collect()  # third column as ints
print(lines)
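Writes go through the same s3a:// scheme; a short sketch appended to the example above, with a hypothetical output prefix:

# Hypothetical output location under the same bucket.
rows.saveAsTextFile("s3a://dabucket/sample-no-header")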
Finally, the cleanup: delete the conflicting AWS jars that ship with some Hadoop distributions (CDH here):
# A working Spark 1.6.0 / Hadoop 2.6 configuration for talking to S3 with s3a:
############################################################################
# First, the ridiculous part - if you have any of these files, delete them;
# they clash with the aws-java-sdk-1.7.4 / hadoop-aws-2.7.1 jars configured above.
rm ${HADOOP_HOME}/lib/aws-java-sdk-s3-1.10.6.jar
rm ${HADOOP_HOME}/lib/aws-java-sdk-core-1.10.6.jar
rm /usr/lib/hadoop/hadoop-aws-2.6.0-cdh5.7.0.jar
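After the cleanup, a quick way to confirm the surviving jars actually provide the S3A filesystem is to resolve the class through the driver JVM. A debugging sketch that leans on PySpark's internal _jvm gateway (not a stable API):

from pyspark import SparkContext

sc = SparkContext(appName="s3a-classpath-check")

# Raises a py4j error mentioning ClassNotFoundException if hadoop-aws / aws-java-sdk are missing.
s3a_class = sc._jvm.java.lang.Class.forName("org.apache.hadoop.fs.s3a.S3AFileSystem")
print(s3a_class.getName())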
###################################################################
Big thanks to:
http://deploymentzone.com/2015/12/20/s3a-on-spark-on-aws-ec2/
https://gist.github.com/thekensta/21068ef1b6f4af08eb09