Last active
November 1, 2015 10:31
Reading a file from S3 using Spark: jars and config for spark-shell
Adding multiple jars to the Spark classpath
# Build a comma-separated jar list and pass it to the --jars flag
./spark-shell --jars $(echo ~/lib/*.jar | tr ' ' ',')
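The `echo ... | tr ' ' ','` one-liner above splits on every space, so it breaks if a jar path contains one, and it passes the literal pattern through when the directory has no jars. A minimal sketch of a more defensive variant follows; the function name `jar_list` is hypothetical and not part of Spark.

```shell
# Hypothetical helper (not part of Spark): join every .jar in a directory
# into one comma-separated string suitable for the --jars flag.
jar_list() {
  dir="$1"
  list=""
  for jar in "$dir"/*.jar; do
    # When nothing matches, the shell leaves the pattern unexpanded; skip it.
    [ -e "$jar" ] || continue
    # Prefix a comma only once the list already has an entry.
    list="${list:+$list,}$jar"
  done
  printf '%s\n' "$list"
}
```

It could then be used as `./spark-shell --jars "$(jar_list ~/lib)"`, with the quotes keeping a path that contains spaces in one argument.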
// Inside spark-shell: route s3:// URLs through Hadoop's native S3 filesystem and set credentials
sc.hadoopConfiguration.set("fs.s3.impl", "org.apache.hadoop.fs.s3native.NativeS3FileSystem")
sc.hadoopConfiguration.set("fs.s3.awsAccessKeyId", "yourID")
sc.hadoopConfiguration.set("fs.s3.awsSecretAccessKey", "yourAccesskey")
val input = sc.textFile("s3://pathtoyourcsv")
// Some custom processing: key each line by its second comma-separated field
val pairs = input.map(x => (x.split(",")(1), x))
// Should print the (key, line) pair for the first line of the CSV
pairs.first
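The same three Hadoop settings can also be supplied when launching spark-shell, since Spark copies any property prefixed with `spark.hadoop.` into the Hadoop configuration. The sketch below assumes the credentials live in the environment variables `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` and contain no whitespace; the function name `s3_conf_args` is hypothetical.

```shell
# Hypothetical helper: print --conf arguments carrying the S3 settings shown above.
# Assumes AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are set and contain no whitespace.
s3_conf_args() {
  # Abort with a message if either variable is unset or empty.
  : "${AWS_ACCESS_KEY_ID:?set AWS_ACCESS_KEY_ID first}"
  : "${AWS_SECRET_ACCESS_KEY:?set AWS_SECRET_ACCESS_KEY first}"
  # One argument per line; the spark.hadoop. prefix forwards each key to Hadoop.
  printf '%s\n' \
    --conf "spark.hadoop.fs.s3.impl=org.apache.hadoop.fs.s3native.NativeS3FileSystem" \
    --conf "spark.hadoop.fs.s3.awsAccessKeyId=$AWS_ACCESS_KEY_ID" \
    --conf "spark.hadoop.fs.s3.awsSecretAccessKey=$AWS_SECRET_ACCESS_KEY"
}
```

A launch might then look like `./spark-shell $(s3_conf_args)`, left unquoted so that each line becomes its own argument. This keeps the keys out of the shell session's Scala history, though they remain visible in the process list.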
## List of jars required for accessing S3 from Spark; the -sources and -javadoc jars can be removed.
aspectjrt.jar commons-codec-1.6.jar httpclient-4.3.6.jar joda-time-2.8.1.jar
aspectjweaver.jar commons-logging-1.1.3.jar httpcore-4.3.3.jar spring-beans-3.0.7.jar
aws-java-sdk-1.10.30.jar freemarker-2.3.18.jar jackson-annotations-2.5.3.jar spring-context-3.0.7.jar
aws-java-sdk-1.10.30-javadoc.jar guava-18.0.jar jackson-core-2.5.3.jar spring-core-3.0.7.jar
aws-java-sdk-1.10.30-sources.jar hadoop-aws-2.6.0.jar jackson-databind-2.5.3.jar
aws-java-sdk-flow-build-tools-1.10.30.jar hadoop-common-2.6.0.jar javax.mail-api-1.4.6.jar
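Since the source and doc jars in the list above are not needed on the classpath, they can be filtered out before launching the shell. A minimal sketch, assuming the `-sources.jar` and `-javadoc.jar` suffixes seen in the list; the function name `runtime_jars` is hypothetical.

```shell
# Hypothetical helper: print the jars in a directory that are needed at runtime,
# one per line, skipping the -sources and -javadoc archives.
runtime_jars() {
  for jar in "$1"/*.jar; do
    # Skip the unexpanded pattern when the directory has no jars.
    [ -e "$jar" ] || continue
    case "$jar" in
      *-sources.jar|*-javadoc.jar) ;;   # source and doc archives: leave them out
      *) printf '%s\n' "$jar" ;;
    esac
  done
}
```

Piping the result through `paste -sd, -` turns it into the comma-separated form that `--jars` expects, for example `./spark-shell --jars "$(runtime_jars ~/lib | paste -sd, -)"`.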