Prerequisites for starting spark-shell on a Glue development endpoint
# Properties file: create a properties file named glue_spark_shell.properties with the following configuration.
# Note: replace the S3 access key and secret key placeholders below with your own keys.
spark.hadoop.fs.s3a.impl org.apache.hadoop.fs.s3a.S3AFileSystem
spark.driver.extraClassPath /usr/share/aws/glue/etl/jars/*:/usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/*:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/hmclient/lib/*:/usr/share/java/Hive-JSON-Serde/*:/usr/share/aws/sagemaker-spark-sdk/lib/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/glue/etl/python/PyGlue.zip:/usr/share/aws/emr/emrfs/auxlib/*:/usr/lib/hadoop/lib/native/*:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/usr/share/aws/glue/etl/conf
spark.executor.extraClassPath /usr/share/aws/glue/etl/jars/*:/usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/*:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/hmclient/lib/*:/usr/share/java/Hive-JSON-Serde/*:/usr/share/aws/sagemaker-spark-sdk/lib/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/glue/etl/python/PyGlue.zip:/usr/share/aws/emr/emrfs/auxlib/*:/usr/lib/hadoop/lib/native/*:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/usr/share/aws/glue/etl/conf
spark.hadoop.fs.s3a.access.key <your_access_key>
spark.hadoop.fs.s3a.secret.key <your_secret_key>
hive.metastore.client.factory.class com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory
# Move the properties file to the Glue development endpoint server.
# Use it to start either the Scala shell (glue-spark-shell) or the Python shell (gluepyspark), for example:
# $ glue-spark-shell -v --properties-file /home/glue/glue_spark_shell.properties
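
If you would rather not type the keys into the file by hand, the properties above can be generated by a small script. The following is a minimal Python sketch, not part of the original setup: the helper names `render_properties` and `write_properties` are hypothetical, and it assumes the keys are supplied by the caller (for example from the standard `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` environment variables). The classpath entries and property names are copied unchanged from the file above.

```python
import os
from pathlib import Path

# Classpath entries copied from the properties file above, in the same order.
GLUE_CLASSPATH = [
    "/usr/share/aws/glue/etl/jars/*",
    "/usr/lib/hadoop-lzo/lib/*",
    "/usr/lib/hadoop/*",
    "/usr/share/aws/aws-java-sdk/*",
    "/usr/share/aws/emr/emrfs/lib/*",
    "/usr/share/aws/hmclient/lib/*",
    "/usr/share/java/Hive-JSON-Serde/*",
    "/usr/share/aws/sagemaker-spark-sdk/lib/*",
    "/usr/share/aws/emr/emrfs/conf",
    "/usr/share/aws/glue/etl/python/PyGlue.zip",
    "/usr/share/aws/emr/emrfs/auxlib/*",
    "/usr/lib/hadoop/lib/native/*",
    "/usr/share/aws/emr/security/conf",
    "/usr/share/aws/emr/security/lib/*",
    "/usr/share/aws/glue/etl/conf",
]


def render_properties(access_key: str, secret_key: str) -> str:
    """Return the properties file content as a string (hypothetical helper)."""
    classpath = ":".join(GLUE_CLASSPATH)
    lines = [
        "spark.hadoop.fs.s3a.impl org.apache.hadoop.fs.s3a.S3AFileSystem",
        f"spark.driver.extraClassPath {classpath}",
        f"spark.executor.extraClassPath {classpath}",
        f"spark.hadoop.fs.s3a.access.key {access_key}",
        f"spark.hadoop.fs.s3a.secret.key {secret_key}",
        "hive.metastore.client.factory.class "
        "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory",
    ]
    return "\n".join(lines) + "\n"


def write_properties(path: str, access_key: str, secret_key: str) -> None:
    """Write the file and restrict it to the owner, since it holds a secret key."""
    target = Path(path)
    target.write_text(render_properties(access_key, secret_key))
    os.chmod(target, 0o600)
```

A call such as `write_properties("glue_spark_shell.properties", os.environ["AWS_ACCESS_KEY_ID"], os.environ["AWS_SECRET_ACCESS_KEY"])` would then produce the file to copy to the endpoint. Restricting the file mode is a precaution added in this sketch, since the file stores the secret key in plain text.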