Venkata Gowri Sai Rakesh Kumar Varanasi vvgsrk

Data Platform Architect at Finnair

Last active December 2, 2019 05:44

Prerequisites before starting spark-shell on a Glue development endpoint

	# Properties file: create a file named glue_spark_shell.properties with the following configuration.

	# Note: in the configuration below, replace the S3 access and secret keys with your own keys.

	spark.hadoop.fs.s3a.impl org.apache.hadoop.fs.s3a.S3AFileSystem
	spark.driver.extraClassPath /usr/share/aws/glue/etl/jars/:/usr/lib/hadoop-lzo/lib/:/usr/lib/hadoop/:/usr/share/aws/aws-java-sdk/:/usr/share/aws/emr/emrfs/lib/:/usr/share/aws/hmclient/lib/:/usr/share/java/Hive-JSON-Serde/:/usr/share/aws/sagemaker-spark-sdk/lib/:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/glue/etl/python/PyGlue.zip:/usr/share/aws/emr/emrfs/auxlib/:/usr/lib/hadoop/lib/native/:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/usr/share/aws/glue/etl/conf
	spark.executor.extraClassPath /usr/share/aws/glue/etl/jars/:/usr/lib/hadoop-lzo/lib/:/usr/lib/hadoop/:/usr/share/aws/aws-java-sdk/:/usr/share/aws/emr/emrfs/lib/:/usr/share/aws/hmclient/lib/:/usr/share/java/Hive-JSON-Serd
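
Once the properties file is complete, it can be passed to spark-shell at launch. The following is a minimal sketch, assuming the file is named glue_spark_shell.properties as described above and sits in the current directory on the endpoint; the variable names are illustrative and do not come from the original gist.

```shell
# Assumption: the properties file created above is in the current directory.
PROPS_FILE="glue_spark_shell.properties"

# --properties-file tells spark-shell to load settings from this file
# instead of the default conf/spark-defaults.conf.
LAUNCH_CMD="spark-shell --properties-file ${PROPS_FILE}"

# Print the command for review. To actually start the shell, run:
#   eval "$LAUNCH_CMD"
echo "$LAUNCH_CMD"
```

Printing the command first makes it easy to check the file path before starting a session on the endpoint.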

vvgsrk / glue-endpoint-creation-with-aws-cli.cli

Created January 30, 2019 20:23

AWS Glue Development Endpoint Creation with AWS CLI Commands

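	# Run the commands below in PowerShell (with the AWS CLI installed) to create a Glue development endpoint.
	# Read the SSH public key from disk; replace the placeholder with the path to your public key file.
	$GLUE_DEV_ENDPOINT_PUBLIC_KEY = Get-Content -Path 'Please_Put_Your_Public_Key_Path'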

	# Create development endpoint with role and public key
	aws glue create-dev-endpoint --endpoint-name any-meaningful-name --role-arn arn:aws:iam::000000000000:role/intended_role --public-key $GLUE_DEV_ENDPOINT_PUBLIC_KEY

	# Get the status of the endpoint
	aws glue get-dev-endpoint --endpoint-name any-meaningful-name

	# Delete the endpoint
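
The deletion step can be sketched as follows. This assumes the same endpoint name used in the create and status commands above, and it is written for a POSIX shell rather than PowerShell; `delete-dev-endpoint` is the AWS CLI subcommand that removes a development endpoint, while the variable names are illustrative.

```shell
# Assumption: same endpoint name as in the create/get commands above.
ENDPOINT_NAME="any-meaningful-name"

# Build the delete command for the named development endpoint.
DELETE_CMD="aws glue delete-dev-endpoint --endpoint-name ${ENDPOINT_NAME}"

# Print the command for review. Deletion cannot be undone, so run it
# explicitly once the name has been checked:
#   eval "$DELETE_CMD"
echo "$DELETE_CMD"
```

Deleting the endpoint when it is no longer needed avoids paying for idle capacity.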

vvgsrk / create_UDF_and_UDA_in_Cassandra.cqlsh

Created January 30, 2019 19:45

Create UDF and UDA in Cassandra

	cqlsh:test_ks>

	CREATE TABLE emp_dept_info (
	    emp_id int,
	    emp_name text,
	    dept_id int,
	    created_time timeuuid,
	    PRIMARY KEY ((emp_id), created_time)
	);

	-- Insert data into the emp_dept_info table