Miklos C mrchristine

Spaces over tabs, vim or emacs.

mrchristine / spark_stuff.scala

Created June 7, 2019 15:53

Spark Notes / Tips to Remember

spark.conf.isModifiable("spark.sql.shuffle.partitions")

mrchristine / decode_aws_error.sh

Created December 5, 2019 15:33

Decode and pretty print an encoded error message from AWS

	#!/bin/bash

	# grab decoded error message
	error=$(aws sts decode-authorization-message --encoded-message "$@" | jq .DecodedMessage)
	# trim the start and end double quotes
	json_err=${error:1: -1}
	# remove escaped quoted strings and pretty print with jq
	echo "$json_err" | sed 's|\\"|"|g' | jq .
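
The same unwrapping can be sketched in Python. This assumes the CLI response is a JSON object whose `DecodedMessage` field holds a JSON document encoded as a string, which is what the quote trimming and `sed` step undo. The function name `pretty_decoded_message` and the sample payload are hypothetical.

```python
import json


def pretty_decoded_message(cli_output: str) -> str:
    """Pretty-print the JSON document nested inside DecodedMessage."""
    # The outer layer is the CLI response itself.
    outer = json.loads(cli_output)
    # The inner layer is a JSON string, so it has to be parsed a second time.
    inner = json.loads(outer["DecodedMessage"])
    return json.dumps(inner, indent=2, sort_keys=True)


if __name__ == "__main__":
    # Hypothetical payload, shaped like a response with one nested JSON string.
    sample = json.dumps({"DecodedMessage": json.dumps({"allowed": False})})
    print(pretty_decoded_message(sample))
```

Parsing twice avoids the manual quote trimming, since `json.loads` handles the escaped quotes.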

mrchristine / get_s3_storage_costs.sh

Created December 11, 2019 15:33

Calculate S3 costs for Storage

	#!/bin/bash

	# get the last date in the file
	last_date=$(cat "$@" | awk -F',' '{print $5}' | awk '{print $1}' | grep -v "Start" | sort | uniq | tail -n1)
	# pass in the report.csv and calculate total storage costs for StandardStorage tier
	cat "$@" | grep "$last_date" | awk -F, '{printf "%.2f GB %s %s \n", $7/(1024**3 )/24, $4, $2}' | grep "StandardStorage" | uniq | sort -n
	echo "Processed for $last_date"
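
For comparison, a rough Python equivalent of the pipeline. It assumes the same column positions the awk commands use (field 5 is the start time, field 7 the usage value, fields 4 and 2 the labels that get printed) and that the division by 24 converts daily byte-hours into an average size, which the script implies but does not state. The function name and sample rows are hypothetical.

```python
import csv
import io


def standard_storage_gb(report_csv: str):
    """Return (last_date, rows) where rows are (GB, field 4, field 2) tuples."""
    rows = [r for r in csv.reader(io.StringIO(report_csv)) if len(r) >= 7]
    # Drop the header row (mirrors `grep -v "Start"`) and rows with no start time.
    data = [r for r in rows if r[4].strip() and "Start" not in r[4]]
    # Keep only the date part of the start time, as `awk '{print $1}'` does.
    last_date = max(r[4].split()[0] for r in data)
    results = set()
    for r in data:
        if r[4].split()[0] != last_date:
            continue
        if "StandardStorage" not in r[1] and "StandardStorage" not in r[3]:
            continue
        gb = round(float(r[6]) / (1024 ** 3) / 24, 2)
        results.add((gb, r[3], r[1]))
    # A set plus sorted() stands in for `uniq | sort -n`.
    return last_date, sorted(results)


if __name__ == "__main__":
    sample = (
        "Service,Operation,Resource,UsageType,StartTime,EndTime,UsageValue\n"
        "AmazonS3,StandardStorage,b1,TimedStorage-ByteHrs,12/09/19 00:00:00,12/10/19 00:00:00,25769803776\n"
    )
    print(standard_storage_gb(sample))
```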

mrchristine / iam.py

Created January 15, 2020 16:47

Bypass IAM Check

	import requests

	token = 'MYTOKEN'
	url = 'https://EXAMPLE.cloud.databricks.com'

	ip = 'arn:aws:iam::123456789:instance-profile/databricks_special_role'

	class DatabricksRestClient:
	    """A class to define wrappers for the REST API"""

mrchristine / get_spark_ui.py

Created February 12, 2020 23:54

Script to get the Spark UI dynamically

	ui_port = spark.sql("set spark.ui.port").collect()[0].value

	env = "myenvironment.cloud.databricks.com"

	cluster_id = dbutils.notebook.entry_point.getDbutils().notebook().getContext().clusterId().getOrElse(None)
	url = "https://{0}/driver-proxy-api/o/0/{1}/{2}/api/v1/".format(env, cluster_id, ui_port)

	import requests
	token = "TOKEN"
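
One way the preview above might continue, as a sketch only: it assumes the token is sent as a bearer token and that the target is the `applications` endpoint of Spark's monitoring REST API. The helper name `spark_api_request`, the cluster ID, and the port are hypothetical.

```python
def spark_api_request(env, cluster_id, ui_port, token, endpoint="applications"):
    """Return the (url, headers) pair for one Spark REST API call via the driver proxy."""
    # Same URL template as in the snippet above.
    base = "https://{0}/driver-proxy-api/o/0/{1}/{2}/api/v1/".format(env, cluster_id, ui_port)
    # Assumption: the workspace accepts the token as a bearer token.
    headers = {"Authorization": "Bearer {0}".format(token)}
    return base + endpoint, headers


if __name__ == "__main__":
    # Placeholder values; none of these refer to a real cluster.
    url, headers = spark_api_request(
        "myenvironment.cloud.databricks.com", "0212-235400-abc123", 40001, "TOKEN"
    )
    print(url)
```

The pair can then be passed to `requests.get(url, headers=headers)`.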

mrchristine / find_clones.py

Created February 25, 2020 18:15

Find cloned notebooks and find most cloned

	# $ cat nb_names.log | sort | uniq -c | sort -nrk1 | head

	import os, re

	# find cloned notebooks with parens
	pattern = re.compile(r"\((\d+)\)")

	with open('user_workspace.log', 'r') as fp, open('nb_names.log', 'w') as fp_w:
	    for x in fp:
	        nb_name = os.path.basename(x.rstrip())
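
The preview stops here. A minimal sketch of the counting step follows, done in Python rather than with the `sort | uniq -c` one-liner. It assumes a clone is a notebook whose name ends in a parenthesized number such as `etl (1)`; the function name `count_clones` and the sample paths are hypothetical.

```python
import os
import re
from collections import Counter

# Assumption: clones carry a trailing "(N)" suffix.
CLONE_SUFFIX = re.compile(r"\((\d+)\)$")


def count_clones(paths):
    """Count clones per original notebook name."""
    counts = Counter()
    for line in paths:
        name = os.path.basename(line.rstrip())
        if CLONE_SUFFIX.search(name):
            # Strip the "(N)" suffix to recover the original notebook name.
            base = CLONE_SUFFIX.sub("", name).rstrip()
            counts[base] += 1
    return counts


if __name__ == "__main__":
    sample = ["/Users/a/etl (1)\n", "/Users/a/etl (2)\n", "/Users/b/report\n"]
    print(count_clones(sample).most_common(10))
```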