Miklos C mrchristine
@mrchristine
mrchristine / spark_stuff.scala
Created June 7, 2019 15:53
Spark Notes / Tips to Remember
// true if this SQL config can be changed at runtime with spark.conf.set in the current session
spark.conf.isModifiable("spark.sql.shuffle.partitions")
mrchristine / decode_aws_error.sh
Created December 5, 2019 15:33
Decode and pretty print an encoded error message from AWS
#!/bin/bash
# Usage: ./decode_aws_error.sh <encoded-message>
set -euo pipefail
# grab the decoded error message; jq -r emits the raw string, so no quote trimming or unescaping is needed
error=$(aws sts decode-authorization-message --encoded-message "$1" | jq -r .DecodedMessage)
# the decoded message is itself a JSON document, so pretty print it with jq
echo "$error" | jq .
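The key point of the script is that the message is decoded twice: the CLI response is JSON, and its `DecodedMessage` field is a second JSON document stored as an escaped string. A minimal Python sketch of the same two-step parse, assuming the CLI output has the shape `{"DecodedMessage": "<escaped JSON>"}`; the helper name `pretty_print_decoded` is hypothetical.

```python
import json


def pretty_print_decoded(cli_output: str) -> str:
    """Turn the JSON printed by `aws sts decode-authorization-message` into indented JSON.

    Hypothetical helper. DecodedMessage holds a JSON document encoded as a
    string, so it has to be parsed a second time before pretty printing.
    """
    outer = json.loads(cli_output)                # first parse: the CLI response
    inner = json.loads(outer["DecodedMessage"])   # second parse: the embedded message
    return json.dumps(inner, indent=2, sort_keys=True)
```

For example, `print(pretty_print_decoded(open("response.json").read()))` would print the embedded message, where `response.json` is a hypothetical file holding the saved CLI output.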
mrchristine / get_s3_storage_costs.sh
Created December 11, 2019 15:33
Calculate S3 costs for Storage
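#!/bin/bash
# Usage: ./get_s3_storage_costs.sh report.csv
# assumes the AWS usage report column layout:
# Service,Operation,UsageType,Resource,StartTime,EndTime,UsageValue
# get the last date in the file (grep -v drops the header row)
last_date=$(awk -F',' '{print $5}' "$@" | awk '{print $1}' | grep -v "Start" | sort -u | tail -n1)
# convert byte-hours for that day into average GB per bucket for the StandardStorage tier
# (^ is the POSIX awk exponent operator; ** only works in some awks)
cat "$@" | grep "$last_date" | awk -F',' '{printf "%.2f GB %s %s\n", $7/(1024^3)/24, $4, $2}' | grep "StandardStorage" | sort -n | uniq
echo "Processed for $last_date"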
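The arithmetic behind the awk line: a one-day usage row reports byte-hours, so dividing by 1024³ and by 24 gives the average GB stored that day. A Python sketch of the same calculation, assuming the column order Service, Operation, UsageType, Resource, StartTime, EndTime, UsageValue (the same assumption the script makes by using columns 2, 4, 5 and 7); `standard_storage_gb` is a hypothetical name.

```python
import csv
import io

BYTES_PER_GB = 1024 ** 3
HOURS_PER_DAY = 24


def standard_storage_gb(report_csv: str) -> dict:
    """Average GB per bucket in the StandardStorage tier on the report's last date.

    Hypothetical helper. Assumes the usage report column order
    Service, Operation, UsageType, Resource, StartTime, EndTime, UsageValue,
    with UsageValue in byte-hours for a one-day row.
    """
    rows = list(csv.reader(io.StringIO(report_csv)))
    data = [r for r in rows[1:] if len(r) >= 7]  # skip the header row
    # same rule as `sort | uniq | tail -n1`: the largest date string wins
    last_date = max(r[4].split()[0] for r in data)
    usage = {}
    for r in data:
        if r[1] == "StandardStorage" and r[4].split()[0] == last_date:
            # byte-hours / 24 hours = average bytes; / 1024^3 = GB
            gb = float(r[6]) / BYTES_PER_GB / HOURS_PER_DAY
            usage[r[3]] = usage.get(r[3], 0.0) + gb
    return usage
```

Like the shell version, picking the last date by string comparison only works reliably within a single month of `MM/DD/YY` dates.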
mrchristine / iam.py
Created January 15, 2020 16:47
Bypass IAM Check
import requests
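# placeholders: a Databricks personal access token and the workspace URL
token = 'MYTOKEN'
url = 'https://EXAMPLE.cloud.databricks.com'
# ARN of the IAM instance profile
ip = 'arn:aws:iam::123456789:instance-profile/databricks_special_role'

class DatabricksRestClient:
    """A class to define wrappers for the REST API"""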
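The preview stops at the class docstring. One hedged sketch of such a wrapper, not the gist's actual code: the Databricks Instance Profiles API 2.0 accepts `POST /api/2.0/instance-profiles/add` with an `instance_profile_arn` and an optional `skip_validation` flag that skips Databricks' dry-run permission check for the profile. This sketch uses the standard library's `urllib.request` instead of `requests` so it has no third-party dependency; the method names `build_post` and `add_instance_profile` are hypothetical.

```python
import json
import urllib.request


class DatabricksRestClient:
    """Minimal wrapper for the Databricks REST API (hypothetical sketch)."""

    def __init__(self, url: str, token: str):
        self.url = url.rstrip("/")
        self.token = token

    def build_post(self, endpoint: str, payload: dict) -> urllib.request.Request:
        """Build an authenticated JSON POST request for an /api/2.0 endpoint."""
        return urllib.request.Request(
            f"{self.url}/api/2.0{endpoint}",
            data=json.dumps(payload).encode("utf-8"),
            headers={"Authorization": f"Bearer {self.token}",
                     "Content-Type": "application/json"},
            method="POST",
        )

    def add_instance_profile(self, arn: str, skip_validation: bool = True) -> dict:
        """Register an instance profile; skip_validation bypasses the IAM dry-run check."""
        req = self.build_post("/instance-profiles/add",
                              {"instance_profile_arn": arn,
                               "skip_validation": skip_validation})
        with urllib.request.urlopen(req) as resp:
            body = resp.read()
        return json.loads(body) if body else {}
```

With the variables defined above, `DatabricksRestClient(url, token).add_instance_profile(ip)` would send the request to the workspace.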
mrchristine / get_spark_ui.py
Created February 12, 2020 23:54
Script to get the Spark UI dynamically
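import requests
# port the Spark UI listens on for this cluster's driver
ui_port = spark.sql("set spark.ui.port").collect()[0].value
# placeholder: your Databricks workspace hostname
env = "myenvironment.cloud.databricks.com"
# ID of the cluster this notebook is attached to (None if unavailable)
cluster_id = dbutils.notebook.entry_point.getDbutils().notebook().getContext().clusterId().getOrElse(None)
# Spark UI REST API reached through the workspace's driver proxy
url = "https://{0}/driver-proxy-api/o/0/{1}/{2}/api/v1/".format(env, cluster_id, ui_port)
# placeholder: a personal access token
token = "TOKEN"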
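The URL above is the base of Spark's monitoring REST API, so appending an endpoint such as `applications` lists the running Spark applications. A sketch, assuming the driver proxy accepts the personal access token as a bearer token (which the `token` variable suggests); it uses `urllib.request` rather than `requests`, and the helper names are hypothetical.

```python
import json
import urllib.request


def spark_ui_api_url(env: str, cluster_id: str, ui_port: str) -> str:
    """Base URL of the Spark monitoring REST API behind the Databricks driver proxy."""
    return "https://{0}/driver-proxy-api/o/0/{1}/{2}/api/v1/".format(env, cluster_id, ui_port)


def list_applications(base_url: str, token: str) -> list:
    """GET <base_url>applications with a bearer token and return the parsed JSON list."""
    req = urllib.request.Request(base_url + "applications",
                                 headers={"Authorization": "Bearer " + token})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

In the notebook, `[app["id"] for app in list_applications(url, token)]` would give application IDs that can be used with endpoints such as `applications/<app-id>/jobs`.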
mrchristine / find_clones.py
Created February 25, 2020 18:15
Find cloned notebooks and the most frequently cloned ones
# $ cat nb_names.log | sort | uniq -c | sort -nrk1 | head
import os, re
# find cloned notebooks with parens
pattern = re.compile(r"\((\d+)\)")
with open('user_workspace.log', 'r') as fp, open('nb_names.log', 'w') as fp_w:
    for x in fp:
        nb_name = os.path.basename(x.rstrip())
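The shell one-liner in the first comment ranks names by how often they appear. The same ranking can be sketched in Python with `collections.Counter`, assuming each input line is a notebook path and a clone is named like the original plus a trailing `(N)`, as the regex implies. The helper `most_cloned` is hypothetical and not the rest of the gist.

```python
import os
import re
from collections import Counter

# trailing clone suffix such as " (1)", same "(digits)" idea as the pattern above
CLONE_SUFFIX = re.compile(r"\s*\((\d+)\)$")


def most_cloned(paths, top=10):
    """Count cloned notebooks by original name, most cloned first (hypothetical helper)."""
    counts = Counter()
    for line in paths:
        nb_name = os.path.basename(line.rstrip())
        if CLONE_SUFFIX.search(nb_name):
            # strip the "(N)" suffix so all clones of a notebook share one key
            counts[CLONE_SUFFIX.sub("", nb_name)] += 1
    return counts.most_common(top)
```

Passing the open `user_workspace.log` file object as `paths` would replace the `sort | uniq -c | sort -nrk1 | head` step.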