AWS provides many different options for running compute.
In the context of a data pipeline, I was wondering how these stacked up, especially in terms of price.
#!/usr/bin/env python
"""
Stitch multiple files worth of AWS Transcribe transcripts together.
Does not attempt to match speakers across files, but does label all speaker changes.

Usage:
    python stitch_transcript.py *.mp3.json -o out.txt

See blog post: http://turtlemonvh.github.io/aws-transcribe-for-long-zoom-meetings.html
"""
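The core of what such a script has to do looks roughly like this (a sketch assuming the standard Transcribe JSON output layout; the function name and output format are illustrative, not the gist's actual code):

import json
import sys

def speaker_blocks(path):
    """Yield (speaker_label, text) blocks for one Transcribe output file."""
    with open(path) as f:
        results = json.load(f)["results"]
    # Map each spoken word's start_time to its text (punctuation items have no start_time).
    words = {item["start_time"]: item["alternatives"][0]["content"]
             for item in results["items"] if item["type"] == "pronunciation"}
    speaker, block = None, []
    for segment in results["speaker_labels"]["segments"]:
        if segment["speaker_label"] != speaker:
            # Speaker changed: emit the block collected so far and start a new one.
            if block:
                yield speaker, " ".join(block)
            speaker, block = segment["speaker_label"], []
        block.extend(words.get(i["start_time"], "") for i in segment["items"])
    if block:
        yield speaker, " ".join(block)

# Speaker labels are per file; spk_0 in one file is not matched to spk_0 in the next.
for path in sys.argv[1:]:
    for speaker, text in speaker_blocks(path):
        print("%s: %s" % (speaker, " ".join(w for w in text.split() if w)))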
import boto3
from collections import Counter

"""
For when your data uses "/" in a directory-like structure and you want to expand the list of items.
Similar to `tree -L 2 prefix/` in *nix.
"""

s3 = boto3.client('s3')
bucket_name = "XXX"  # s3 bucket name
import ionicsdk
import subprocess
import os

"""
Use Ionic to store secrets (e.g. application credentials).
Shows how to merge keys containing different sets of secrets, so an application can be granted access to different sets of secrets, managed by different access policies.
Note that if multiple keys are created with the same external id, the newest will be fetched, which makes secret rotation easier.
Inspired by AWS Parameter Store:
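The merge-and-override behavior described above, independent of any Ionic SDK calls, amounts to something like the following generic sketch (the function and the example secret names are illustrative):

def merge_secrets(keys_oldest_first):
    """keys_oldest_first: list of dicts of secrets, ordered by key creation time."""
    merged = {}
    for secrets in keys_oldest_first:
        merged.update(secrets)  # values from newer keys override older ones
    return merged

# Example: a rotated db password in the newest key replaces the old value,
# while secrets only present in older keys are still available.
merged = merge_secrets([
    {"db_password": "old-pass", "api_token": "abc123"},
    {"db_password": "new-pass"},
])
assert merged == {"db_password": "new-pass", "api_token": "abc123"}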
Experiments with selenium to vote on a SurveyMonkey survey on OSX. Uses PhantomJS. Headless Chrome may be a better option now.
You should run
    . setpath.sh
to fix your path.

// Based on: http://thushw.blogspot.com/2015/10/cartesian-product-in-scala.html
package com.github.turtlemonvh.helpers

object SequenceHelpers {
  /* Take a list of lists and return a list of lists that is the cartesian product of the members of each list.
     val seqs = List(List("1", "2", "3"), List("a", "b", "c", "d"), List("true", "false"))
     cartesianProduct[String](seqs).length  // 24 = (3 * 4 * 2)
  */
  // Implementation sketch; the gist's actual body is not shown in this fragment.
  def cartesianProduct[T](seqs: List[List[T]]): List[List[T]] =
    seqs.foldRight(List(List.empty[T])) { (seq, acc) =>
      for (x <- seq; rest <- acc) yield x :: rest
    }
}
Download logs from s3 and search through them.
Caches downloaded files at _search_downloads/ for better performance.
Outputs json. Use jq for further processing and filtering. (example: https://gist.github.com/pcn/f98c7852b0558b847784)
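The caching download-and-search flow described here looks roughly like the following sketch (the bucket, prefix, and pattern arguments are illustrative, not the gist's actual command-line interface):

import boto3
import json
import os
import re

s3 = boto3.client('s3')
CACHE_DIR = "_search_downloads"

def search_logs(bucket, prefix, pattern):
    """Download matching log objects into a local cache and print JSON for each hit."""
    os.makedirs(CACHE_DIR, exist_ok=True)
    regex = re.compile(pattern)
    paginator = s3.get_paginator('list_objects_v2')
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get('Contents', []):
            local_path = os.path.join(CACHE_DIR, obj['Key'].replace('/', '_'))
            if not os.path.exists(local_path):
                # Only download files we haven't cached yet.
                s3.download_file(bucket, obj['Key'], local_path)
            with open(local_path) as f:
                for lineno, line in enumerate(f, 1):
                    if regex.search(line):
                        print(json.dumps({"key": obj['Key'], "line": lineno, "text": line.rstrip()}))

# Example: search_logs("my-log-bucket", "app/2024/", r"ERROR") | jq '.key'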