AWS provides many different options for running compute.
In the context of a data pipeline, I was wondering how these stacked up, especially in terms of price.
#!/usr/bin/env python
"""
Stitch multiple files worth of AWS Transcribe transcripts together.
Does not attempt to match speakers across files, but does label all speaker changes.

Usage:
    python stitch_transcript.py *.mp3.json -o out.txt

See blog post: http://turtlemonvh.github.io/aws-transcribe-for-long-zoom-meetings.html
"""
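The core of what such a script has to do looks roughly like this (a sketch assuming the standard Transcribe JSON output layout; the function name and output format are illustrative, not the gist's actual code):

import json
import sys

def speaker_blocks(path):
    """Yield (speaker_label, text) blocks for one Transcribe output file."""
    with open(path) as f:
        results = json.load(f)["results"]
    # Map each spoken word's start_time to its text (punctuation items have no start_time).
    words = {item["start_time"]: item["alternatives"][0]["content"]
             for item in results["items"] if item["type"] == "pronunciation"}
    speaker, block = None, []
    for segment in results["speaker_labels"]["segments"]:
        if segment["speaker_label"] != speaker:
            # Speaker changed: emit the block collected so far and start a new one.
            if block:
                yield speaker, " ".join(block)
            speaker, block = segment["speaker_label"], []
        block.extend(words.get(i["start_time"], "") for i in segment["items"])
    if block:
        yield speaker, " ".join(block)

# Speaker labels are per file; spk_0 in one file is not matched to spk_0 in the next.
for path in sys.argv[1:]:
    for speaker, text in speaker_blocks(path):
        print("%s: %s" % (speaker, " ".join(w for w in text.split() if w)))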
import boto3
from collections import Counter

"""
For when your data uses "/" in a directory-like structure and you want to expand the list of items.
Similar to `tree -L 2 prefix/` in *nix.
"""

s3 = boto3.client('s3')
bucket_name = "XXX"  # s3 bucket name
import ionicsdk
import subprocess
import os

"""
Use Ionic to store secrets (e.g. application credentials).
Shows how to merge keys containing different sets of secrets, so an application can be granted access to different sets of secrets, managed by different access policies.
Note that if multiple keys are created with the same external id, the newest will be fetched, which makes secret rotation easier.
Inspired by AWS Parameter Store:
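The merge-and-override behavior described above, independent of any Ionic SDK calls, amounts to something like the following generic sketch (the function and the example secret names are illustrative):

def merge_secrets(keys_oldest_first):
    """keys_oldest_first: list of dicts of secrets, ordered by key creation time."""
    merged = {}
    for secrets in keys_oldest_first:
        merged.update(secrets)  # values from newer keys override older ones
    return merged

# Example: a rotated db password in the newest key replaces the old value,
# while secrets only present in older keys are still available.
merged = merge_secrets([
    {"db_password": "old-pass", "api_token": "abc123"},
    {"db_password": "new-pass"},
])
assert merged == {"db_password": "new-pass", "api_token": "abc123"}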
Experiments with selenium to vote on a SurveyMonkey survey on OSX. Uses PhantomJS. Headless Chrome may be a better option now.
You should run
    . setpath.sh
to fix your path.

// Based on: http://thushw.blogspot.com/2015/10/cartesian-product-in-scala.html
package com.github.turtlemonvh.helpers

object SequenceHelpers {
  /* Take a list of lists and return a list of lists that is the cartesian product of the members of each list.
     val seqs = List(List("1", "2", "3"), List("a", "b", "c", "d"), List("true", "false"))
     cartesianProduct[String](seqs).length  // 24 = (3 * 4 * 2)
  */
  // Implementation sketch; the gist's actual body is not shown in this fragment.
  def cartesianProduct[T](seqs: List[List[T]]): List[List[T]] =
    seqs.foldRight(List(List.empty[T])) { (seq, acc) =>
      for (x <- seq; rest <- acc) yield x :: rest
    }
}
Download logs from s3 and search through them.
Caches downloaded files at _search_downloads/ for better performance.
Outputs json. Use jq for further processing and filtering. (example: https://gist.github.com/pcn/f98c7852b0558b847784)
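The caching download-and-search flow described here looks roughly like the following sketch (the bucket, prefix, and pattern arguments are illustrative, not the gist's actual command-line interface):

import boto3
import json
import os
import re

s3 = boto3.client('s3')
CACHE_DIR = "_search_downloads"

def search_logs(bucket, prefix, pattern):
    """Download matching log objects into a local cache and print JSON for each hit."""
    os.makedirs(CACHE_DIR, exist_ok=True)
    regex = re.compile(pattern)
    paginator = s3.get_paginator('list_objects_v2')
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get('Contents', []):
            local_path = os.path.join(CACHE_DIR, obj['Key'].replace('/', '_'))
            if not os.path.exists(local_path):
                # Only download files we haven't cached yet.
                s3.download_file(bucket, obj['Key'], local_path)
            with open(local_path) as f:
                for lineno, line in enumerate(f, 1):
                    if regex.search(line):
                        print(json.dumps({"key": obj['Key'], "line": lineno, "text": line.rstrip()}))

# Example: search_logs("my-log-bucket", "app/2024/", r"ERROR") | jq '.key'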