AWS offers many options for running compute.
In the context of a data pipeline, I wanted to see how they compare, especially on price.
| #!/usr/bin/env python | |
| """ | |
| Stitch multiple files worth of AWS transcripts together. | |
| Does not attempt to match speakers across files, but does label all speaker changes. | |
| Usage: | |
| python stitch_transcript.py *.mp3.json -o out.txt | |
| See blog post: http://turtlemonvh.github.io/aws-transcribe-for-long-zoom-meetings.html |
I hereby claim:
To claim this, I am signing this object:
| import boto3 | |
| from collections import Counter | |
| """ | |
| If your data uses "/" in a directory-like structure and you want to expand the list of items. | |
| Similar to `tree -L 2 prefix/` in *nix. | |
| """ | |
| s3 = boto3.client('s3') | |
| bucket_name = "XXX" # s3 bucket name |
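The preview above is cut off, so the following is only a minimal sketch of the idea described in its docstring, not the gist's actual code. It assumes the object keys have already been listed (for example with the boto3 `list_objects_v2` paginator) and groups them by their first two "/"-separated components. The function name `count_prefixes` and the sample keys are hypothetical.

```python
from collections import Counter


def count_prefixes(keys, depth=2, delimiter="/"):
    """Count keys under each prefix of at most `depth` path components.

    Hypothetical helper: works on plain key strings, so it needs no AWS access.
    """
    counts = Counter()
    for key in keys:
        parts = key.split(delimiter)
        # The last component is the object name, not a "directory".
        dirs = parts[:-1][:depth]
        prefix = delimiter.join(dirs) + delimiter if dirs else ""
        counts[prefix] += 1
    return counts


# Made-up keys for illustration only.
keys = [
    "logs/2020/01/a.json",
    "logs/2020/02/b.json",
    "logs/2021/c.json",
    "data/x.csv",
    "readme.txt",
]

for prefix, n in sorted(count_prefixes(keys).items()):
    print(f"{prefix or '(root)'}: {n}")
```

Grouping after listing is simple but reads every key. For very large buckets, passing `Delimiter="/"` to `list_objects_v2` and walking the returned `CommonPrefixes` level by level may be cheaper.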
| import ionicsdk | |
| import subprocess | |
| import os | |
| """ | |
| Use Ionic to store secrets (e.g. application credentials). | |
| Shows how to merge keys containing different sets of secrets, so an application can be granted access to different sets of secrets, managed by different access policies. | |
| Note that if multiple keys are created with the same external id, the newest will be fetched, which makes secret rotation easier. | |
| Inspired by AWS ParamStore: |
Experiments with Selenium to vote on a SurveyMonkey survey on OS X. Uses PhantomJS. Headless Chrome may be a better option now.
You should have
. setpath.sh to fix your path
| // Based on: http://thushw.blogspot.com/2015/10/cartesian-product-in-scala.html | |
| package com.github.turtlemonvh.helpers | |
| object SequenceHelpers { | |
| /* Take a list of lists and return a list of lists that is the cartesian product of the members of each list. | |
| val seqs = List(List("1", "2", "3"), List("a", "b", "c", "d"), List("true", "false")) | |
| // 24 = (3 * 4 * 2) | |
| cartesianProduct[String](seqs).length |
Download logs from S3 and search through them.
Caches downloaded files at _search_downloads/ for better performance.
Outputs JSON. Use jq for further processing and filtering. (example: https://gist.github.com/pcn/f98c7852b0558b847784)
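The gist's code is not shown here, so the following is only a rough sketch of the cache-then-search pattern described above, under stated assumptions. The names `cached_fetch` and `search`, the `download` callback, and the flattened cache file naming are all hypothetical. Only the `_search_downloads` directory name comes from the description.

```python
import json
import os
import re

CACHE_DIR = "_search_downloads"


def cached_fetch(key, download, cache_dir=CACHE_DIR):
    """Return a local path for `key`; call `download(key, path)` only on a cache miss."""
    # Assumption: flatten the key into one file name. Keys that differ only
    # in "/" versus "_" would collide under this naive scheme.
    path = os.path.join(cache_dir, key.replace("/", "_"))
    if not os.path.exists(path):
        os.makedirs(cache_dir, exist_ok=True)
        download(key, path)
    return path


def search(path, pattern):
    """Yield one dict per line in `path` that matches the regex `pattern`."""
    regex = re.compile(pattern)
    with open(path, encoding="utf-8", errors="replace") as fh:
        for lineno, line in enumerate(fh, start=1):
            if regex.search(line):
                yield {"file": path, "line": lineno, "text": line.rstrip("\n")}


def search_to_json_lines(path, pattern):
    """Render matches as one JSON object per line, a shape that jq can consume."""
    return [json.dumps(match) for match in search(path, pattern)]
```

With boto3, the `download` callback could be something like `lambda key, path: s3.download_file(bucket_name, key, path)`, where `s3` is a boto3 S3 client and `bucket_name` is your bucket. Emitting one JSON object per line keeps the output easy to filter, for example with `jq 'select(.line > 10)'`.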