I hereby claim:
- I am mlimotte on github.
- I am mlimotte (https://keybase.io/mlimotte) on keybase.
- I have a public key ASCPpX8cibderVDoBlFGbVy0_lZQQmxZKpKSBE4BzBNKqgo
To claim this, I am signing this object:
#!/bin/bash -e

# 2010-09-19 Marc Limotte
#
# Run continuously (every 30 minutes) as a cron.
#
# Looks for directories in HDFS matching a certain pattern and moves them to S3, using Amazon's new
# distcp replacement, S3DistCp.
#
# It creates marker files (_directory_.done and _directory_.processing) at the S3 destination, so
# that subsequent runs can tell which directories are finished and which are still being copied.
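The .done/.processing protocol described above can be sketched in plain bash. This is a minimal local stand-in, not the original script: a local directory substitutes for the S3 destination, `echo` substitutes for the S3DistCp copy, and `DEST`, `copy_dir`, and the example directory name are hypothetical.

```shell
#!/bin/bash -e
# Local sketch of the marker-file protocol.  A local directory stands in
# for the S3 destination; DEST, copy_dir and the example dir name are
# hypothetical, and the real script would invoke S3DistCp here.
DEST=/tmp/s3-marker-sketch
rm -rf "$DEST" && mkdir -p "$DEST"

copy_dir () {
  local dir=$1
  # Skip anything already finished or currently in flight.
  if [ -e "$DEST/$dir.done" ] || [ -e "$DEST/$dir.processing" ]; then
    echo "skip $dir"
    return 0
  fi
  touch "$DEST/$dir.processing"
  echo "copy $dir"                      # S3DistCp would run here
  mv "$DEST/$dir.processing" "$DEST/$dir.done"
}

copy_dir logs-2010-09-19   # first run: copies
copy_dir logs-2010-09-19   # later run: skipped, .done marker exists
```

Because the .processing marker is created before the copy and renamed to .done after, a concurrent or later cron invocation sees one marker or the other and never double-copies.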
package foo.cascalog;

import cascading.flow.FlowProcess;
import cascading.flow.hadoop.HadoopFlowProcess;
import cascading.operation.FunctionCall;
import cascading.operation.OperationCall;
import cascading.tuple.Tuple;
import cascading.tuple.TupleEntry;
import cascalog.CascalogFunction;
import org.apache.hadoop.conf.Configuration;

/**
 * The majority of this class is copied from the Cascalog source (1.7.0-SNAPSHOT as of 9/17/2011).
 * This is a filter operation, where the FlowProcess object is exposed.
 */
package com.weatherbill.hadoop;

import cascading.operation.Filter;
import cascading.operation.FilterCall;
import cascading.flow.FlowProcess;
(ns mlimotte.util)

; A variation on clojure.core/merge-with
(defn merge-with-key
  "Returns a map that consists of the rest of the maps conj-ed onto
  the first.  If a key occurs in more than one map, the mapping(s)
  from the latter (left-to-right) will be combined with the mapping in
  the result by calling (f key val-in-result val-in-latter)."
  [f & maps]
  ; Body adapted from clojure.core/merge-with, passing the key to f.
  (when (some identity maps)
    (letfn [(merge-entry [m [k v]]
              (if (contains? m k) (assoc m k (f k (get m k) v)) (assoc m k v)))]
      (reduce (fn [m1 m2] (reduce merge-entry (or m1 {}) (seq m2))) maps))))
; e.g. (merge-with-key (fn [k a b] (if (= k :n) (+ a b) b)) {:n 1} {:n 2}) => {:n 3}
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;; Sample of a Jobdef for a Streaming job
;;;
;;; Example of common usage:
;;;   lemur run strm-jobdef.clj --bucket my-bucket-name
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

(catch-args
  [:bucket "An s3 bucket, e.g. 'com.myco.bucket1'"])
# Based on https://gist.github.com/gareth625/5d69cd883b3a154f0fa7
# Run it with `lemur run test_jobdef.clj`

(catch-args
  [:run-step
   "Set as the name of the step"
   "lemur-is-awesome"])

(defcluster the-cluster
  :app "AnApp"
import com.google.gson.Gson
import scala.collection.JavaConversions.mapAsScalaMap

val gson = new Gson()
val mapPrototype = new java.util.HashMap[String, Any]()

def parseJson(json: String): Map[String, Any] = {
  // mapAsScalaMap is a wrapper (no copy); the final toMap copies only the
  // top level into an immutable Scala Map.  Nested objects remain Java maps.
  mapAsScalaMap(gson.fromJson(json, mapPrototype.getClass)).toMap
}
// e.g. parseJson("""{"a":1.0,"b":"x"}""")
#!/bin/bash

function vault-aws () {
  local VAULT_PATH=$1
  if [ -z "$VAULT_PATH" ]; then
    # printf (not echo) so the \n renders; no backticks around the example,
    # which would run it as a command substitution.
    printf 'Missing VAULT_PATH argument.\nExample: vault-aws documents-store\n'
    # return, not exit: this function is sourced into an interactive shell.
    return 1
  fi
  if [ -z "$VAULT_ADDR" ]; then
    echo "Missing VAULT_ADDR env variable"
    return 1
  fi
We have remote developers who occasionally need access to AWS servers and to QA and Staging databases (RDS MySQL instances). The AWS servers (EC2, Fargate) are in a private VPC. The RDS databases are in different VPCs; they have the "publicly accessible" attribute set, which means they get a public DNS name, but only a handful of IPs are whitelisted for that access. Developers should get access over a VPN.
This is summarized as:
laptop --ClientVPN--> VPC _A_ --VPC Peer--> RDS in VPC _B_
I chose the Client VPN Endpoint so that AWS would manage the remote side of the tunnel. I chose Viscosity (on a Mac) as our VPN client because it's easy to use and supports split DNS and split routing. It's affordable, but not free. Split DNS is important so that Amazon hostnames can be resolved to their internal IP addresses. Split routing is important so that only the AWS-destined traffic goes over the VPN tunnel, while other internet traffic goes directly to the internet.
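With those choices, the plumbing reduces to a VPC peering connection plus a Client VPN route and authorization rule. A sketch using the standard AWS CLI; every ID and CIDR below is a placeholder, not a value from our setup:

```shell
# Peer VPC A (where the Client VPN lands) with VPC B (where RDS lives).
aws ec2 create-vpc-peering-connection \
    --vpc-id vpc-aaaa1111 --peer-vpc-id vpc-bbbb2222
aws ec2 accept-vpc-peering-connection \
    --vpc-peering-connection-id pcx-3333cccc

# In VPC A's route table, send VPC B's CIDR through the peering connection.
aws ec2 create-route --route-table-id rtb-4444dddd \
    --destination-cidr-block 10.1.0.0/16 \
    --vpc-peering-connection-id pcx-3333cccc

# Route and authorize VPC B's CIDR on the Client VPN endpoint itself.
aws ec2 create-client-vpn-route \
    --client-vpn-endpoint-id cvpn-endpoint-5555eeee \
    --destination-cidr-block 10.1.0.0/16 \
    --target-vpc-subnet-id subnet-6666ffff
aws ec2 authorize-client-vpn-ingress \
    --client-vpn-endpoint-id cvpn-endpoint-5555eeee \
    --target-network-cidr 10.1.0.0/16 --authorize-all-groups
```

Pushing only VPC B's CIDR (rather than 0.0.0.0/0) as a Client VPN route is what makes the split-routing behavior possible on the client side.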