📈
Text ⇨ Understanding

Stuart Axelbrooke soaxelbrooke

soaxelbrooke / callable_dict.py
Last active March 17, 2017 07:57
A callable dictionary useful for functional programming
from typing import Hashable, Optional, TypeVar

# Declared at module level so the class docstring stays the first statement in the class body
V = TypeVar('V')


class CallableDict(dict):
    """ A callable dictionary useful for functional programming """

    def __call__(self, key: Hashable, default: Optional[V] = None) -> Optional[V]:
        return self.get(key, default)
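A short usage sketch, assuming the class as shown in the gist: because the dictionary is callable, it can be passed directly to `map` where a lookup function is expected. The `codes` mapping and its keys are made up for illustration.

```python
class CallableDict(dict):
    """A dict that can be called like a function to look up a key."""

    def __call__(self, key, default=None):
        return self.get(key, default)


# Hypothetical example data
codes = CallableDict({'a': 1, 'b': 2})

# The dict itself acts as the mapping function; missing keys yield None
mapped = list(map(codes, ['a', 'b', 'z']))
print(mapped)  # [1, 2, None]
```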
@soaxelbrooke
soaxelbrooke / elasticsearch_python_talk.md
Last active January 21, 2017 02:40
Transcript of a live-coded Python + Elasticsearch talk about text analytics

Text analytics engine!

Hey everyone! I'm @soaxelbrooke, and I'm here to show you how to create a basic text analytics engine with Elasticsearch.

Getting the data

Let's get the data first! These are product reviews from Amazon, which can be downloaded from the URL below.

$ curl http://times.cs.uiuc.edu/~wang296/Data/LARA/Amazon/AmazonReviews.zip -o reviews.zip
@soaxelbrooke
soaxelbrooke / beta_distribution_fit.scala
Last active October 24, 2016 19:56
Fitting beta distributions in Scala 😉
import scala.sys.process._

object BetaDistributionFit {
  val distName: String = "beta"

  def fitCommand(samples: Seq[Double]): Seq[String] =
    Seq("python", "-c",
      s"""
         |from scipy import stats
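The preview above is cut off before the embedded Python script is visible, so what it runs is not known from this listing. As a rough illustration of what fitting a beta distribution to samples involves, here is a minimal standard-library sketch that uses method-of-moments estimates. This is a swapped-in technique, not the scipy-based fit the gist shells out to, and `fit_beta_moments` is a hypothetical name.

```python
from statistics import mean, variance


def fit_beta_moments(samples):
    """Estimate (alpha, beta) of a beta distribution by the method of moments."""
    m = mean(samples)
    v = variance(samples)  # sample variance
    # The estimates are only defined when 0 < mean < 1 and 0 < variance < mean * (1 - mean)
    if not 0 < m < 1 or v <= 0 or v >= m * (1 - m):
        raise ValueError("samples are not compatible with a beta distribution fit")
    common = m * (1 - m) / v - 1
    return m * common, (1 - m) * common


# Hypothetical samples in (0, 1), symmetric around 0.5
alpha, beta = fit_beta_moments([0.2, 0.4, 0.6, 0.8])
print(alpha, beta)
```

Symmetric samples like these give equal `alpha` and `beta`, which is a quick sanity check on the estimator.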
@soaxelbrooke
soaxelbrooke / gensim_phrase_prefix_tree_export.py
Last active October 13, 2016 09:08
Script for exporting large Gensim Phrase models to prefix trees to save memory and CPU time.
from gensim.models import Phrases
import sys

assert len(sys.argv) > 2, "Need gensim model path and output filename!"
# sys.argv[0] is the script name, so the two arguments are at indices 1 and 2
in_path, out_path = sys.argv[1:3]


class PrefixTree(object):
    def __init__(self, words, impl=dict, suffix_impl=list):
        self.word = words[0]
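The rest of `PrefixTree` is not visible in this preview. To illustrate the general idea of storing phrases in a prefix tree so that shared leading words are kept only once, here is a minimal sketch using nested dicts. The functions `build_prefix_tree` and `contains`, and the use of `None` as an end-of-phrase marker, are assumptions for illustration and not the gist's implementation.

```python
def build_prefix_tree(phrases):
    """Build a nested-dict prefix tree from an iterable of word tuples."""
    root = {}
    for phrase in phrases:
        node = root
        for word in phrase:
            node = node.setdefault(word, {})
        node[None] = True  # None marks the end of a complete phrase
    return root


def contains(tree, phrase):
    """Return True if the exact phrase was stored in the tree."""
    node = tree
    for word in phrase:
        if word not in node:
            return False
        node = node[word]
    return None in node


# Hypothetical phrases; "new" and "york" are stored once and shared
tree = build_prefix_tree([
    ("new", "york"),
    ("new", "york", "city"),
    ("san", "francisco"),
])
print(contains(tree, ("new", "york")))  # True
print(contains(tree, ("new",)))         # False
```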
@soaxelbrooke
soaxelbrooke / ec2ssh.sh
Last active August 24, 2016 17:51
SSH into the first found ec2 instance matching your name filter
#!/usr/bin/env bash
# Usage: $ ec2ssh cassandra-i-3
ssh "$(aws ec2 describe-instances --query 'Reservations[].Instances[].[Tags[?Key==`Name`].Value | [0], PrivateIpAddress]' --output text | grep "$1" | head -n 1 | python -c 'import sys; print(sys.stdin.read().split("\t")[1].strip())')"
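The pipeline above filters tab-separated `name<TAB>private-ip` lines, keeps the first match, and extracts the second field. The same selection step can be sketched in Python as follows. The function name `first_matching_ip` and the sample text are made up for illustration, and the sketch uses a plain substring match where `grep` would interpret its argument as a regular expression.

```python
def first_matching_ip(describe_output, name_filter):
    """Return the second tab-separated field of the first line containing name_filter."""
    for line in describe_output.splitlines():
        if name_filter in line:
            fields = line.split('\t')
            # Mirror `head -n 1`: only the first matching line is considered
            return fields[1].strip() if len(fields) > 1 else None
    return None


# Hypothetical text in the shape of `--output text` for the query above
sample = "cassandra-i-1\t10.0.0.1\ncassandra-i-3\t10.0.0.3\nweb-1\t10.0.1.1\n"
print(first_matching_ip(sample, "cassandra-i-3"))  # 10.0.0.3
```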
import scala.util.{Failure, Success, Try}

object implicits {
  implicit class ESFuture[Response <: ActionResponse](future: ListenableActionFuture[Response])
    extends Future[Response] {
    override def onComplete[U](f: (Try[Response]) => U)(implicit executor: ExecutionContext): Unit = {
      future.addListener(new ActionListener[Response] {
        // Pass failures to the callback instead of throwing inside the listener
        override def onFailure(e: Throwable): Unit = f(Failure(e))
        override def onResponse(response: Response): Unit = f(Success(response))
      })

Keybase proof

I hereby claim:

  • I am stuartaxelowen on github.
  • I am soaxelbrooke (https://keybase.io/soaxelbrooke) on keybase.
  • I have a public key whose fingerprint is F8B6 D6F2 A6A7 5C3C C49A 702F 9F22 8954 24AC 725A

To claim this, I am signing this object:

@soaxelbrooke
soaxelbrooke / fscache.py
Last active June 2, 2018 16:27
File System Cache Decorator in Python
""" Caches expensive function calls in pickled bytes on disk. """
import os
import shutil
import subprocess
import dill
from functools import wraps
import hashlib
import base64
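Only the imports of this gist are visible here. As a rough sketch of how a file system cache decorator can work, the example below hashes the function name and arguments into a file name and stores the pickled result on disk. It uses the standard-library `pickle` instead of `dill`, and the names `fs_cache` and `slow_square` are hypothetical, so treat it as an illustration of the technique rather than the gist's code.

```python
import hashlib
import os
import pickle
import tempfile
from functools import wraps


def fs_cache(cache_dir):
    """Cache a function's results as pickle files inside cache_dir."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            os.makedirs(cache_dir, exist_ok=True)
            # Derive a stable file name from the function name and its arguments
            key_bytes = pickle.dumps((func.__name__, args, sorted(kwargs.items())))
            path = os.path.join(cache_dir, hashlib.sha256(key_bytes).hexdigest() + '.pkl')
            if os.path.exists(path):
                with open(path, 'rb') as f:
                    return pickle.load(f)
            result = func(*args, **kwargs)
            with open(path, 'wb') as f:
                pickle.dump(result, f)
            return result
        return wrapper
    return decorator


calls = []  # records how often the wrapped function actually runs


@fs_cache(tempfile.mkdtemp())
def slow_square(x):
    calls.append(x)
    return x * x


first = slow_square(4)   # computed and written to disk
second = slow_square(4)  # read back from disk
print(first, second, calls)
```

Keying on the pickled arguments keeps the sketch short, but it only works for arguments that pickle consistently.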
@soaxelbrooke
soaxelbrooke / mongo_json_decode.py
Last active May 14, 2018 05:11
Decode mongo JSON in Python... with eval
# It's not pretty, but it gets the job done. Mongo's JSON is nearly valid Python, so you can eval it,
# provided that you have the proper names in scope, like ObjectId, ISODate, and false
from dateutil import parser
NumberLong = int
ObjectId = str
false = False
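A self-contained sketch of the same idea, under a few assumptions: the names are collected in a dict and passed to `eval` rather than defined at module level, `ISODate` is handled with the standard-library `datetime.fromisoformat` instead of `dateutil`, and `decode_mongo_json` and the sample document are made up for illustration. Emptying `__builtins__` only limits the names in scope and does not make `eval` safe for untrusted input.

```python
from datetime import datetime


def iso_date(text):
    """Parse an ISO 8601 timestamp; a trailing 'Z' is rewritten as a UTC offset."""
    return datetime.fromisoformat(text.replace('Z', '+00:00'))


# Names that appear in Mongo shell-style JSON but are not defined in Python
MONGO_NAMES = {
    'NumberLong': int,
    'ObjectId': str,
    'ISODate': iso_date,
    'true': True,
    'false': False,
    'null': None,
}


def decode_mongo_json(text):
    """Evaluate Mongo shell-style JSON as a Python expression."""
    return eval(text, {'__builtins__': {}}, MONGO_NAMES)


# Hypothetical document
doc = decode_mongo_json(
    '{"_id": ObjectId("abc123"), "n": NumberLong(42), "ok": true, '
    '"at": ISODate("2017-01-21T02:40:00Z"), "tag": null}'
)
print(doc["_id"], doc["n"], doc["ok"], doc["at"].year, doc["tag"])
```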
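import gzip
import io


class GzipBuffer(object):
    def __init__(self):
        self.len = 0  # number of chunks appended so far
        self.buffer = io.BytesIO()
        self.writer = gzip.GzipFile(fileobj=self.buffer, mode='wb')

    def append(self, thing):
        # `thing` must be bytes; it is compressed into the in-memory buffer
        self.len += 1
        self.writer.write(thing)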
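The preview stops after `append`, so how the compressed bytes are read back is not shown. One way it could be done is sketched below: the `getvalue` method is a hypothetical addition that closes the gzip writer (which flushes the gzip trailer but leaves the underlying `BytesIO` open) and returns the compressed bytes.

```python
import gzip
import io


class GzipBuffer(object):
    def __init__(self):
        self.len = 0
        self.buffer = io.BytesIO()
        self.writer = gzip.GzipFile(fileobj=self.buffer, mode='wb')

    def append(self, thing):
        self.len += 1
        self.writer.write(thing)

    def getvalue(self):
        # Hypothetical helper, not part of the original preview.
        # Closing the writer finalizes the gzip stream; the BytesIO stays open.
        self.writer.close()
        return self.buffer.getvalue()


buf = GzipBuffer()
buf.append(b'hello\n')
buf.append(b'world\n')
compressed = buf.getvalue()
print(buf.len, gzip.decompress(compressed))
```

After `getvalue` is called the writer is closed, so this sketch does not support further `append` calls.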