📈
Text ⇨ Understanding

Stuart Axelbrooke soaxelbrooke

@soaxelbrooke
soaxelbrooke / elasticsearch_python_talk.md
Last active January 21, 2017 02:40
Transcript of a live-coded Python + Elasticsearch talk about text analytics

Text analytics engine!

Hey everyone! I'm @soaxelbrooke, and I'm here to show you how to create a basic text analytics engine with Elasticsearch.

Getting the data

Let's get the data first! These are product reviews from Amazon, which can be found here.

$ curl http://times.cs.uiuc.edu/~wang296/Data/LARA/Amazon/AmazonReviews.zip -o reviews.zip
@soaxelbrooke
soaxelbrooke / callable_dict.py
Last active March 17, 2017 07:57
A callable dictionary useful for functional programming
from typing import Optional, Hashable, TypeVar

V = TypeVar('V')

class CallableDict(dict):
    """ A callable dictionary useful for functional programming """
    def __call__(self, key: Hashable, default: Optional[V]=None) -> Optional[V]:
        # Calling the dict behaves like dict.get(key, default)
        return self.get(key, default)
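Because a callable dictionary can stand in wherever a function is expected, it can be handed straight to `map`. The following is a minimal sketch of that usage, not part of the original gist: the class is repeated so the block runs on its own, and the `grades` data and `looked_up` name are made up for illustration.

```python
from typing import Hashable, Optional, TypeVar

V = TypeVar('V')


class CallableDict(dict):
    """A dict whose instances can be called like a lookup function."""

    def __call__(self, key: Hashable, default: Optional[V] = None) -> Optional[V]:
        return self.get(key, default)


# Hypothetical example data, not from the original gist.
grades = CallableDict({'alice': 'A', 'bob': 'B'})

# The dict itself is the mapping function; missing keys fall back to None.
looked_up = list(map(grades, ['alice', 'bob', 'carol']))

# A default can still be supplied when calling directly.
fallback = grades('carol', 'n/a')
```

Missing keys yield `None` rather than raising `KeyError`, which is what makes the object safe to pass to `map` or `filter` over arbitrary input.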
@soaxelbrooke
soaxelbrooke / get_dynamo.sh
Last active May 30, 2017 03:40 — forked from vedit/gist:ec8b9b16d403a0dd410791ad62ad48ef
dynamodb local setup
#!/bin/bash
DYNAMODB_USER=stuart
cd /home/${DYNAMODB_USER}/bin
mkdir -p dynamodb
cd dynamodb
wget http://dynamodb-local.s3-website-us-west-2.amazonaws.com/dynamodb_local_latest.tar.gz
tar -xvzf dynamodb_local_latest.tar.gz
@soaxelbrooke
soaxelbrooke / quickjest.js
Last active August 25, 2019 23:30
quickjest.js - A quickcheck-style property-based test wrapper for Jest
// A property-based test wrapper based on generators.
// See original Haskell quickcheck paper: http://www.cs.tufts.edu/~nr/cs257/archive/john-hughes/quick.pdf
// --------------------------- //
// Scroll to bottom for usage! //
// --------------------------- //
import R from 'ramda';
const RUNS_PER_TEST = 50;
@soaxelbrooke
soaxelbrooke / default_logging.py
Last active May 30, 2018 08:28
A sensible default logging setup for Python
import logging
import os

logging.basicConfig(format='%(levelname)s:%(asctime)s.%(msecs)03d [%(threadName)s] - %(message)s',
                    datefmt='%Y-%m-%d,%H:%M:%S',
                    level=getattr(logging, os.getenv('LOG_LEVEL', 'INFO')))
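The setup above reads the level name from the `LOG_LEVEL` environment variable and looks it up on the `logging` module, so a lowercase or misspelled value would raise `AttributeError`. The sketch below is a more tolerant variation, not the author's code: the `configure_logging` function name is hypothetical, and `force=True` assumes Python 3.8 or later.

```python
import logging
import os


def configure_logging() -> int:
    """Configure root logging from LOG_LEVEL and return the numeric level used."""
    # Normalise case so LOG_LEVEL=debug works the same as LOG_LEVEL=DEBUG.
    level_name = os.getenv('LOG_LEVEL', 'INFO').upper()
    level = getattr(logging, level_name, logging.INFO)
    # Guard against names that exist on the module but are not levels.
    if not isinstance(level, int):
        level = logging.INFO
    logging.basicConfig(
        format='%(levelname)s:%(asctime)s.%(msecs)03d [%(threadName)s] - %(message)s',
        datefmt='%Y-%m-%d,%H:%M:%S',
        level=level,
        force=True,  # replace any handlers that were configured earlier
    )
    return level
```

Calling `configure_logging()` once at program start is enough; unknown level names fall back to `INFO` instead of crashing.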
@soaxelbrooke
soaxelbrooke / phrase_detection.sql
Last active January 10, 2018 19:38
Phrase detection implemented in pure PostgreSQL
WITH tokens AS (
    -- Just edit MY_TABLE, MY_TEXT_COL, and MY_PKEY_COL, and watch it go!
    SELECT MY_PKEY_COL AS pkey, (unnest(to_tsvector(MY_TEXT_COL))).* FROM MY_TABLE
), token_stream AS (
    SELECT pkey, unnest(positions) AS token_idx, lexeme
    FROM tokens ORDER BY pkey, token_idx
), token_counts AS (
    SELECT lexeme, sum(count) AS count
    FROM (
        SELECT lexeme, array_length(positions, 1) AS count FROM tokens
#!/usr/bin/env python3
import sqlite3
import csv
import sys
quantize = '--quantize' in sys.argv
@soaxelbrooke
soaxelbrooke / custom_sql_query_postgres.md
Last active March 18, 2024 14:35
Custom SQL Query Execution in PostgREST

PostgREST doesn't let you execute arbitrary SQL queries directly, but you can work around that by defining a function that executes the query for you:

$ psql mydb
mydb=# create function custom_query(query text) returns setof json as $f$
    begin 
    return query execute format('with tmp as (%s) select row_to_json(tmp.*) from tmp;', query); 
    end
 $f$ language plpgsql;
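PostgREST exposes database functions under `/rpc/<function_name>`, with arguments passed as a JSON body on a POST request. Assuming the `custom_query` function above is visible to PostgREST, a call from Python might look like the sketch below. The base URL `http://localhost:3000` and the helper names `build_custom_query_request` and `run_custom_query` are assumptions for illustration, not part of the original gist.

```python
import json
import urllib.request


def build_custom_query_request(base_url: str, sql: str) -> urllib.request.Request:
    """Build a POST request for the custom_query function defined above."""
    # The JSON key must match the function's parameter name ("query").
    body = json.dumps({'query': sql}).encode('utf-8')
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/rpc/custom_query",
        data=body,
        headers={'Content-Type': 'application/json'},
        method='POST',
    )


def run_custom_query(base_url: str, sql: str):
    """Send the request and decode the JSON response (needs a running server)."""
    request = build_custom_query_request(base_url, sql)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode('utf-8'))


# Assumed local PostgREST address; adjust to your deployment.
example_request = build_custom_query_request('http://localhost:3000/', 'select 1 as x')
```

Building the request separately from sending it keeps the URL and payload easy to inspect without a live server.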
@soaxelbrooke
soaxelbrooke / sqlite_experiment.py
Last active March 1, 2018 21:16
Simple sqlite experiment logger
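""" Class for tracking experiments in a local sqlite database """
import os
from uuid import uuid4

class SqliteExperiment:
    def __init__(self, hparams, metrics, experiment_id=None):
        # Generate a random ID when the caller does not supply one
        self.experiment_id = experiment_id or str(uuid4())
        self.hparams = hparams
        # metrics is iterated as (name, t) pairs below
        self.metrics = metrics
        self.metric_names = ['experiment_id', 'measured_at'] + [n for n, t in metrics]
        # Logging interval, overridable through the LOG_EVERY environment variable
        self.log_every = int(os.environ.get('LOG_EVERY', 10000))
        self.last_log = None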
@soaxelbrooke
soaxelbrooke / fetch_form_submit.js
Created March 26, 2018 00:52
Submitting form data with fetch in JavaScript
function postTagData(myUrl, documentId, fullText, tag, start, end) {
  // Build a multipart form body; fetch sets the Content-Type header for FormData
  let form = new FormData();
  form.set('document_id', documentId);
  form.set('full_text', fullText);
  form.set('region_start', start);
  form.set('region_end', end);
  return window.fetch(myUrl, {
    method: 'POST',
    body: form
  });
}