@harlo
harlo / label_cleanser.py
Created January 16, 2014 22:12
This is the Python implementation of our label cleanser (which we ultimately ported to Ruby). In Python, we use Levenshtein, but in Ruby, we just use JaroWinkler. The results are slightly different, so the thresholds had to be adjusted accordingly.
from collections import namedtuple
from Levenshtein import ratio
from Levenshtein import distance
import re, csv, os
delimiter = ','
quotechar = '|'
quoting = csv.QUOTE_MINIMAL
BrandInfo = namedtuple('BrandInfo', 'model_text utqg_correlate')
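
The description above mentions matching labels against a similarity threshold. A minimal sketch of that idea follows, assuming a plain normalized edit-distance score. It is not the gist's actual code: the gist imports `ratio` and `distance` from the `Levenshtein` package, whereas this version reimplements the distance in pure Python so it runs without dependencies, and its `similarity` score is not numerically identical to `Levenshtein.ratio`. The function names, the sample brand labels, and the `0.8` threshold are all hypothetical.

```python
def levenshtein_distance(a, b):
    """Edit distance counting insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or keep on a match)
            ))
        previous = current
    return previous[-1]


def similarity(a, b):
    """Normalized score in [0, 1]; 1.0 means the strings are identical."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein_distance(a, b) / longest


def best_match(label, known_labels, threshold=0.8):
    """Return the closest known label, or None if nothing clears the threshold."""
    if not known_labels:
        return None
    scored = [(similarity(label.lower(), k.lower()), k) for k in known_labels]
    score, match = max(scored, key=lambda pair: pair[0])
    return match if score >= threshold else None


if __name__ == "__main__":
    # Hypothetical labels, for illustration only.
    print(best_match("Michellin", ["Michelin", "Goodyear"]))
    print(best_match("xyz", ["Michelin", "Goodyear"]))
```

Because different metrics (for example Jaro-Winkler) score the same pair of strings differently, a threshold tuned for one metric generally needs re-tuning for another, which is consistent with the adjustment the description mentions.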
#!/usr/bin/env python2
# Quick and dirty demonstration of CVE-2014-0160 by Jared Stafford ([email protected])
# The author disclaims copyright to this source code.
import sys
import struct
import socket
import time
import select
@harlo
harlo / auto_jekyll.sh
Last active August 29, 2015 14:00
automatic jekyll posts!
#! /bin/bash
BLOG_DIR=/where/is/your/jekyll/blog
ASSETS_DIR=$BLOG_DIR/subpath/for/your/assets/like/images/or/whatever
function join { local IFS="$1"; shift; echo "$*"; }
MAKE_TITULAR_DIR=false
if [ $# -eq 0 ]
then
TITLE="new_post"
@harlo
harlo / api.py
Last active August 29, 2015 14:01
git-annex api for web remotes
import re, os, signal
import tornado.ioloop, tornado.web, tornado.httpserver
from subprocess import Popen, PIPE
from sys import exit
# DON'T FORGET TO SET THIS!
ANNEX_DIR = "/path/to/your/remote/repository"
API_PORT = 8888
NUM_PROCESSES = 10
@harlo
harlo / skillet.py
Last active August 29, 2015 14:05
force quits some of my packages
import re
from sys import argv, exit
from fabric.api import settings, local
if __name__ == '__main__':
    try:
        target = argv[1]
    except Exception as e:
        print e
        print "usage: skillet.py [module name (i.e. compass_frontend)]"
@harlo
harlo / evacuate.py
Created September 17, 2014 20:41
how to pull the old informa cam media from the old server
import os, re
from sys import argv
from fabric.api import local, settings
if __name__ == "__main__":
    this_dir = os.getcwd()
    evacuated = []
    prefered = None
    if len(argv) > 1:
@harlo
harlo / keybase.md
Last active June 4, 2018 20:34
current keybase verification

Keybase proof

I hereby claim:

  • I am harlo on github.
  • I am harlo (https://keybase.io/harlo) on keybase.
  • I have a public key whose fingerprint is 4422 F773 B498 8C77 F99D 287E 655C 2E48 33B8 2A02

To claim this, I am signing this object:

@harlo
harlo / buildGensimDictionary.py
Created December 4, 2014 18:27
buildGensimDictionary
def buildGensimDictionary(uv_task):
    task_tag = "BUILDING GENSIM DICTIONARY!!!"
    print "\n\n************** %s [START] ******************\n" % task_tag
    uv_task.setStatus(302)
    import os
    from conf import DEBUG, getConfig
    dictionary_dir = getConfig('compass.gensim.training_data')
@harlo
harlo / extractNEREntities.py
Last active August 29, 2015 14:10
extractNEREntities
def extractNEREntities(task):
    task_tag = "NER ENTITY EXTRACTION"
    print "\n\n************** %s [START] ******************\n" % task_tag
    print "TOKENIZING TEXT DOCUMENT at %s" % task.doc_id
    task.setStatus(302)
    from lib.Worker.Models.uv_document import UnveillanceDocument
    from conf import DEBUG
    from vars import ASSET_TAGS
@harlo
harlo / generatePageMap.py
Created December 4, 2014 18:33
generatePageMap
def generatePageMap(uv_task):
    task_tag = "PAGE MAPPER"
    print "\n\n************** %s [START] ******************\n" % task_tag
    print "MAPPING PAGES FROM TEXT DOCUMENT at %s" % uv_task.doc_id
    uv_task.setStatus(302)
    from lib.Worker.Models.uv_document import UnveillanceDocument
    from conf import DEBUG
    from vars import ASSET_TAGS