I hereby claim:
- I am ryanwitt on github.
- I am onecreativenerd (https://keybase.io/onecreativenerd) on keybase.
- I have a public key ASB_TkeaXoaqMw5ii1FGzwCwYblooenmt-s59k24W87OZAo
To claim this, I am signing this object:
import redis  # assumed dependency; the gist is truncated before any imports

class RedisTools:
    '''
    A set of utility tools for interacting with a redis cache
    '''
    def __init__(self):
        self._queues = ["default", "high", "low", "failed"]
        self.get_redis_connection()

    def get_redis_connection(self):
        # The gist cuts off here; a minimal sketch of a typical
        # connection setup, not the author's original code:
        self._connection = redis.StrictRedis(host='localhost', port=6379, db=0)
        return self._connection
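A quick usage sketch, assuming the connection setup above and that the queues live as redis lists under their plain names (both assumptions, not confirmed by the gist):

import redis

tools = RedisTools()
conn = tools.get_redis_connection()
for q in tools._queues:
    print(q, conn.llen(q))  # llen returns 0 when a key does not exist yet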
#!/bin/sh
VERSION=0.12.2
PLATFORM=linux
ARCH=x64
PREFIX=/usr/local

mkdir -p "$PREFIX" && \
curl -fSL "https://nodejs.org/dist/v$VERSION/node-v$VERSION-$PLATFORM-$ARCH.tar.gz" \
  | tar xzvf - --strip-components=1 -C "$PREFIX"
def collect_ranges(s):
    """
    Returns a generator of tuples of consecutive numbers found in the input.

    >>> list(collect_ranges([]))
    []
    >>> list(collect_ranges([1]))
    [(1, 1)]
    >>> list(collect_ranges([1,2,3]))
    [(1, 3)]
    """
    # The gist is truncated after the doctests; an assumed completion
    # that satisfies the examples above:
    start = prev = None
    for n in s:
        if start is None or n != prev + 1:
            if start is not None:
                yield (start, prev)
            start = n
        prev = n
    if start is not None:
        yield (start, prev)
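The doctests double as a small test suite; with the completion sketched above, they can be run directly:

if __name__ == '__main__':
    import doctest
    doctest.testmod()  # exercises the examples embedded in the docstring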
//
// cpuse.js - simple continuous cpu monitor for node
//
// Intended for programs wanting to monitor and take action on overall CPU load.
//
// The monitor starts as soon as you require the module, then you can query it at
// any later time for the average cpu:
//
// > var cpuse = require('cpuse');
// > cpuse.averages();
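cpuse itself is a node module, but the sample-and-average idea is easy to mirror elsewhere; here is a minimal Python sketch of the same technique, assuming the psutil package (not part of cpuse):

import threading
import time

import psutil

class CpuMonitor:
    def __init__(self, interval=1.0):
        self.samples = []
        self.interval = interval
        # start sampling as soon as the monitor is created, like cpuse
        threading.Thread(target=self._run, daemon=True).start()

    def _run(self):
        psutil.cpu_percent(None)  # prime the counter; the first call is meaningless
        while True:
            time.sleep(self.interval)
            self.samples.append(psutil.cpu_percent(None))

    def average(self):
        return sum(self.samples) / len(self.samples) if self.samples else 0.0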
// Check mongodb working set size (Mongo 2.4+).
// Paste this into mongo console, get back size in GB
db.runCommand({
    serverStatus: 1, workingSet: 1, metrics: 0, locks: 0
}).workingSet.pagesInMemory * 4096 / (Math.pow(2,30));
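The same check can be scripted outside the mongo shell; a sketch using pymongo (assumed dependency), mirroring the console one-liner above on a Mongo 2.4+ server:

from pymongo import MongoClient
from bson.son import SON

client = MongoClient()  # assumes a local mongod
ws = client.admin.command(SON([("serverStatus", 1), ("workingSet", 1)]))["workingSet"]
print(ws["pagesInMemory"] * 4096 / 2.0 ** 30, "GB")  # pages are 4 KiB each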
You need 7zip installed to grab the NPI database (brew install p7zip on OS X).

To create the index, run the init_* scripts. You'll need the doctor referral graph data to use *_refer.*, but the NPI database will be downloaded for you automatically. Indexing happens on all cores and takes less than 10 minutes on my 8-core machine.

To grab lines matching a search term, use python search_npi.py term.

Note: index performance is good if you have a lot of memory, since index file blocks stay hot in the page cache. However, the blocks are loaded each time the program is run, which is super inefficient. It should use an on-disk hashtable where the offsets can be calculated instead, as sketched below.
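To make the note concrete, here is a sketch of that on-disk hashtable idea: with fixed-size records, a key's offset can be computed directly, so nothing needs to be loaded at startup. The record layout, file name, and bucket count are hypothetical, and collision handling is omitted:

import mmap
import zlib

RECORD_SIZE = 64      # hypothetical fixed record width
NBUCKETS = 1 << 20    # hypothetical bucket count

def lookup(path, key):
    with open(path, 'rb') as f:
        data = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        # offset is computed from the key, so no in-memory index is needed
        offset = (zlib.crc32(key.encode()) % NBUCKETS) * RECORD_SIZE
        return data[offset:offset + RECORD_SIZE].rstrip(b'\x00')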
froms = {}
tos = {}
for i, line in enumerate(open('refer.2011.csv')):
    try:
        fr, to, count = line.strip().split(',')
        froms[fr] = froms.get(fr, 0) + 1  # out-degree per referrer (count column unused)
        tos[to] = tos.get(to, 0) + 1      # in-degree per referee
    except Exception:
        import traceback; traceback.print_exc()
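For a quick look at the heaviest referrers, the dictionaries above can be fed to collections.Counter; this snippet is an illustration, not part of the original gist:

from collections import Counter

for npi, out_degree in Counter(froms).most_common(10):
    print(npi, out_degree)  # NPIs that appear most often as referrers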
import random
import matplotlib.pyplot as plt

k = 1000
array = []
for n, x in enumerate([range(k)[random.randrange(k)] for x in range(100000)]):
    if n < k:
        # fill the reservoir with the first k items
        array.append(x)
    else:
        if random.random() < k/float(n):
            # the gist is truncated here; the assumed completion replaces a
            # random reservoir slot, per standard reservoir sampling
            array[random.randrange(k)] = x

plt.hist(array, bins=50)
plt.show()
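If the replacement step is unbiased, the histogram should come out roughly flat across the k values, which makes the plot a quick visual check that the reservoir sample is uniform.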