Ryan Witt ryanwitt

Installing

If you trust me, do this:

curl https://raw.github.com/gist/3868967/30ed8db63ea701e1ad18dabbecfc8df0ffd8b195/install.sh > install.sh

sh install.sh

Doctor referral graph / NPI database full-text indexer

You need 7zip installed to grab the NPI database (on OS X: brew install p7zip).

To create the index, run the init_* scripts. You need the doctor referral graph data to use *_refer.*, but the NPI database is downloaded for you automatically. Indexing runs on all cores and takes less than 10 minutes on my 8-core machine.

To grab lines matching a search term, use python search_npi.py term.

Note: index performance is good if you have a lot of memory, because index file blocks stay hot in cache. The index is still loaded in full each time the program runs, which is very inefficient. It should use an on-disk hash table where the offsets can be calculated instead.
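
One way to read "offsets can be calculated" is a file of fixed-width slots, where a key's slot position comes from its hash rather than from a structure loaded into memory. The following is only a minimal sketch of that idea, not the indexer's actual format: the slot layout, the 24-byte key limit, the use of CRC32, and the names create, put, and get are all assumptions made for illustration.

```python
import struct
import zlib

KEY_BYTES = 24                                # hypothetical fixed key width
SLOT = struct.Struct("<B%dsQ" % KEY_BYTES)    # used flag, padded key, 64-bit value


def _slot_offset(index):
    """Byte offset of a slot: computed from the index, never looked up."""
    return index * SLOT.size


def _start_index(key, nslots):
    """First slot to probe for a key (key is bytes)."""
    return zlib.crc32(key) % nslots


def create(path, nslots):
    """Write a table of nslots empty (all-zero) slots."""
    with open(path, "wb") as f:
        f.write(b"\0" * (nslots * SLOT.size))


def put(path, nslots, key, value):
    """Store value (for example a byte offset into a data file) under key."""
    key = key.encode("utf-8")
    if len(key) > KEY_BYTES:
        raise ValueError("key too long for a fixed-width slot")
    with open(path, "r+b") as f:
        index = _start_index(key, nslots)
        for _ in range(nslots):
            f.seek(_slot_offset(index))
            used, stored, _old = SLOT.unpack(f.read(SLOT.size))
            if not used or stored.rstrip(b"\0") == key:
                f.seek(_slot_offset(index))
                f.write(SLOT.pack(1, key, value))
                return
            index = (index + 1) % nslots      # linear probing on collision
    raise RuntimeError("table is full")


def get(path, nslots, key):
    """Return the stored value, or None if the key is absent."""
    key = key.encode("utf-8")
    with open(path, "rb") as f:
        index = _start_index(key, nslots)
        for _ in range(nslots):
            f.seek(_slot_offset(index))
            used, stored, value = SLOT.unpack(f.read(SLOT.size))
            if not used:
                return None                   # empty slot ends the probe chain
            if stored.rstrip(b"\0") == key:
                return value
            index = (index + 1) % nslots
    return None
```

A lookup reads only the slots it probes, so nothing has to be loaded at startup. Calling create(path, 1024), then put(path, 1024, "smith", 1234), makes get(path, 1024, "smith") return 1234.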

	import urllib2
	try:
	    d = {}
	    for line in urllib2.urlopen('http://192.168.201.133:1337/6'):
	        split = line.split(',')
	        d[split[2]] = d.get(split[2], 0) + 1
	except KeyboardInterrupt:
	    for v,k in sorted(((v,k) for k,v in d.items()), reverse=True):
	        print k,',',v

	var http = require('http');
	var choices = 'Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nam erat augue, molestie accumsan vulputate ut, imperdiet ut nunc. Nunc convallis magna sed dolor suscipit placerat faucibus felis tempus. In in odio arcu, a fringilla tellus. Mauris molestie, nibh non pretium condimentum, lacus mi hendrerit erat, ac ornare dui arcu ut ipsum. Pellentesque luctus venenatis orci et feugiat. Praesent dictum bibendum fermentum. Integer aliquam erat ut dolor semper auctor. Ut sed justo sit amet orci convallis ultrices. Maecenas egestas aliquet diam. Lorem ipsum dolor sit amet, consectetur adipiscing elit. Class aptent taciti sociosqu ad litora torquent per conubia nostra, per inceptos himenaeos.'.split(' ');
	http.createServer(function (req, res) {
	    res.writeHead(200, {'Content-Type': 'text/plain'});
	    var no = false;
	    console.log(req);
	    req.on('close', function() {no=true; res.end();});
	    //req.on('end', function() {no=true; res.end();});
	    if (req.url == '/') {

	// requires ace editor + https://github.com/creationix/step

	/* INTERACTIVE INTRODUCTION */
	var sec = 600;
	var typewriter_pauses = [
	    {match:/[,\-(]/, pause:0.2*sec}
	  , {match:/[.!?:]/, pause:0.4*sec}
	  , {match:/[\t]/, pause:0.8*sec}
	  , {match:/[\n]/, pause:0.4*sec}
	  , {match:/./, pause:0.035*sec}

	import random
	import matplotlib.pyplot as plt

	k = 1000
	array = []
	for n, x in enumerate([range(k)[random.randrange(k)] for x in range(100000)]):
	    if n < k:
	        array.append(x)
	    else:
	        if random.random() < k/float(n):
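
The snippet above is cut off, but its shape (fill an array with the first k items, then accept later items with a probability that shrinks as n grows) resembles reservoir sampling. For comparison, here is a self-contained sketch of the textbook variant known as Algorithm R. It is not a reconstruction of the missing lines: the function name reservoir_sample and its rng parameter are made up for this example, and it keeps item n with probability k/(n+1) rather than the k/float(n) test shown above.

```python
import random


def reservoir_sample(stream, k, rng=random):
    """Return k items chosen uniformly at random from an iterable of unknown length."""
    reservoir = []
    for n, item in enumerate(stream):
        if n < k:
            reservoir.append(item)        # the first k items fill the reservoir
        else:
            j = rng.randrange(n + 1)      # uniform integer in [0, n]
            if j < k:
                reservoir[j] = item       # happens with probability k / (n + 1)
    return reservoir
```

If the stream has fewer than k items, the function returns all of them in their original order.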


	froms = {}
	tos = {}
	for i,line in enumerate(file('refer.2011.csv')):
	    try:
	        fr, to, count = line.strip().split(',')
	        froms[fr] = froms.get(fr,0) + 1
	        tos[to] = tos.get(to,0) + 1
	    except:
	        import traceback; traceback.print_exc()

	// Check mongodb working set size (Mongo 2.4+).
	// Paste this into mongo console, get back size in GB

	db.runCommand({
	    serverStatus:1, workingSet:1, metrics:0, locks:0
	}).workingSet.pagesInMemory * 4096 / (Math.pow(2,30));
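
The arithmetic in that one-liner is a page count times 4096 bytes per page, divided by 2^30 bytes per GB. As a quick sanity check, the same conversion can be sketched in Python. The function name working_set_gb is hypothetical, and the 4096-byte page size is simply the value the snippet uses.

```python
PAGE_BYTES = 4096  # page size used by the console snippet


def working_set_gb(pages_in_memory, page_bytes=PAGE_BYTES):
    """Convert a page count to GB (2**30 bytes), mirroring the console expression."""
    return pages_in_memory * page_bytes / float(2 ** 30)
```

For example, 262144 pages of 4096 bytes come to exactly 1 GB, so working_set_gb(262144) returns 1.0.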

	//
	// cpuse.js - simple continuous cpu monitor for node
	//
	// Intended for programs wanting to monitor and take action on overall CPU load.
	//
	// The monitor starts as soon as you require the module, then you can query it at
	// any later time for the average cpu:
	//
	// > var cpuse = require('cpuse');
	// > cpuse.averages();