Alex Storer alexstorer

Stanford Graduate School of Business
San Francisco Bay Area
http://www.stanford.edu/~astorer

alexstorer / gist:3455295

Created August 24, 2012 20:30

For PyLucene!

	# This file demonstrates how to...
	# 1) Make a new analyzer
	# 2) Make a new filter
	# 3) Apply an analyzer chain to a string (via a query)
	# 4) Include phrases as tokens

	from lucene import *

	class AnalyzerUtils(object):

alexstorer / parsexml.py

Created November 8, 2012 19:21

Patent Processing

	# Usage: in a terminal, change to the directory containing this file and run:
	# python parsexml.py [name of xml file]
	# To parse several XML files at once, use a wildcard.
	# For example, to process every XML file in the directory:
	# python parsexml.py *.xml

	from lxml import etree
	import re
	import csv
	import os.path as op
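
The usage comments above describe passing one or several XML files on the command line. A minimal sketch of that argument handling follows. It assumes a hypothetical helper `expand_args` (not part of the original gist) that expands wildcards itself for shells that do not, and it uses the standard library's `xml.etree.ElementTree` in place of `lxml` so that it runs without extra packages:

```python
import glob
import sys
import xml.etree.ElementTree as ET


def expand_args(args):
    """Expand wildcard patterns into a de-duplicated list of paths.

    Hypothetical helper: most Unix shells expand *.xml before Python sees it,
    but the Windows command prompt does not, so we expand here as well.
    """
    paths = []
    for arg in args:
        matches = sorted(glob.glob(arg))
        # Keep the literal argument if nothing matched, so the caller can
        # report a missing file instead of silently skipping it.
        paths.extend(matches if matches else [arg])
    # dict.fromkeys preserves first-seen order while removing duplicates.
    return list(dict.fromkeys(paths))


def root_tag(path):
    """Parse one XML file and return the tag name of its root element."""
    return ET.parse(path).getroot().tag


def main(argv):
    """Print the root tag of every XML file named on the command line."""
    for path in expand_args(argv):
        try:
            print(path, root_tag(path))
        except (ET.ParseError, OSError) as err:
            print("skipping", path, "-", err)


if __name__ == "__main__":
    main(sys.argv[1:])
```

Saved under a name of your choosing, this would be run the same way as the original: `python yourscript.py *.xml`.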

alexstorer / parsetext.py

Created November 30, 2012 19:37

How to scrape text using regular expressions

	import re
	import csv
	import glob

	fc = open('signers.csv','w')
	c = csv.DictWriter(fc,["Name","Chamber","Number","Year Filename"])
	c.writeheader()

	# I used the pdftotext utility to convert the pdf documents
	# Look here for details: http://www.bluem.net/en/mac/packages/
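
To illustrate the approach named in the description, here is a minimal sketch of pulling regular-expression matches out of text and writing them with `csv.DictWriter`. The pattern, the sample sentence, and the reduced `Name`/`Chamber` column set are illustrative assumptions; the original script's actual expressions are not shown in the preview.

```python
import csv
import io
import re

# Hypothetical pattern: a title ("Sen." or "Rep.") followed by a capitalized
# first and last name. The real script's pattern is not shown in the preview.
SIGNER_RE = re.compile(r"\b(?P<title>Sen|Rep)\.\s+(?P<name>[A-Z][a-z]+ [A-Z][a-z]+)")

# Assumed mapping from the matched title to a chamber name.
CHAMBERS = {"Sen": "Senate", "Rep": "House"}


def find_signers(text):
    """Return one dict per regex match, ready for csv.DictWriter."""
    return [
        {"Name": m.group("name"), "Chamber": CHAMBERS[m.group("title")]}
        for m in SIGNER_RE.finditer(text)
    ]


def write_rows(rows, handle):
    """Write the matched rows as CSV with a header line."""
    writer = csv.DictWriter(handle, fieldnames=["Name", "Chamber"])
    writer.writeheader()
    writer.writerows(rows)


# Made-up sample text standing in for the output of pdftotext.
sample = "Signed by Sen. Jane Doe and Rep. John Smith on the floor."
buffer = io.StringIO()
write_rows(find_signers(sample), buffer)
print(buffer.getvalue())
```

Writing to an in-memory buffer keeps the sketch free of side effects; passing an open file handle to `write_rows` would produce a file like `signers.csv` instead.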

alexstorer / example.py

Created December 5, 2012 21:53

Parsing Text Files

	import glob
	import re
	import csv

	allfiles = glob.glob('1972_1998_batch/*.sim')

	fw = open('psb.csv','w')
	dw = csv.DictWriter(fw,["filename","month","electricity","gas"])
	dw.writeheader()
	fwt = open('psb_total.csv','w')

alexstorer / extract_doe.py

Created December 6, 2012 21:35

This script will extract the PS-B and PS-E tables from DOE .sim building files.

	import glob
	import re
	import csv

	'''
	This script will extract the PS-B and PS-E tables from
	DOE .sim building files.

	This script was tested on DOE 2.1e .sim files.
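
One way the extraction described above could work is sketched below. It rests on an assumption that should be checked against real files: that each report page in a `.sim` file begins with a header line containing `REPORT-` followed by the report name. The function name `extract_reports` and the sample lines are hypothetical and are not taken from the original script.

```python
import re

# Assumption: every report page starts with a header line containing
# "REPORT- <name>", e.g. "REPORT- PS-B". Adjust the pattern if your files differ.
HEADER_RE = re.compile(r"REPORT-\s*(?P<name>\S+)")


def extract_reports(lines, wanted=("PS-B", "PS-E")):
    """Group the lines that follow each wanted report header.

    Returns a dict mapping report name to a list of non-blank body lines.
    Lines are collected until the next report header of any kind.
    """
    reports = {name: [] for name in wanted}
    current = None
    for line in lines:
        match = HEADER_RE.search(line)
        if match:
            name = match.group("name")
            # Only start collecting when the header names a wanted report.
            current = name if name in reports else None
            continue
        if current is not None and line.strip():
            reports[current].append(line.rstrip("\n"))
    return reports


# Made-up sample data; real .sim files are much longer and more detailed.
sample_lines = [
    "REPORT- PS-A",
    "ignored line",
    "REPORT- PS-B",
    "JAN  100.  20.",
    "FEB   90.  18.",
    "",
    "REPORT- PS-E",
    "JAN  55.",
]
tables = extract_reports(sample_lines)
print(tables["PS-B"])
```

Splitting each collected line into columns is left out on purpose, since the column layout depends on the report and the DOE version.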

alexstorer / parse.py

Created December 13, 2012 21:48

Parse highlighter files.

	import re

	fname = 'inp.txt'

	colors = ['red','green','blue','orange','yellow']
	colorheaders = ['red is for X','green is for y','blue','orange','yellow']

	colorstrings = []

	for c in colors:

alexstorer / gmap_api.R

Created December 14, 2012 21:33

Download images from the Google Street View API

	# An R Script to do some scraping

	getSearchString <- function(loc,heading,pitch) {
	  s <- paste('http://maps.googleapis.com/maps/api/streetview?size=640x640',
	             paste('location=',loc,sep=''),
	             paste('heading=',heading,sep=''),
	             paste('pitch=',pitch,sep=''),
	             'sensor=false',
	             sep = '&')
	  s <- gsub(' ','%20',s)
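
For comparison, the same URL construction can be sketched in Python with the standard library. The function name `get_search_string` simply mirrors the R function above; the base URL and parameter names are the ones in the preview, and nothing about downloading the image is assumed here.

```python
from urllib.parse import quote, urlencode

BASE_URL = "http://maps.googleapis.com/maps/api/streetview"


def get_search_string(loc, heading, pitch):
    """Build a Street View image URL from a location, heading, and pitch."""
    params = {
        "size": "640x640",
        "location": loc,
        "heading": heading,
        "pitch": pitch,
        "sensor": "false",
    }
    # quote_via=quote encodes spaces as %20 (as the R code does) rather than "+".
    return BASE_URL + "?" + urlencode(params, quote_via=quote)


print(get_search_string("Stanford CA", 90, 0))
```

Unlike the `gsub` call in the R version, which replaces only spaces, `urlencode` also percent-encodes other reserved characters such as commas.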

alexstorer / sample.R

Created December 17, 2012 21:49

Code to convert Word documents (saved as HTML) to CSV, followed by topic modeling in R.

	setwd('~/Work/dlopez')
	r <- read.csv('Sampl.csv',stringsAsFactors=FALSE)

	# First, let's load up this pile of things.

	Sys.setenv(NOAWT=TRUE)

	# This is a workaround for Macs

	library(tm)

alexstorer / parsexml.py

Created December 21, 2012 16:14

Parse some XML

	from lxml import etree
	import re
	import csv
	import os.path as op
	import sys
	import glob

	class NLRPParse(object):

	    def __init__(self,fname):

alexstorer / test.html

Created February 6, 2013 20:56

Basic demonstration of HTML and JavaScript.

	<head>
	<title>Alex's Webpage</title>
	<style>
	body
	{
	background-color:#b0c4de;
	}
	h1
	{
	background-color:#123456;
