Alex Storer alexstorer
@alexstorer
alexstorer / gist:3455295
Created August 24, 2012 20:30
For PyLucene!
# This file demonstrates how to...
# 1) Make a new analyzer
# 2) Make a new filter
# 3) Apply an analyzer chain to a string (via a query)
# 4) Include phrases as tokens
from lucene import *
class AnalyzerUtils(object):
@alexstorer
alexstorer / parsexml.py
Created November 8, 2012 19:21
Patent Processing
# Use this file by navigating in the terminal to the directory where your file is located and typing:
# python parsexml.py [name of xml file]
# You can parse multiple xml files by using the wildcard operator.
# To process every xml file in a directory, do this:
# python parsexml.py *.xml
from lxml import etree
import re
import csv
import os.path as op
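
The usage comments above describe passing one or more XML files on the command line. A minimal sketch of that pattern follows. It is not the gist's own code: it swaps in the standard library's xml.etree.ElementTree for lxml so that it runs without extra packages, and the function name root_tags is hypothetical.

```python
import xml.etree.ElementTree as ET


def root_tags(paths):
    """Parse each XML file in paths and return {path: root element tag}."""
    tags = {}
    for path in paths:
        tree = ET.parse(path)           # raises ParseError on malformed XML
        tags[path] = tree.getroot().tag
    return tags


# On the command line, the shell expands *.xml into a list of file names,
# which arrive in sys.argv[1:]. A script could then call:
#     import sys
#     for path, tag in root_tags(sys.argv[1:]).items():
#         print(path, tag)
```

Because the shell performs the wildcard expansion, the script only ever sees a plain list of file names.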
@alexstorer
alexstorer / parsetext.py
Created November 30, 2012 19:37
How to scrape text for regular expressions
import re
import csv
import glob
fc = open('signers.csv','w')
c = csv.DictWriter(fc,["Name","Chamber","Number","Year","Filename"])
c.writeheader()
# I used the pdftotext utility to convert the pdf documents
# Look here for details: http://www.bluem.net/en/mac/packages/
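
The preview above stops before the matching step, so here is a small sketch of the general idea: run a regular expression over some text and write one CSV row per match. The pattern, the sample text, and the reduced field list are all assumptions made for illustration, and the sketch uses Python 3 with an in-memory buffer rather than the gist's files.

```python
import csv
import io
import re

# Hypothetical pattern: "Firstname Lastname, YYYY".
PATTERN = re.compile(r"(?P<Name>[A-Z][a-z]+ [A-Z][a-z]+), (?P<Year>\d{4})")


def scrape(text, filename, writer):
    """Write one CSV row per regex match in text; return the match count."""
    count = 0
    for m in PATTERN.finditer(text):
        writer.writerow({"Name": m.group("Name"),
                         "Year": m.group("Year"),
                         "Filename": filename})
        count += 1
    return count


out = io.StringIO()  # stands in for an open CSV file
writer = csv.DictWriter(out, ["Name", "Year", "Filename"])
writer.writeheader()
n = scrape("Signed by Jane Doe, 1999 and John Roe, 2001.", "bill1.txt", writer)
```

Named groups keep the mapping from regex captures to CSV columns explicit, which helps once the pattern grows.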
@alexstorer
alexstorer / example.py
Created December 5, 2012 21:53
Parsing Text Files
import glob
import re
import csv
allfiles = glob.glob('1972_1998_batch/*.sim')
fw = open('psb.csv','w')
dw = csv.DictWriter(fw,["filename","month","electricity","gas"])
dw.writeheader()
fwt = open('psb_total.csv','w')
@alexstorer
alexstorer / extract_doe.py
Created December 6, 2012 21:35
This script will extract the PS-B and PS-E tables from DOE .sim building files.
import glob
import re
import csv
'''
This script will extract the PS-B and PS-E tables from
DOE .sim building files.
This script was tested on DOE 2.1e .sim files.
@alexstorer
alexstorer / parse.py
Created December 13, 2012 21:48
Parse highlighter files.
import re
fname = 'inp.txt'
colors = ['red','green','blue','orange','yellow']
colorheaders = ['red is for X','green is for y','blue','orange','yellow']
colorstrings = []
for c in colors:
@alexstorer
alexstorer / gmap_api.R
Created December 14, 2012 21:33
Download images from the streetview API
# An R Script to do some scraping
getSearchString <- function(loc,heading,pitch) {
  s <- paste('http://maps.googleapis.com/maps/api/streetview?size=640x640',
             paste('location=',loc,sep=''),
             paste('heading=',heading,sep=''),
             paste('pitch=',pitch,sep=''),
             'sensor=false',
             sep = '&')
  s <- gsub(' ','%20',s)
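
For comparison, the same URL construction can be sketched in Python. This is an illustrative translation of the R lines above, not part of the gist; the function name build_streetview_url is hypothetical, and the parameters are just the ones visible in the preview.

```python
from urllib.parse import quote, urlencode

BASE = "http://maps.googleapis.com/maps/api/streetview"


def build_streetview_url(loc, heading, pitch):
    """Build the request URL with the same parameters as the R snippet."""
    params = {
        "size": "640x640",
        "location": loc,
        "heading": heading,
        "pitch": pitch,
        "sensor": "false",
    }
    # quote_via=quote encodes spaces as %20, like the gsub() call in R;
    # safe="," leaves commas unescaped, as the R code does.
    return BASE + "?" + urlencode(params, quote_via=quote, safe=",")


url = build_streetview_url("Harvard Square, Cambridge MA", 90, 0)
```

Unlike the gsub() approach, urlencode also escapes other reserved characters in the location string, such as & or #.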
@alexstorer
alexstorer / sample.R
Created December 17, 2012 21:49
Code to handle the Word documents (saved as HTML) and convert to CSV. Then, we do topic modeling with R.
setwd('~/Work/dlopez')
r <- read.csv('Sampl.csv',stringsAsFactors=FALSE)
# First, let's load up this pile of things.
Sys.setenv(NOAWT=TRUE)
# This is a workaround for Macs
library(tm)
@alexstorer
alexstorer / parsexml.py
Created December 21, 2012 16:14
Parse some XML
from lxml import etree
import re
import csv
import os.path as op
import sys
import glob
class NLRPParse(object):
    def __init__(self,fname):
@alexstorer
alexstorer / test.html
Created February 6, 2013 20:56
Basic demonstration of html and javascript.
<head>
<title>Alex's Webpage</title>
<style>
body
{
background-color:#b0c4de;
}
h1
{
background-color:#123456;