
@mjbommar
mjbommar / archiveTwitter.py
Created February 26, 2011 19:49
Archive tweets from a search term going backwards through search.
'''
@author Michael J Bommarito II
@date Feb 26, 2011
@license Simplified BSD, (C) 2011.
This script demonstrates how to use Python to archive historical tweets.
'''
import codecs
import csv
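The preview stops at the imports, so the crawl itself isn't visible. The Python below is a rough sketch of what "going backwards through search" usually means. It assumes a `max_id`-style cursor of the kind Twitter's search API exposed: each request asks for results with IDs at or below a bound, and the next bound is one less than the smallest ID just received. `fetch_page`, `fake_fetch`, and the `from_user`/`text` keys are hypothetical stand-ins, not the script's actual code. The offline fake lets the sketch run without network access.

```python
import csv
import io
from typing import Callable, Iterator, List, Optional

# fetch_page is a hypothetical stand-in for the real HTTP search call: given a query and an
# optional max_id, it returns one page of results (dicts with "id", "from_user", "text")
# whose ids are <= max_id, newest first. An empty list means nothing older remains.
FetchPage = Callable[[str, Optional[int]], List[dict]]


def iter_backwards(query: str, fetch_page: FetchPage) -> Iterator[dict]:
    """Yield results from newest to oldest by lowering max_id after every page."""
    max_id: Optional[int] = None
    while True:
        page = fetch_page(query, max_id)
        if not page:
            return
        yield from page
        # Next time, ask only for tweets strictly older than the oldest one already seen.
        max_id = min(tweet["id"] for tweet in page) - 1


def archive(query: str, fetch_page: FetchPage, out) -> int:
    """Write each result as a CSV row (id, user, text) to `out`; return the row count."""
    writer = csv.writer(out)
    count = 0
    for tweet in iter_backwards(query, fetch_page):
        writer.writerow([tweet["id"], tweet["from_user"], tweet["text"]])
        count += 1
    return count


# Offline demo data: ten fake tweets, served four at a time by a fake fetch_page.
FAKE_TWEETS = [{"id": i, "from_user": f"user{i % 3}", "text": f"tweet {i}"} for i in range(1, 11)]


def fake_fetch(query: str, max_id: Optional[int], page_size: int = 4) -> List[dict]:
    newest_first = sorted(FAKE_TWEETS, key=lambda t: t["id"], reverse=True)
    eligible = [t for t in newest_first if max_id is None or t["id"] <= max_id]
    return eligible[:page_size]


if __name__ == "__main__":
    buffer = io.StringIO()
    print(archive("#example", fake_fetch, buffer), "rows written")
```

The cursor moves by ID rather than by page number. As a result, tweets posted while the crawl is running don't shift the results being read.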
mjbommar / plotMarch11.r
Created March 12, 2011 23:14
Plot the #march11/#saudi tweets.
#@author Michael J Bommarito II
#@date Mar 12, 2011
# Thanks Hadley!
library(ggplot2)
# Load the sample
twitterDF <- read.csv('sample.csv', header=FALSE, stringsAsFactors=FALSE)
# Convert Twitter dates to POSIXct objects
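The R script's next step converts Twitter's `created_at` strings to POSIXct values. In Python, the equivalent is a `datetime.strptime` call with an explicit UTC offset. The two layouts below are assumptions about which format the sample file uses: the REST-style string and the older Search-style string. The function returns a timezone-aware UTC `datetime`, which is the closest Python analogue of POSIXct. `parse_twitter_date` is a hypothetical helper name.

```python
from datetime import datetime, timezone

# Assumed created_at layouts; both carry an explicit UTC offset such as +0000.
TWITTER_FORMATS = (
    "%a %b %d %H:%M:%S %z %Y",   # REST style, e.g. "Sat Mar 12 23:14:00 +0000 2011"
    "%a, %d %b %Y %H:%M:%S %z",  # Search style, e.g. "Sat, 12 Mar 2011 23:14:00 +0000"
)


def parse_twitter_date(text: str) -> datetime:
    """Parse a Twitter timestamp into a timezone-aware datetime normalized to UTC."""
    for fmt in TWITTER_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).astimezone(timezone.utc)
        except ValueError:
            continue  # try the next layout
    raise ValueError(f"unrecognized Twitter date: {text!r}")
```

For example, `parse_twitter_date("Sat Mar 12 23:14:00 +0000 2011")` gives 23:14 UTC on March 12, 2011. The `%a` and `%b` codes match English day and month names under the default C locale.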
mjbommar / buildCodeIndex.java
Created April 10, 2011 21:17
Build a Lucene Index from a U.S. Code XHTML ZIP file.
/**
* @author Michael J Bommarito II
* @date Apr 9, 2011
* @license MIT, (C) Michael J Bommarito II 2011
*/
package org.mjb;
// Java standard library imports
import java.io.*;
import java.util.*;
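Only the Java header and imports appear here. The sketch below shows what "building an index" involves without pulling in Lucene. It is a toy inverted index in Python: it reads each HTML entry from a ZIP archive, strips the markup with `html.parser`, and maps every lowercase term to the set of entries that contain it. This is a deliberately simplified substitute, not the gist's Lucene code. The tokenizer, the accepted file extensions, and the sample entry names are all assumptions. Lucene also adds analyzers, term positions, scoring, and on-disk segments, none of which the sketch attempts.

```python
import io
import re
import zipfile
from collections import defaultdict
from html.parser import HTMLParser
from typing import Dict, List, Set


class _TextExtractor(HTMLParser):
    """Collect the character data of an (X)HTML document, ignoring tags."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: List[str] = []

    def handle_data(self, data: str) -> None:
        self.chunks.append(data)


def tokenize(text: str) -> List[str]:
    """Lowercase alphanumeric runs; a crude stand-in for a Lucene analyzer."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(zip_source) -> Dict[str, Set[str]]:
    """Map each term to the set of ZIP entry names whose text contains it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    with zipfile.ZipFile(zip_source) as archive:
        for name in archive.namelist():
            if not name.lower().endswith((".htm", ".html", ".xhtml")):
                continue  # assumed: only HTML-like entries hold section text
            parser = _TextExtractor()
            parser.feed(archive.read(name).decode("utf-8", errors="replace"))
            parser.close()
            for term in tokenize(" ".join(parser.chunks)):
                index[term].add(name)
    return dict(index)


def make_sample_zip() -> io.BytesIO:
    """Hypothetical two-section archive so the sketch runs without the real download."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("title18/sec1.htm", "<html><body><p>Definition of felony</p></body></html>")
        zf.writestr("title26/sec1.htm", "<html><body><p>Income tax imposed</p></body></html>")
    buf.seek(0)
    return buf
```

`build_index(make_sample_zip())["felony"]` returns `{"title18/sec1.htm"}`. Tag names such as `html` never become terms because only character data is tokenized.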
mjbommar / searchCodeIndex.java
Created April 10, 2011 21:21
Search a Lucene index of the U.S. Code built from XHTML.
/**
* @author Michael J Bommarito II
* @date Apr 9, 2011
* @license MIT, (C) Michael J Bommarito II 2011
*/
package org.mjb;
// Java standard library imports
import java.io.*;
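The search side can be sketched in the same way, again as a plain-Python stand-in rather than Lucene. Given an inverted index that maps each term to a set of document names, the sketch tokenizes the query and then either intersects the posting sets (boolean AND) or unions them (boolean OR). `SAMPLE_INDEX`, its entry names, and the `require_all` flag are made up for illustration.

```python
import re
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    """Lowercase alphanumeric runs, matching how the toy index was built."""
    return re.findall(r"[a-z0-9]+", text.lower())


def search(index: Dict[str, Set[str]], query: str, require_all: bool = True) -> List[str]:
    """Return sorted document names matching the query terms.

    require_all=True intersects the posting sets (AND); False unions them (OR).
    """
    postings = [index.get(term, set()) for term in tokenize(query)]
    if not postings:
        return []
    hits = set.intersection(*postings) if require_all else set.union(*postings)
    return sorted(hits)


# Hypothetical index: term -> names of sections containing it.
SAMPLE_INDEX: Dict[str, Set[str]] = {
    "felony": {"t18-s1.htm"},
    "income": {"t26-s61.htm"},
    "tax": {"t26-s1.htm", "t26-s61.htm"},
}
```

With this index, `search(SAMPLE_INDEX, "Income TAX")` returns `["t26-s61.htm"]`. Lucene's classic `QueryParser` joins bare terms with OR by default, so the sketch exposes `require_all` rather than hard-coding AND.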
mjbommar / pom.xml
Created April 10, 2011 23:38
pom.xml for "Building a better legal search engine, part 1"
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>org.mjb</groupId>
<artifactId>uscs</artifactId>
<version>0.1</version>
<packaging>jar</packaging>
<name>uscs</name>
mjbommar / statecode_mi.xsl
Created August 14, 2011 16:50
XSL stylesheet for Michigan Compiled Law XML
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/code">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Michigan Compiled Law</title>
<style type="text/css">
.chapter {
# Clear and load libraries
rm(list=ls())
library(ggplot2)
library(stats)
# Read data
data <- read.csv("bills-all.csv", comment.char='')
# Plot and save.
ggplot(data) +
mjbommar / repr_snippet.py
Last active October 3, 2015 07:17
Introspective Python __repr__, modeled on Java's auto-generated toString(), using dir() and eval()
def __repr__(self):
    '''
    Return string representation.
    '''
    skipNone = True
    reprString = type(self).__name__ + " ["
    elements = dir(self)
    for e in elements:
        # Make sure we only display "public" fields; skip anything private (_*), that is a method/function, or that is a module.
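The preview ends at the loop header, but the comment spells out the filtering rules. Below is a minimal sketch of the complete idea, written as a mixin. It walks `dir(self)`, skips names starting with `_`, skips callables and modules, optionally skips `None` values, and joins the remaining fields after the same `ClassName [` prefix the snippet builds. It reads attributes with `getattr()` rather than the `eval()` the description mentions, because `getattr` needs no string evaluation. `IntrospectiveRepr`, `_skip_none`, and the `Point` example are hypothetical names.

```python
import types


class IntrospectiveRepr:
    """Mixin that builds __repr__ from public, non-callable attributes found via dir()."""

    _skip_none = True  # plays the role of skipNone; underscore keeps it out of the output

    def __repr__(self) -> str:
        parts = []
        for name in dir(self):  # dir() returns names in sorted order
            # Skip private and dunder names such as _cache or __class__.
            if name.startswith("_"):
                continue
            value = getattr(self, name)
            # Skip methods/functions and modules, as the original comment describes.
            if callable(value) or isinstance(value, types.ModuleType):
                continue
            if value is None and self._skip_none:
                continue
            parts.append(f"{name}={value!r}")
        return type(self).__name__ + " [" + ", ".join(parts) + "]"


class Point(IntrospectiveRepr):
    """Hypothetical example class."""

    def __init__(self, x, y, label=None):
        self.x = x
        self.y = y
        self.label = label

    def norm(self):
        return (self.x ** 2 + self.y ** 2) ** 0.5
```

`repr(Point(3, 4))` yields `Point [x=3, y=4]`. `label` is omitted because it is `None`, and `norm` is omitted because it is a method.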
mjbommar / generateSDF_RFC822.py
Created April 21, 2012 12:35
Generate AWS CloudSearch SDF from RFC822 email messages: Enron sample
import codecs
import email
import email.parser
import glob
import json
import os
import os.path
import sys
def parseFile(fileName):
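Only the imports and the `parseFile` signature are visible. The code below is a hedged sketch of the conversion. It parses one RFC 822 message with the standard library's `email.parser.Parser` and wraps the result as an SDF "add" operation. The `type`/`id`/`version`/`lang`/`fields` layout follows the original 2011-era CloudSearch SDF batch format as best recalled here, so check it against the API version in use. The field names, the SHA-1 document ID, `message_to_sdf`, and `build_batch` are assumptions, not the gist's code. Transfer-encoded bodies (for example, base64) are passed through without decoding.

```python
import hashlib
import json
from email.parser import Parser
from typing import Dict, List


def message_to_sdf(raw: str, version: int = 1) -> Dict:
    """Turn one RFC 822 message into a single SDF "add" operation (assumed 2011 layout)."""
    msg = Parser().parsestr(raw)
    # Hash the Message-ID (or the whole message) so the id contains only [0-9a-f].
    key = msg.get("Message-ID", "") or raw
    doc_id = hashlib.sha1(key.encode("utf-8")).hexdigest()
    if msg.is_multipart():
        # Keep only the text/plain parts; the multipart container itself is skipped.
        body = "\n".join(
            part.get_payload() for part in msg.walk()
            if part.get_content_type() == "text/plain"
        )
    else:
        body = msg.get_payload()
    return {
        "type": "add",
        "id": doc_id,
        "version": version,
        "lang": "en",
        "fields": {
            # Hypothetical field names; they must match the search domain's index fields.
            "subject": msg.get("Subject", ""),
            "sender": msg.get("From", ""),
            "recipients": msg.get("To", ""),
            "date": msg.get("Date", ""),
            "body": body,
        },
    }


def build_batch(raw_messages: List[str]) -> str:
    """Serialize the operations as one SDF batch: a JSON array."""
    return json.dumps([message_to_sdf(raw) for raw in raw_messages], indent=2)


# Hypothetical sample message (not taken from the Enron data).
SAMPLE_MESSAGE = (
    "Message-ID: <1@example.com>\n"
    "Date: Mon, 14 May 2001 16:39:00 -0700\n"
    "From: alice@example.com\n"
    "To: bob@example.com\n"
    "Subject: Forecast\n"
    "\n"
    "Here is our forecast.\n"
)
```

In the original, `glob` and `codecs` would feed file contents into a function like this one. The sketch takes raw strings so that it runs on its own.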
mjbommar / plotHashtag2.R
Created May 21, 2012 12:59
Plot a time series of a hashtag where height is tweet count and color is unique user count.
# @author: Bommarito Consulting, LLC; http://michaelbommarito.com/
# @date: May 21, 2012
# @email: [email protected]
# @packages: ggplot2, plyr
# Clear and import.
rm(list=ls())
library(ggplot2)
library(plyr)
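Before plotting, the script has to compute two numbers for each time bin: the number of tweets (bar height) and the number of distinct users (color). The R script loads plyr, presumably for that split-apply-combine step. The Python sketch below performs the same aggregation. The `(timestamp, user)` input shape, the hourly default bin width, and the function names are assumptions. The plotting itself is left to ggplot2, as in the original.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Dict, Iterable, List, Set, Tuple


def bin_start(ts: datetime, width: timedelta) -> datetime:
    """Floor a timestamp to the start of its bin; bins are aligned to that day's midnight."""
    midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    return midnight + ((ts - midnight) // width) * width


def summarize(
    tweets: Iterable[Tuple[datetime, str]], width: timedelta = timedelta(hours=1)
) -> List[Tuple[datetime, int, int]]:
    """Return (bin start, tweet count, unique user count) rows, sorted by time."""
    counts: Dict[datetime, int] = defaultdict(int)
    users: Dict[datetime, Set[str]] = defaultdict(set)
    for ts, user in tweets:
        key = bin_start(ts, width)
        counts[key] += 1      # bar height
        users[key].add(user)  # distinct users, used for color
    return [(key, counts[key], len(users[key])) for key in sorted(counts)]


# Hypothetical sample: three tweets from two users in the 09:00 hour, one at 10:01.
SAMPLE_TWEETS = [
    (datetime(2012, 5, 21, 9, 5), "alice"),
    (datetime(2012, 5, 21, 9, 40), "alice"),
    (datetime(2012, 5, 21, 9, 59), "bob"),
    (datetime(2012, 5, 21, 10, 1), "carol"),
]
```

`summarize(SAMPLE_TWEETS)` gives one row per hour: three tweets from two users at 09:00, and one tweet from one user at 10:00.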