neil kodner neilkod
@neilkod
neilkod / convert_jmw_canabalt_data.py
Created February 17, 2011 22:08
converts JMW Canabalt data into my format
#!/usr/bin/env python
# Basically splits the method of death and the device
import sys, re
canabalt_regexp = re.compile(r'"(.*) on my ([^.]+)\."')
for line in sys.stdin:
    data = line.strip()
    try:
        user,score,method_of_death = data.split(',')
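The preview stops inside the `try` block. As a minimal sketch of the split it describes, assuming each input line looks like `user,score,"<message> on my <device>."` (the sample line below is made up, not taken from the JMW data), the same regular expression can pull the method of death and the device apart. `split_death_and_device` is a hypothetical helper, and `split(',', 2)` is a deliberate change from the original three-way `split(',')` so that commas inside the quoted message don't break the unpacking.

```python
import re

# Same pattern as the gist: a quoted message ending in "on my <device>."
CANABALT_RE = re.compile(r'"(.*) on my ([^.]+)\."')


def split_death_and_device(line):
    """Split one input line into (user, score, method_of_death, device).

    Assumed input shape: user,score,"<message> on my <device>."
    Returns None when the message does not match the pattern and raises
    ValueError when the line has fewer than three comma-separated fields.
    """
    # maxsplit=2 keeps any commas inside the quoted message in the third field.
    user, score, message = line.strip().split(',', 2)
    match = CANABALT_RE.match(message)
    if match is None:
        return None
    method_of_death, device = match.groups()
    return user, score, method_of_death, device


# Hypothetical sample line, not from the real data set.
print(split_death_and_device(
    'player1,3125,"I ran 3125m before smashing into a wall on my iPad."'))
```

To process a whole file the way the original script does, loop over `sys.stdin`, call the helper on each line, and skip lines where it returns `None`.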
neilkod / tweetnationaldebt.py
Created February 16, 2011 23:22
tweets the national debt
import tweepy
from BeautifulSoup import BeautifulSoup
import urllib2
CONSUMER_KEY = 'oh'
CONSUMER_SECRET = 'no'
ACCESS_KEY = 'you'
ACCESS_SECRET = 'dont'
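Only the imports and the (redacted) credentials are visible here. A hedged sketch of the message-building step, assuming the debt figure and as-of date arrive as plain strings like the ones nationaldebt.py scrapes, might look like this; `compose_debt_tweet` and its wording are hypothetical, and the 140-character cap reflects Twitter's limit in 2011. Posting the text through tweepy is left out.

```python
# Twitter's status length limit in 2011.
MAX_TWEET_LEN = 140


def compose_debt_tweet(debt, as_of, max_len=MAX_TWEET_LEN):
    """Return status text for a scraped debt figure, trimmed to max_len characters.

    debt and as_of are inserted exactly as scraped (no reformatting).
    """
    text = 'U.S. national debt as of %s: %s' % (as_of, debt)
    if len(text) > max_len:
        # Leave room for an ellipsis so readers can tell the text was cut.
        text = text[:max_len - 3] + '...'
    return text
```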
neilkod / random joel kodner tweets
Created February 12, 2011 13:38
using a corpus of 200 tweets, generate random tweets using Markov models
building ngram index...
@karen_sharp @allisonnazarian By being concerned with nothing but
ourselves, we need not tyranny to trap us. Like involuntary shitting.
RT @EllieM72: Crying doesn't indicate that you're alive Twitter in
America: "FOOOOOOOODDDD!!!" Christian Pedophiles And Conservatives
#CPAC No, in the immediacy of a wink, where I pretend to resist & y
... Anything available at a Chinese buffet, comedian John Pinette
brushes up his "You go now!" routine with an Egyptian accent. RT
@FLGovRickScott: 2 Live Crew's Luther Campbell to run for Miami-Dade
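The gist shows only the generator's output. A word-level Markov chain of the kind described can be sketched as follows, assuming a two-word prefix (n=2) and plain whitespace tokenization; `build_ngram_index` and `generate` are illustrative names, not the original code. The index maps each prefix to every word seen after it, so frequent continuations are chosen more often simply because they appear in the list more often.

```python
import random
from collections import defaultdict


def build_ngram_index(texts, n=2):
    """Map each n-word prefix to the list of words observed right after it."""
    index = defaultdict(list)
    for text in texts:
        words = text.split()
        for i in range(len(words) - n):
            prefix = tuple(words[i:i + n])
            index[prefix].append(words[i + n])
    return index


def generate(index, max_words=30, rng=random):
    """Random walk over a non-empty index.

    Starts at a random prefix and stops at max_words or at a prefix
    that was never followed by anything in the corpus.
    """
    prefix = rng.choice(sorted(index))
    output = list(prefix)
    while len(output) < max_words:
        followers = index.get(prefix)
        if not followers:
            break
        output.append(rng.choice(followers))
        # Slide the window: the new prefix is the last n words emitted.
        prefix = tuple(output[-len(prefix):])
    return ' '.join(output)
```

Passing `random.Random(seed)` as `rng` makes a run repeatable. With a corpus as small as 200 tweets, many prefixes have a single follower, which is one reason output like the sample above tends to splice together whole phrases from different tweets.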
neilkod / rman_stats.sql
Created February 3, 2011 17:54
statistics on the most recent RMAN backups.
column time_taken_display format a12
column input_per_sec format a13
column output_per_sec format a13
column input_gb format 9999
column output_gb format 9999
column status format a10
set lines 300 pages 300
select start_time
, end_time
, input_bytes_per_sec_display input_per_sec
neilkod / nationaldebt.py
Created January 31, 2011 02:04
retrieves and prints the national debt and U.S. population
from BeautifulSoup import BeautifulSoup
import urllib2
page = urllib2.urlopen('http://www.treasurydirect.gov/NP/BPDLogin?application=np')
soup = BeautifulSoup(page)
debt = soup.find('table',{'class':'data1'}).findAll('td')[3].text
asof = soup.find('table',{'class':'data1'}).findAll('td')[0].text
population_url = 'http://www.census.gov/main/www/popclock.html'
population_page = urllib2.urlopen(population_url)
soup = BeautifulSoup(population_page)
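The two `find('table',{'class':'data1'}).findAll('td')` calls pick cells out of the first table whose class is `data1`: cell 0 holds the as-of date and cell 3 the debt. The sketch below does the same cell extraction with Python 3's standard `html.parser`, for anyone without BeautifulSoup 3. `TableCellParser` and `extract_cells` are hypothetical names, the test HTML is invented, and the Treasury page layout those indexes depend on may well have changed since 2011. The sketch does not handle a table nested inside `data1`.

```python
from html.parser import HTMLParser


class TableCellParser(HTMLParser):
    """Collect the raw text of every <td> inside the first table with the given class."""

    def __init__(self, table_class='data1'):
        super().__init__()
        self.table_class = table_class
        self.in_table = False  # inside the target table
        self.done = False      # target table already closed
        self.in_cell = False   # inside a <td> of the target table
        self.cells = []

    def handle_starttag(self, tag, attrs):
        if tag == 'table' and not self.done:
            classes = (dict(attrs).get('class') or '').split()
            if self.table_class in classes:
                self.in_table = True
        elif tag == 'td' and self.in_table:
            self.in_cell = True
            self.cells.append('')

    def handle_endtag(self, tag):
        if tag == 'td':
            self.in_cell = False
        elif tag == 'table' and self.in_table:
            self.in_table = False
            self.done = True

    def handle_data(self, data):
        if self.in_cell:
            self.cells[-1] += data


def extract_cells(html, table_class='data1'):
    """Return the whitespace-normalized text of each <td> in the target table."""
    parser = TableCellParser(table_class)
    parser.feed(html)
    parser.close()
    return [' '.join(cell.split()) for cell in parser.cells]
```

With the page HTML in hand, `cells = extract_cells(html)` followed by `asof, debt = cells[0], cells[3]` mirrors the two lookups in the gist.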
neilkod / word count in awk
Created January 24, 2011 03:03
simple word count using awk
Tamara-Kodners-MacBook-Pro:tmp tamara$ cat coins.txt
gold 1 1986 USA American Eagle
gold 1 1908 Austria-Hungary Franz Josef 100 Korona
silver 10 1981 USA ingot
gold 1 1984 Switzerland ingot
gold 1 1979 RSA Krugerrand
gold 0.5 1981 RSA Krugerrand
gold 0.1 1986 PRC Panda
silver 1 1986 USA Liberty dollar
gold 0.25 1986 USA Liberty 5-dollar piece
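The awk command itself is cut off in this preview. The classic awk word-count idiom, `{ for (i = 1; i <= NF; i++) count[$i]++ }` plus an `END` block that prints `count`, tallies every whitespace-separated field. A Python analogue of that idiom (not the gist's actual command) looks like this:

```python
from collections import Counter


def word_count(lines):
    """Count every whitespace-separated field across all lines.

    Python analogue of awk's { for (i = 1; i <= NF; i++) count[$i]++ }.
    """
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


def print_counts(counts):
    """Print 'word count' pairs, most frequent first."""
    for word, n in counts.most_common():
        print(word, n)
```

Running `print_counts(word_count(open('coins.txt')))` on the file above reports `gold 7`, `USA 4`, and `silver 2`, among others. Output here is sorted by frequency, whereas the order of awk's `for (w in count)` loop is unspecified.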
neilkod / gist:789727
Created January 21, 2011 14:23
results of running against 500 URLs before my IP was blocked from Delicious
>>> for x,y in srtd[0:30]:
... print x,y
...
design 139
art 105
fun 101
inspiration 100
blog 72
photography 69
neilkod / delicious hacking
Created January 21, 2011 14:15
what topics are associated with 'cool'?
results are on the last line.
I'm trying to determine which subjects on Delicious are 'cool'. I downloaded a sample of 1,700 URLs that were tagged at least once with 'cool'.
I ran the code against a small sample of 50 because I have to look up Delicious URL metadata (bookmarks, tags) for each URL and I'm rate-limited. Across those 50 URLs (hey, I'm just testing), 94 bookmarks were tagged 'cool'. Remember, URLs and bookmarks have a 1:M relationship, so one URL can contribute several bookmarks.
Looking at those 94 bookmarks, I'm counting the other tags that were used in conjunction with 'cool'.
Results follow. I'm now running it against 500 URLs as a test. If this works, I'll run it against as many URLs as I can as I progress. I still have a bit to learn about the deliciousapi module; I haven't looked through its source yet.
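The counting step described above can be sketched with `collections.Counter`, assuming each bookmark has already been fetched as a list of tag strings. How the deliciousapi module actually returns tags isn't shown here, so `co_occurring_tags` and its input shape are assumptions. Each bookmark counts a given tag at most once, and `most_common(30)` yields a top-30 list like the `srtd[0:30]` slice shown earlier.

```python
from collections import Counter


def co_occurring_tags(bookmarks, target='cool'):
    """Count the other tags that appear on bookmarks also tagged `target`.

    `bookmarks` is assumed to be an iterable of tag lists, one list per
    bookmark; a single URL may contribute several bookmarks (1:M).
    """
    counts = Counter()
    for tags in bookmarks:
        tag_set = set(tags)  # count each tag at most once per bookmark
        if target in tag_set:
            counts.update(tag_set - {target})
    return counts
```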
neilkod / last_backup.sql
Created January 19, 2011 18:51
completion time and SCN for the last successful full backup
select max(end_time) completion_time
, timestamp_to_scn(max(end_time)) scn
from v$rman_backup_job_details
where input_type = 'DB FULL'
and status='COMPLETED';
neilkod / emr configuration
Created January 19, 2011 13:53
from the Yelp mrjob default configuration file
aws_access_key_id: HADOOPHADOOPBOBADOOP
aws_region: us-west-1
aws_secret_access_key: MEMIMOMADOOPBANANAFANAFOFADOOPHADOOP