neilkod’s gists

neilkod / conconrdance_output.txt

Created October 21, 2010 13:29

output of concordance.py

	hadoop4:nltk-zoolander nkodner$ ./concordance.py
	concordance for mugatu.....
	Building index...
	Displaying 25 of 25 matches:
	ar The Malaysian must be eliminated mugatu What No I dont have time for this P
	t them clawing their faces for more mugatu sucks Support the prime minister Mu
	tu sucks Support the prime minister mugatu uses slave labor Down with Mugatu Y
	r Mugatu uses slave labor Down with mugatu You hate to see something like that
	ul people Theres no denying Jacobim mugatu has used cheap Malaysian workers to
	ts newsstands tomorrow Excuse me Mr mugatu Mr Mugatu Matilda Jeffries Time mag

neilkod / mongodb regexp help

Created November 12, 2010 15:59

I'm having trouble getting MongoDB to recognize my regular expression. I'm trying to list documents (tweets) where user.screen_name starts with an n.

	output

	121980
	--------- tweets from neilkod -------
	neilkod
	neilkod
	neilkod
	neilkod
	neilkod
	---------tweets from users that start with n--------
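An empty result like the one above is what you would get if the pattern reached the query as a quoted string instead of a regex object. The script itself isn't shown, so that cause is an assumption. Here is a minimal sketch of the difference. The sample documents and the `matches` helper are hypothetical stand-ins so it runs without a MongoDB server; with pymongo, a compiled `re` pattern can be used as a query value in the same way.

```python
import re

# Hypothetical sample documents shaped like the tweets described above.
tweets = [
    {"user": {"screen_name": "neilkod"}},
    {"user": {"screen_name": "Nancy"}},
    {"user": {"screen_name": "akeem"}},
]

# Anchored, case-insensitive pattern: screen names that start with "n".
starts_with_n = re.compile(r"^n", re.IGNORECASE)

# Query value is a compiled pattern, so it is treated as a regex.
regex_filter = {"user.screen_name": starts_with_n}

# Query value is a quoted string, so it is compared for plain equality.
string_filter = {"user.screen_name": "/^n/"}


def matches(doc, query):
    """In-memory stand-in for a one-field match on a dotted path (illustration only)."""
    path, wanted = next(iter(query.items()))
    value = doc
    for key in path.split("."):
        value = value[key]
    if hasattr(wanted, "search"):
        # compiled regex: match anywhere the pattern allows
        return wanted.search(value) is not None
    # anything else: exact equality
    return value == wanted


regex_hits = [t["user"]["screen_name"] for t in tweets if matches(t, regex_filter)]
string_hits = [t["user"]["screen_name"] for t in tweets if matches(t, string_filter)]
```

Here `regex_hits` contains the two names starting with n, while `string_hits` stays empty because no screen name is literally equal to `/^n/`.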

neilkod / gist:674324

Created November 12, 2010 16:40

	response to http://twitter.com/#!/akeem/status/3123965304250368
	> db.tweets.findOne({'user.screen_name' : /TrishCarey/});
	{
	"_id" : ObjectId("4cdd49080e37707a39caa92c"),
	"retweeted_status" : {
	"geo" : null,
	"retweet_count" : null,
	"in_reply_to_status_id" : null,
	"text" : "Last chance to get #RavenHunt poker coin at #PubCon today: I'll be outside Salon C at 4:10. Clue No. 1 to follow again on Twitter.",
	"entities" : {

neilkod / gist:674333

Created November 12, 2010 16:47

	> db.tweets.findOne({'user.screen_name' : '/neilkod/'});
	null
	> db.tweets.findOne({'user.screen_name' : 'neilkod'});
	{
	"_id" : ObjectId("4cdd490a0e37707a39cabac1"),
	"contributors" : null,
	"in_reply_to_screen_name" : "MerryMorud",
	"text" : "@MerryMorud you're climbing the leaderboard of #pubcon tweeters http://www.pubcontweets.com - can you dethrone @steveplunkett ?",
	"entities" : {
	"urls" : [

neilkod / get_stopwords.py

Created November 30, 2010 21:09

given a file of stopwords, one word on each line, return a list containing all of the words.

	def get_stopwords(file='stopwords.txt'):
	    """ given a file, default stopwords.txt, returns a list containing all of the words
	    in the file """
	    stopwords = []
	    # the with-block closes the file once every line has been read
	    with open(file, 'r') as words:
	        for word in words:
	            stopwords.append(word.strip())
	    return stopwords
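A quick usage sketch: load a stopword file and filter a token list with it. The `load_stopwords` helper, the temp file, and the sample sentence are all made up for illustration. Unlike the function above, this variant also skips blank lines.

```python
import os
import tempfile


def load_stopwords(path):
    """Read one stopword per line, skipping blank lines (hypothetical helper)."""
    with open(path) as handle:
        return [line.strip() for line in handle if line.strip()]


# Write a small throwaway stopword file so the example runs on its own.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("the\na\n\nfor\n")
    stopwords_path = tmp.name

# A set makes the membership test below cheap.
stopwords = set(load_stopwords(stopwords_path))
os.remove(stopwords_path)

tokens = "Down with Mugatu for the cheap labor".lower().split()
kept = [token for token in tokens if token not in stopwords]
```

Converting the list to a set is optional, but it avoids scanning the whole list for every token.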

neilkod / google_social_api_twitter_connections.py

Created December 13, 2010 01:43

uses google social api to build lists of twitter friends/followers

	import urllib2
	import simplejson as json

	def google_social_api_friends(screen_name):
	    """ given a twitter screen name, returns a list of all of their twitter
	    friends, using the google social API"""

	    url = "http://socialgraph.apis.google.com/lookup?q=http://www.twitter.com/%s&edo=1&callback=" % (screen_name)
	    # edo=1 means show outbound links; edi=1 means show inbound links
	    fetched = urllib2.urlopen(url).read()

neilkod / gist:743672

Created December 16, 2010 17:13

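	import re

	def get_retweets(hashtag):
	    # create_connection() is defined elsewhere (not shown in this gist)
	    db = create_connection()
	    tweets = db.conftweets
	    # match the hashtag regardless of case
	    regexp = re.compile(hashtag, re.IGNORECASE)
	    # count retweets per original tweet id, skipping tweets that are not retweets
	    grpd = tweets.group(key={'retweeted_status.id': True},
	                        condition={'entities.hashtags.text': regexp,
	                                   'retweeted_status.id': {'$ne': None}},
	                        initial={'count': 0},
	                        reduce='function(doc, prev) {prev.count += 1}')
	    return grpd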

neilkod / grouped, top-n query in MongoDB.py

Created December 16, 2010 17:15

Returning the ids of the top-10 most retweeted tweets

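	import re

	def get_retweets(hashtag):
	    # create_connection() is defined elsewhere (not shown in this gist)
	    db = create_connection()
	    tweets = db.conftweets
	    # match the hashtag regardless of case
	    regexp = re.compile(hashtag, re.IGNORECASE)
	    # count retweets per original tweet id, skipping tweets that are not retweets
	    grpd = tweets.group(key={'retweeted_status.id': True},
	                        condition={'entities.hashtags.text': regexp,
	                                   'retweeted_status.id': {'$ne': None}},
	                        initial={'count': 0},
	                        reduce='function(doc, prev) {prev.count += 1}')
	    return grpd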
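The function above returns one row per retweeted tweet with its count. The description mentions the top 10, but the ranking step isn't shown. Here is a minimal sketch of that step, assuming the grouped result is a list of dicts with a `'retweeted_status.id'` key and a `'count'` field. The `top_retweeted` name and the sample rows are hypothetical.

```python
import heapq


def top_retweeted(grouped, n=10):
    """Return the n rows with the highest 'count', largest first."""
    return heapq.nlargest(n, grouped, key=lambda row: row["count"])


# Hypothetical rows in the shape assumed above.
sample = [
    {"retweeted_status.id": 101, "count": 3.0},
    {"retweeted_status.id": 102, "count": 7.0},
    {"retweeted_status.id": 103, "count": 1.0},
]

top_ids = [row["retweeted_status.id"] for row in top_retweeted(sample, n=2)]
```

`heapq.nlargest` avoids sorting the whole list when only a handful of rows are wanted. `sorted(grouped, key=..., reverse=True)[:n]` would work just as well for small results.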

neilkod / ec2_region_from_ip.py

Created December 31, 2010 13:29

given an IP address, return the EC2 region it originates from

	# requires IPy https://github.com/haypo/python-ipy

	from IPy import IP

	# putting EC2_REGIONS here so the map only has to be performed once.
	# I'm not sure if this is a great idea. Comments?
	# ideally this list will be scraped from the following source
	# https://forums.aws.amazon.com/ann.jspa?annID=857
	# thanks to serverfault.com user Flashman for pointing me to this doc.
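The rest of the script isn't shown, so here is a rough sketch of the lookup idea. It uses the standard-library `ipaddress` module rather than IPy. The region names and CIDR blocks are placeholders taken from the documentation-only ranges; they are NOT real EC2 ranges. `EXAMPLE_REGIONS`, `PARSED_NETWORKS` and `region_for_ip` are hypothetical names.

```python
import ipaddress

# Placeholder region -> CIDR table built from documentation-only ranges
# (RFC 5737). These are NOT real EC2 ranges.
EXAMPLE_REGIONS = {
    "example-region-a": ["192.0.2.0/24"],
    "example-region-b": ["198.51.100.0/24", "203.0.113.0/25"],
}

# Parse the CIDR strings once at import time so each lookup only does
# membership tests.
PARSED_NETWORKS = [
    (region, ipaddress.ip_network(cidr))
    for region, cidrs in EXAMPLE_REGIONS.items()
    for cidr in cidrs
]


def region_for_ip(address):
    """Return the first region whose network contains address, else None."""
    ip = ipaddress.ip_address(address)
    for region, network in PARSED_NETWORKS:
        if ip in network:
            return region
    return None
```

Building the parsed table at module level, as the comment above suggests, means the parsing cost is paid once per import instead of once per lookup. For a few dozen ranges a linear scan like this is likely fine.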

neil kodner neilkod