neil kodner neilkod

@neilkod
neilkod / conconrdance_output.txt
Created October 21, 2010 13:29
output of concordance.py
hadoop4:nltk-zoolander nkodner$ ./concordance.py
concordance for mugatu.....
Building index...
Displaying 25 of 25 matches:
ar The Malaysian must be eliminated mugatu What No I dont have time for this P
t them clawing their faces for more mugatu sucks Support the prime minister Mu
tu sucks Support the prime minister mugatu uses slave labor Down with Mugatu Y
r Mugatu uses slave labor Down with mugatu You hate to see something like that
ul people Theres no denying Jacobim mugatu has used cheap Malaysian workers to
ts newsstands tomorrow Excuse me Mr mugatu Mr Mugatu Matilda Jeffries Time mag
@neilkod
neilkod / mongodb regexp help
Created November 12, 2010 15:59
I'm having trouble getting MongoDB to recognize my regular expression. I'm trying to list documents (tweets) where user.screen_name starts with an n.
output
121980
--------- tweets from neilkod -------
neilkod
neilkod
neilkod
neilkod
neilkod
---------tweets from users that start with n--------
response to http://twitter.com/#!/akeem/status/3123965304250368
> db.tweets.findOne({'user.screen_name' : /TrishCarey/});
{
"_id" : ObjectId("4cdd49080e37707a39caa92c"),
"retweeted_status" : {
"geo" : null,
"retweet_count" : null,
"in_reply_to_status_id" : null,
"text" : "Last chance to get #RavenHunt poker coin at #PubCon today: I'll be outside Salon C at 4:10. Clue No. 1 to follow again on Twitter.",
"entities" : {
> db.tweets.findOne({'user.screen_name' : '/neilkod/'});
null
> db.tweets.findOne({'user.screen_name' : 'neilkod'});
{
"_id" : ObjectId("4cdd490a0e37707a39cabac1"),
"contributors" : null,
"in_reply_to_screen_name" : "MerryMorud",
"text" : "@MerryMorud you're climbing the leaderboard of #pubcon tweeters http://www.pubcontweets.com - can you dethrone @steveplunkett ?",
"entities" : {
"urls" : [
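In the shell session above, the unquoted /TrishCarey/ is a regular expression, while the quoted '/neilkod/' is an ordinary string that is compared literally, slashes included, which would explain the null result. A minimal Python sketch of a "starts with" filter follows. It assumes a pymongo-style driver that accepts compiled `re` patterns as query values, and `screen_name_prefix_query` is a hypothetical helper name, not part of the original gist.

```python
import re

def screen_name_prefix_query(prefix):
    """Build a filter matching screen names that start with prefix."""
    # '^' anchors the match at the start of the field;
    # re.escape keeps characters in prefix from being read as regex syntax
    pattern = re.compile('^' + re.escape(prefix), re.IGNORECASE)
    return {'user.screen_name': pattern}

query = screen_name_prefix_query('n')
# with a live connection, this dict would be passed to db.tweets.find(query)
print(query['user.screen_name'].pattern)
```

Without the `^` anchor the pattern would match an n anywhere in the screen name, not only at the start.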
@neilkod
neilkod / get_stopwords.py
Created November 30, 2010 21:09
given a file of stopwords, one word on each line, return a list containing all of the words.
def get_stopwords(file='stopwords.txt'):
    """ given a file, default stopwords.txt, returns a list containing all of the words
    in the file """
    # the with block closes the file even if reading fails
    with open(file, 'r') as words:
        # one stopword per line; strip the trailing newline and whitespace
        return [word.strip() for word in words]
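One way the returned list might be used is to filter tokens before counting or indexing them. This is only a sketch: `remove_stopwords` is a hypothetical helper, and the lowercase comparison is an assumption about how the stopword file is written.

```python
def remove_stopwords(tokens, stopwords):
    """Return the tokens that are not in the stopword list."""
    # a set makes each membership test constant time
    stopset = set(word.lower() for word in stopwords)
    return [token for token in tokens if token.lower() not in stopset]

# hypothetical tokens and stopwords, for illustration only
print(remove_stopwords(['Down', 'with', 'Mugatu'], ['with', 'the']))
```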
@neilkod
neilkod / google_social_api_twitter_connections.py
Created December 13, 2010 01:43
uses google social api to build lists of twitter friends/followers
import urllib2
import simplejson as json

def google_social_api_friends(screen_name):
    """ given a twitter screen name, returns a list of all of their twitter
    friends, using the google social API"""
    url = "http://socialgraph.apis.google.com/lookup?q=http://www.twitter.com/%s&edo=1&callback=" % (screen_name)
    # parameters edo = 1 means show outbound links. edi=1 means show inbound
    fetched = urllib2.urlopen(url).read()
def get_retweets(hashtag):
    db = create_connection()
    tweets = db.conftweets
    regexp = re.compile(hashtag, re.IGNORECASE)
    grpd = tweets.group(key = {'retweeted_status.id': True},
                        condition = {'entities.hashtags.text': regexp,
                                     'retweeted_status.id': {'$ne': None}},
                        initial = {'count': 0},
                        reduce = 'function(doc, prev) {prev.count += 1}')
    return grpd
@neilkod
neilkod / grouped, top-n query in MongoDB.py
Created December 16, 2010 17:15
Returning the ids of the top-10 most retweeted tweets
import re

def get_retweets(hashtag):
    # create_connection() is not shown here; presumably defined elsewhere
    db = create_connection()
    tweets = db.conftweets
    # match the hashtag regardless of case
    regexp = re.compile(hashtag, re.IGNORECASE)
    # count the retweets that share each original tweet id
    grpd = tweets.group(key = {'retweeted_status.id': True},
                        condition = {'entities.hashtags.text': regexp,
                                     'retweeted_status.id': {'$ne': None}},
                        initial = {'count': 0},
                        reduce = 'function(doc, prev) {prev.count += 1}')
    return grpd
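The function above returns one count per retweeted tweet id; getting the top 10 still takes a sort and a cut. A minimal sketch of that last step follows. It assumes each grouped row is a dict with a 'retweeted_status.id' key and a 'count' key, which is a guess at the driver's output shape. `top_retweeted` and the sample rows are hypothetical.

```python
import heapq

def top_retweeted(grouped, n=10):
    """Return the ids of the n rows with the highest count, highest first."""
    top = heapq.nlargest(n, grouped, key=lambda row: row['count'])
    return [row['retweeted_status.id'] for row in top]

# hypothetical rows in the shape group() is assumed to return
sample = [
    {'retweeted_status.id': 101, 'count': 3.0},
    {'retweeted_status.id': 102, 'count': 7.0},
    {'retweeted_status.id': 103, 'count': 1.0},
]
print(top_retweeted(sample, n=2))
```

`heapq.nlargest` avoids sorting the whole result when only a few rows are wanted; `sorted(..., reverse=True)[:n]` would do the same job.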
@neilkod
neilkod / ec2_region_from_ip.py
Created December 31, 2010 13:29
given an IP address, return the EC2 region it originates from
# requires IPy https://github.com/haypo/python-ipy
from IPy import IP
# putting EC2_REGIONS here so the map only has to be performed once.
# I'm not sure if this is a great idea. Comments?
# ideally this list will be scraped from the following source
# https://forums.aws.amazon.com/ann.jspa?annID=857
# thanks to serverfault.com user Flashman for pointing me to this doc.
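The lookup described in the comments above could be sketched as below. This version swaps in the standard-library `ipaddress` module instead of IPy, and the CIDR ranges are placeholders taken from the documentation-only blocks in RFC 5737, not real EC2 ranges. The shape of `EC2_REGIONS` and the function name are assumptions, since the original code is not shown.

```python
import ipaddress

# PLACEHOLDER data: documentation-only ranges (RFC 5737), not real EC2 ranges
EC2_REGIONS = {
    'us-east-1': ['192.0.2.0/24'],
    'eu-west-1': ['198.51.100.0/24'],
}

# parse the CIDR strings once, at import time, so lookups do no parsing
_NETWORKS = [
    (ipaddress.ip_network(cidr), region)
    for region, cidrs in EC2_REGIONS.items()
    for cidr in cidrs
]

def ec2_region_from_ip(ip):
    """Return the region whose range contains ip, or None if none does."""
    address = ipaddress.ip_address(ip)
    for network, region in _NETWORKS:
        if address in network:
            return region
    return None

print(ec2_region_from_ip('192.0.2.17'))
```

Building the parsed list at module level is one answer to the question in the comments: the mapping is computed once per import, and each call only does containment checks.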