This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
KEYWORD
------------------------------
!=
;
<=
=>
>=
ABS
ACOS
ACTIVE_COMPONENT
# requires IPy https://github.com/haypo/python-ipy
from IPy import IP
# putting EC2_REGIONS here so the map only has to be performed once.
# I'm not sure if this is a great idea. Comments?
# ideally this list will be scraped from the following source
# https://forums.aws.amazon.com/ann.jspa?annID=857
# thanks to serverfault.com user Flashman for pointing me to this doc.
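
The comments above describe building a region map once at import time and reusing it for every lookup. A minimal sketch of that idea follows. It uses the standard-library `ipaddress` module instead of IPy, and the region names and CIDR blocks are placeholders (reserved documentation ranges), not real EC2 data. `region_for` is a hypothetical helper name.

```python
import ipaddress

# Placeholder data: documentation-only CIDR blocks, NOT real EC2 ranges.
EC2_REGIONS = {
    'example-region-a': ['192.0.2.0/24'],
    'example-region-b': ['198.51.100.0/24'],
}

# Parse every CIDR block once at import time so lookups only do membership tests.
_NETWORKS = [
    (ipaddress.ip_network(cidr), region)
    for region, cidrs in EC2_REGIONS.items()
    for cidr in cidrs
]


def region_for(address):
    """Return the region whose network contains `address`, or None."""
    ip = ipaddress.ip_address(address)
    for network, region in _NETWORKS:
        if ip in network:
            return region
    return None


lookup_hit = region_for('192.0.2.17')
lookup_miss = region_for('203.0.113.5')
```

Parsing the networks at module level is the trade-off the comments ask about: import gets slightly slower, but each lookup avoids re-parsing the list.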
import re

def get_retweets(hashtag):
    # create_connection() is defined elsewhere in the project
    db = create_connection()
    tweets = db.conftweets
    # match the hashtag case-insensitively
    regexp = re.compile(hashtag, re.IGNORECASE)
    # count retweets per original status id, skipping tweets that are not retweets
    grpd = tweets.group(key={'retweeted_status.id': True},
                        condition={'entities.hashtags.text': regexp,
                                   'retweeted_status.id': {'$ne': None}},
                        initial={'count': 0},
                        reduce='function(doc, prev) {prev.count += 1}')
    return grpd
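
To make the grouping logic easier to follow without a database, here is a rough pure-Python equivalent of what the group call appears to compute: one counter per original status id, restricted to retweets carrying the hashtag. The `count_retweets` name and the sample documents are made up for illustration. The real tweets have many more fields.

```python
import re
from collections import Counter


def count_retweets(docs, hashtag):
    """Count retweets per original status id among docs tagged with `hashtag`."""
    pattern = re.compile(hashtag, re.IGNORECASE)
    counts = Counter()
    for doc in docs:
        original = doc.get('retweeted_status') or {}
        original_id = original.get('id')
        if original_id is None:
            continue  # not a retweet, same as the {'$ne': None} condition
        tags = doc.get('entities', {}).get('hashtags', [])
        # a document qualifies when any of its hashtags matches the pattern
        if any(pattern.search(tag.get('text', '')) for tag in tags):
            counts[original_id] += 1
    return counts


# Hypothetical sample documents, trimmed to the fields the query touches.
sample = [
    {'retweeted_status': {'id': 1}, 'entities': {'hashtags': [{'text': 'PubCon'}]}},
    {'retweeted_status': {'id': 1}, 'entities': {'hashtags': [{'text': 'pubcon'}]}},
    {'retweeted_status': {'id': 2}, 'entities': {'hashtags': [{'text': 'other'}]}},
    {'entities': {'hashtags': [{'text': 'pubcon'}]}},
]
retweet_counts = count_retweets(sample, 'pubcon')
```

This is only practical for small in-memory samples. Running the grouping on the server avoids pulling every tweet over the wire.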
def get_stopwords(file='stopwords.txt'):
    """Given a file (default stopwords.txt), return a list containing all of the
    words in the file."""
    stopwords = []
    # the with block closes the file once it has been read
    with open(file, 'r') as words:
        for word in words:
            stopwords.append(word.strip())
    return stopwords
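
A short usage sketch, assuming the stopword list is later used to filter tokens. `load_stopwords` restates the loader above so the example runs on its own, `remove_stopwords` is a hypothetical helper, and the three-word stopword file is made-up sample data.

```python
import os
import tempfile


def load_stopwords(path):
    """Read one stopword per line, skipping blank lines."""
    with open(path, 'r') as handle:
        return [line.strip() for line in handle if line.strip()]


def remove_stopwords(tokens, stopwords):
    """Drop tokens that appear in the stopword list (case-insensitive)."""
    blocked = set(word.lower() for word in stopwords)
    return [token for token in tokens if token.lower() not in blocked]


# Write a tiny throwaway stopword file so the example does not depend on stopwords.txt.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('the\nof\na\n')
    tmp_path = tmp.name

stopwords = load_stopwords(tmp_path)
os.remove(tmp_path)

filtered = remove_stopwords('The leaderboard of pubcon tweeters'.split(), stopwords)
```

Converting the list to a set before filtering keeps each membership check constant-time, which matters once the token stream gets large.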
> db.tweets.findOne({'user.screen_name' : '/neilkod/'});
null
> db.tweets.findOne({'user.screen_name' : 'neilkod'});
{
    "_id" : ObjectId("4cdd490a0e37707a39cabac1"),
    "contributors" : null,
    "in_reply_to_screen_name" : "MerryMorud",
    "text" : "@MerryMorud you're climbing the leaderboard of #pubcon tweeters http://www.pubcontweets.com - can you dethrone @steveplunkett ?",
    "entities" : {
        "urls" : [
response to http://twitter.com/#!/akeem/status/3123965304250368
> db.tweets.findOne({'user.screen_name' : /TrishCarey/});
{
    "_id" : ObjectId("4cdd49080e37707a39caa92c"),
    "retweeted_status" : {
        "geo" : null,
        "retweet_count" : null,
        "in_reply_to_status_id" : null,
        "text" : "Last chance to get #RavenHunt poker coin at #PubCon today: I'll be outside Salon C at 4:10. Clue No. 1 to follow again on Twitter.",
        "entities" : {
output
121980
--------- tweets from neilkod -------
neilkod
neilkod
neilkod
neilkod
neilkod
---------tweets from users that start with n--------
hadoop4:nltk-zoolander nkodner$ ./concordance.py
concordance for mugatu.....
Building index...
Displaying 25 of 25 matches:
ar The Malaysian must be eliminated mugatu What No I dont have time for this P
t them clawing their faces for more mugatu sucks Support the prime minister Mu
tu sucks Support the prime minister mugatu uses slave labor Down with Mugatu Y
r Mugatu uses slave labor Down with mugatu You hate to see something like that
ul people Theres no denying Jacobim mugatu has used cheap Malaysian workers to
ts newsstands tomorrow Excuse me Mr mugatu Mr Mugatu Matilda Jeffries Time mag
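
The concordance.py script itself is not shown, so the following is only a sketch of what a keyword-in-context lookup does, written with the standard library alone. The `concordance` function here is a hypothetical stand-in and works on whole tokens rather than a fixed character width. The sample sentence is taken from the output above.

```python
def concordance(tokens, word, width=3):
    """Return (left context, match, right context) for each occurrence of `word`."""
    target = word.lower()
    lines = []
    for i, token in enumerate(tokens):
        if token.lower() == target:
            left = ' '.join(tokens[max(0, i - width):i])
            right = ' '.join(tokens[i + 1:i + 1 + width])
            lines.append((left, token, right))
    return lines


tokens = 'Down with Mugatu You hate to see something like that'.split()
matches = concordance(tokens, 'mugatu', width=2)

for left, match, right in matches:
    print(left, match, right)
```

Matching is case-insensitive, which is consistent with the output above listing both "mugatu" and "Mugatu" contexts under a single search term.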