
@terrycojones
Created December 3, 2011 15:38
from collections import defaultdict
from operator import itemgetter


def collectURLs(tweets):
    """
    Extract all mentioned URLs from a set of tweets.

    @param tweets: A C{dict} of tweets keyed by tweet id, as returned by
        L{getHistoricalTweets}.
    @return: A C{dict} mapping each mentioned URL to a list of the URLs of
        the tweets that mention it, sorted by tweet id, most recent first.
    """
    URLs = defaultdict(list)
    # .items() works on both Python 2 and Python 3 dicts.
    for id, tweet in tweets.items():
        for mentionedURL in tweet.entities['urls']:
            # Skip entities that have no expanded URL.
            expandedURL = mentionedURL.get('expanded_url')
            if expandedURL:
                URLs[expandedURL].append(
                    (id, 'http://twitter.com/#!/%s/status/%d' % (
                        tweet.user.screen_name, id)))
    # Sort the list of tweets for each URL by tweet id, highest first (i.e.,
    # in reverse order of tweeting).
    sortedURLs = {}
    idGetter = itemgetter(0)
    URLGetter = itemgetter(1)
    for URL, mentions in URLs.items():
        # Keep only the tweet URL from each (id, tweet URL) pair.
        sortedURLs[URL] = [
            URLGetter(mention)
            for mention in sorted(mentions, key=idGetter, reverse=True)]
    return sortedURLs
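
The gist does not show what the tweet objects look like or how the function is called. The following is a minimal Python 3 sketch of the same grouping-and-sorting idea, run on hand-made sample data. The `User` and `Tweet` namedtuples, the snake_case name `collect_urls`, and the sample tweets are all assumptions made for illustration; the real objects come from `getHistoricalTweets`, which is not part of this gist.

```python
from collections import defaultdict, namedtuple
from operator import itemgetter

# Hypothetical stand-ins for the tweet objects (assumption: the real ones
# expose .user.screen_name and an .entities dict with a 'urls' list).
User = namedtuple('User', 'screen_name')
Tweet = namedtuple('Tweet', 'user entities')


def collect_urls(tweets):
    """Python 3 sketch of collectURLs: map each expanded URL to the status
    URLs of the tweets that mention it, highest tweet id first."""
    mentions_by_url = defaultdict(list)
    for tweet_id, tweet in tweets.items():
        for mentioned in tweet.entities['urls']:
            expanded = mentioned.get('expanded_url')
            if expanded:
                mentions_by_url[expanded].append(
                    (tweet_id, 'http://twitter.com/#!/%s/status/%d' % (
                        tweet.user.screen_name, tweet_id)))
    # Sort each URL's mentions by tweet id (descending) and keep only the
    # status URL from each (id, status URL) pair.
    return {
        url: [status for _, status in
              sorted(mentions, key=itemgetter(0), reverse=True)]
        for url, mentions in mentions_by_url.items()
    }


# Made-up sample data: two tweets mention the same URL, and one entity
# has no expanded URL and is therefore skipped.
tweets = {
    10: Tweet(User('alice'),
              {'urls': [{'expanded_url': 'http://example.com/a'}]}),
    20: Tweet(User('bob'),
              {'urls': [{'expanded_url': 'http://example.com/a'},
                        {'expanded_url': None}]}),
}
result = collect_urls(tweets)
```

With this sample input, `result` has a single key, `http://example.com/a`, whose list starts with bob's tweet (id 20) followed by alice's (id 10).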