@ashaw
Created December 4, 2010 17:45
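```ruby
# Ruby 1.8: treat source and strings as UTF-8 (ignored, with a warning, on 1.9+).
$KCODE = "UTF8"

require 'rubygems'
require 'rest_client'
require 'readability'
require 'sanitize'
require 'twitter-text'
require 'crack/json'
require 'curb'

# Twitter list to read links from, as "owner/slug".
LIST = "propublica/staff"

# Estimated reading time in whole minutes at 250 words per minute.
# Integer division truncates, so anything under 250 words reports 0.
def read_time(words)
  words / 250
end

# Fetch a page, extract the main article body with Readability, strip the
# remaining HTML tags, and print the word count and reading time.
def parse_article(url)
  html = Readability::Document.new(RestClient.get(url)).content
  clean_text = Sanitize.clean(html)
  # Split on any run of whitespace (spaces, tabs, newlines), not single spaces.
  wc = clean_text.split.size
  puts "#{wc} words, read time #{read_time(wc)} minutes"
end

# Collect the unique links posted to the list's timeline and resolve
# shortened URLs to their final destinations.
def get_twitter_links
  owner, slug = LIST.split("/")
  json = RestClient.get("http://api.twitter.com/1/#{owner}/lists/#{slug}/statuses.json")
  tweets = Crack::JSON.parse(json)

  urls = tweets.map { |tweet| Twitter::Extractor.extract_urls(tweet['text']) }
  urls = urls.flatten.uniq

  # Use map (not each) so the resolved URLs are what gets returned.
  urls.map { |url| follow_redirects(url) }.uniq
end

# Follow every HTTP redirect and return the final URL.
def follow_redirects(url)
  c = Curl::Easy.new(url)
  c.follow_location = true
  c.max_redirects = nil # nil: no limit on the number of redirects
  c.perform
  c.last_effective_url
end

# Print stats for every article linked from the list.
get_twitter_links.each do |url|
  parse_article(url)
end
```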
```
ashaw@Al-Shaws-MacBook-Pro timesopen $ ruby reader.rb
54 words, read time 0 minutes
60 words, read time 0 minutes
19 words, read time 0 minutes
75 words, read time 0 minutes
297 words, read time 1 minutes
40 words, read time 0 minutes
41 words, read time 0 minutes
105 words, read time 0 minutes
105 words, read time 0 minutes
3800 words, read time 15 minutes
1986 words, read time 7 minutes
183 words, read time 0 minutes
413 words, read time 1 minutes
780 words, read time 3 minutes
850 words, read time 3 minutes
62 words, read time 0 minutes
```
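In the sample run above, every article under 250 words is reported as 0 minutes, because `read_time` uses integer division and truncates. Below is a minimal sketch of an alternative that rounds up instead, assuming the input is the plain text left after Sanitize has stripped the tags. `reading_stats` is a hypothetical helper, not part of the original script; it keeps the script's 250 words-per-minute figure as its default.

```ruby
# Hypothetical helper (not in the original script): word count plus a
# reading-time estimate that rounds up instead of truncating.
def reading_stats(text, words_per_minute = 250)
  # String#split with no argument splits on runs of whitespace and
  # ignores leading and trailing whitespace.
  words = text.split.size
  # Integer ceiling division: any partial minute counts as a full minute.
  minutes = (words + words_per_minute - 1) / words_per_minute
  [words, minutes]
end

words, minutes = reading_stats("Short   post,\nsplit across lines.")
puts "#{words} words, read time #{minutes} minutes"
# prints "5 words, read time 1 minutes"
```

Splitting with no argument treats any run of spaces, tabs, or newlines as one separator. By contrast, `split(/ /)` on the same sample string yields 6 pieces: two empty strings from the repeated spaces, plus `"post,\nsplit"` counted as a single word.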