@fauxparse
Created July 7, 2009 03:30
#!/usr/bin/env ruby

words = {}
stop_words = File.open("stop.txt") { |file| file.read.split }

def valid?(word)
  case word
  when /^\@/ then false # reject @usernames
  when /^\#/ then true # keep #hashtags (separate from the word 'hashtags')
  when /\.+/ then false # reject things that look like URLs
  when "http" then false # in case this isn't in your stop-words list
  else true
  end
end

$stdin.each do |line|
  # words:
  # * are at least two letters long
  # * contain no spaces
  # * end with a letter or an apostrophe (like "chillin’")
  # * respect smart quotes
  # * treat "gandalf's" and "gandalf" as the same word
  # * swallow '@' and '#'
  line.downcase.scan(/((\b|\@|\#)[a-z][a-z\-\.\'\’]*[a-z\'\’]\b)/) do |word, sep|
    word.strip!
    word.sub!(/['’]s$/, '')
    if valid?(word) && !stop_words.include?(word)
      words[word] = (words[word] || 0) + 1
    end
  end
end

words.reject { |word, count| count < 2 }.sort { |a, b| b.last <=> a.last }.each do |word, count|
  puts "%5d %-20s" % [ count, word ]
end
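The tokenising and counting steps above can be tried without a Twitter account or a stop.txt file. The following is only a minimal sketch: `WORD_PATTERN`, `extract_words` and `count_words` are hypothetical names that do not appear in the script, the pattern is the script's own with redundant backslashes dropped, and the sample lines and stop-word list are made up for illustration.

```ruby
# Same pattern as the script, minus escapes that are not needed.
WORD_PATTERN = /((\b|@|\#)[a-z][a-z\-.'’]*[a-z'’]\b)/

# Copied from the script: decides whether a token is counted.
def valid?(word)
  case word
  when /^@/ then false   # reject @usernames
  when /^\#/ then true   # keep #hashtags
  when /\.+/ then false  # reject things that look like URLs
  when "http" then false
  else true
  end
end

# Hypothetical helper: lowercase a line, pull out tokens, strip a trailing 's.
def extract_words(line)
  line.downcase.scan(WORD_PATTERN).map { |word, _sep| word.sub(/['’]s$/, "") }
end

# Hypothetical helper: count valid, non-stop words across lines.
def count_words(lines, stop_words)
  counts = Hash.new(0)
  lines.each do |line|
    extract_words(line).each do |word|
      counts[word] += 1 if valid?(word) && !stop_words.include?(word)
    end
  end
  counts
end

sample = ["Gandalf's staff: #lotr rocks, says @frodo", "gandalf rocks #lotr"]
p count_words(sample, ["says"])
# => {"gandalf"=>2, "staff"=>1, "#lotr"=>2, "rocks"=>2}
```

Note that the pattern itself still matches "@frodo"; it is `valid?` that discards it afterwards.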
$ ./grab.rb -u fauxparse -p not_my_password | ./analyse.rb
21 just
17 i'm
10 know
9 going
8 good
8 oh
8 i've
8 day
8 like
7 ok
7 man
7 want
7 new
7 play
7 think
7 people
6 check
6 don't
6 ui
6 thought
5 look
5 design
5 time
5 sure
5 really
4 they're
4 say
4 code
4 cast
4 right
4 use
4 got
4 thing
4 i'd
4 work
4 actors
4 post
4 home
4 didn't
3 face
3 using
3 having
3 bought
3 question
3 problem
3 world
3 developers
3 need
3 make
3 decisions
3 itunes
3 answer
3 column
3 i'll
3 kind
3 way
3 car
3 talk
3 phone
3 developer
...SNIP!
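The report step in the analysis script (drop words seen fewer than twice, sort by descending count, print with "%5d %-20s") can be sketched as a pure function. `report` and its `min_count` parameter are hypothetical names used only for this illustration, and the input hash is made up.

```ruby
# Hypothetical helper: returns the formatted report lines instead of printing.
def report(counts, min_count = 2)
  counts.select { |_word, count| count >= min_count }
        .sort_by { |_word, count| -count }                   # highest count first
        .map { |word, count| format("%5d %-20s", count, word) }
end

# Counts are right-aligned in 5 columns; words are left-aligned in 20.
puts report({ "just" => 21, "know" => 10, "rare" => 1 })
```

Words with equal counts come out in no guaranteed order, since `sort_by` is not a stable sort.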
#!/usr/bin/env ruby
require "optparse"
require "rubygems"
require "twitter"

params = ARGV.getopts("u:p:", "user:", "password:")
# Accept either the long option (--user) or its one-letter form (-u).
%w(user password).each do |p|
  raise ArgumentError, "no #{p} specified" unless (params[p] ||= params[p[0, 1]])
end

httpauth = Twitter::HTTPAuth.new(params["user"], params["password"])
twitter = Twitter::Base.new(httpauth)

# Progress goes to stderr so that stdout carries only tweet text.
($stderr << "Grabbing").flush
tweets = (1..10).collect do |i|
  ($stderr << ".").flush
  twitter.user_timeline(:page => i).collect { |t| t.text }
end.flatten
$stderr << " #{tweets.size} tweets\n"
puts tweets.join("\n")