@epitron
Last active December 22, 2015 13:28
Find tweets that are anagrams of each other.
require 'set'
require 'pp'
require 'date'
require 'pry'
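# Groups tweets by anagram key: tweets whose letters sort to the same
# string land in the same Set.
class Index
  include Enumerable

  def initialize
    @tweets = Hash.new { |h, k| h[k] = Set.new }
  end

  def add(tweet)
    @tweets[tweet.anagram_hash].add tweet
  end

  # Yields [anagram_key, set_of_tweets] pairs.
  def each(&block)
    @tweets.each(&block)
  end

  # Number of distinct anagram keys, not the number of tweets.
  def size
    @tweets.size
  end
end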
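class Tweet < Struct.new(:date, :user, :msg)
  include Comparable

  def <=>(other)
    msg <=> other.msg
  end

  # Message lowercased with every non-word character removed.
  def stripped
    @stripped ||= msg.downcase.gsub(/\W/, '')
  end

  # Sorted characters of the stripped message; equal for all anagrams.
  def anagram_hash
    @anagram_hash ||= stripped.chars.sort.join
  end

  # hash and eql? are based on the stripped text, so a Set keeps only one
  # tweet per distinct normalized message (exact repeats are not anagrams).
  def hash
    @hash ||= stripped.hash
  end

  def eql?(other)
    stripped.eql? other.stripped
  end

  # Parses a line of the form "[MM/DD/YY HH:MM:SS] <user> message".
  def initialize(line)
    if line =~ /\[(\d\d\/\d\d\/\d\d \d\d:\d\d:\d\d)\] <([^>]+)> ?(.*)/
      self.user = $2
      self.msg  = $3
      self.date = DateTime.strptime($1, "%m/%d/%y %H:%M:%S")
    else
      raise ArgumentError, "bad line: #{line.inspect}"
    end
  end
end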
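if $PROGRAM_NAME == __FILE__
  index = Index.new

  # File.foreach closes the file when done; open(...).each leaves it open.
  File.foreach("tweets.txt") do |line|
    begin
      index.add Tweet.new(line)
    rescue StandardError
      # Skip lines that do not parse as a tweet.
    end
  end

  p index.size

  # Keys with more than one distinct tweet are the anagram groups.
  matches = index.select { |_key, tweets| tweets.size > 1 }

  # Drop into an interactive session to inspect `matches`.
  binding.pry
end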
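
The grouping idea in the script can be tried without a tweets.txt file. The following is a minimal sketch, not the author's code: the sample lines, the constant names, and the helper methods `normalize`, `anagram_key`, and `anagram_groups` are all made up for illustration. It only assumes the same line format and the same normalization (lowercase, non-word characters removed, characters sorted) that the script uses.

```ruby
require 'set'

# Hypothetical sample lines in the "[MM/DD/YY HH:MM:SS] <user> message"
# format the script parses; none of this comes from a real tweets.txt.
SAMPLE_LINES = [
  "[09/07/13 10:00:00] <alice> Listen!",
  "[09/07/13 10:01:00] <bob> silent",
  "[09/07/13 10:02:00] <carol> Silent.",
  "[09/07/13 10:03:00] <dave> hello world",
  "this line does not match and is skipped",
]

LINE_PATTERN = /\[(\d\d\/\d\d\/\d\d \d\d:\d\d:\d\d)\] <([^>]+)> ?(.*)/

# Lowercase and drop non-word characters, as Tweet#stripped does.
def normalize(msg)
  msg.downcase.gsub(/\W/, '')
end

# Sorted characters form a key shared by every anagram of the same letters.
def anagram_key(msg)
  normalize(msg).chars.sort.join
end

# Group distinct normalized messages by anagram key and keep only the
# keys that have more than one distinct message.
def anagram_groups(lines)
  groups = Hash.new { |h, k| h[k] = Set.new }
  lines.each do |line|
    match = LINE_PATTERN.match(line)
    next unless match
    groups[anagram_key(match[3])] << normalize(match[3])
  end
  groups.select { |_key, msgs| msgs.size > 1 }
end

RESULT = anagram_groups(SAMPLE_LINES)
```

Here "Listen!" and "silent" share a key and form a group, "Silent." collapses into "silent" because the Set stores normalized text, and "hello world" is dropped because nothing else shares its key. Storing the normalized string in a Set mirrors the way the script overrides `hash` and `eql?` on `Tweet` so that exact repeats do not count as anagrams of each other.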