
@ejamesc
Created April 6, 2012 18:44
Language Detection
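require 'rubygems'
require 'mysql2'
require 'unsupervised-language-detection'

# Source database holding all scraped tweets, and target database for English-only tweets.
db1 = Mysql2::Client.new(:host => 'localhost', :username => 'xxx', :password => 'xxx', :database => 'sgbeat')
db2 = Mysql2::Client.new(:host => 'localhost', :username => 'xxx', :password => 'xxx', :database => 'sgb_pure')

begin
  # Only the id (for progress output) and the tweet text are needed.
  db1.query("SELECT id, tweet FROM tweets").each do |row|
    str = row["tweet"]
    puts row["id"] # progress indicator
    # Copy the tweet only if the classifier judges it to be English.
    if UnsupervisedLanguageDetection.is_english_tweet?(str)
      # Escape quotes and backslashes before interpolating the text into SQL.
      escaped = db2.escape(str)
      db2.query("INSERT INTO tweets (tweet) VALUES ('#{escaped}')")
    end
  end
ensure
  # Close both connections even if a query or the classifier raises.
  db1.close
  db2.close
end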
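The filtering step in the script above can be separated from the database I/O, which makes it testable without a MySQL server or the gem installed. This is a minimal sketch, not part of the original gist: `english_tweets` is a hypothetical helper, and its `is_english` argument is any callable standing in for `UnsupervisedLanguageDetection.is_english_tweet?`. It also skips `nil` or empty tweets instead of passing them to the classifier.

```ruby
# Hypothetical helper (not from the original gist): returns the text of every
# row whose "tweet" value passes the language predicate. `rows` is any
# Enumerable of hashes (a Mysql2::Result qualifies); `is_english` is any
# object that responds to #call with one string argument.
def english_tweets(rows, is_english)
  rows.each_with_object([]) do |row, kept|
    text = row["tweet"]
    # Skip missing or blank tweets rather than handing them to the classifier.
    next if text.nil? || text.empty?
    kept << text if is_english.call(text)
  end
end

rows = [
  { "id" => 1, "tweet" => "hello world" },
  { "id" => 2, "tweet" => nil },
  { "id" => 3, "tweet" => "selamat pagi" }
]

# Toy predicate for demonstration only; the real script uses the gem's classifier.
toy_detector = ->(text) { text.include?("hello") }
p english_tweets(rows, toy_detector) # prints ["hello world"]
```

To wire this into the original script, one could pass `db1.query("SELECT id, tweet FROM tweets")` as `rows` and `UnsupervisedLanguageDetection.method(:is_english_tweet?)` as the predicate, then escape and insert each returned string into `db2` as before. This assumes the gem exposes `is_english_tweet?` as a module-level method, as the original call suggests.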