@siyo
Created February 26, 2012 15:54
A filter that strips particles (助詞) and conjunctions (接続詞) out of tweets on the TL (timeline). Not actually useful.
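# -*- coding: utf-8 -*-
# shoujou filter
#
# Earthquake plugin: removes particles (助詞) and conjunctions (接続詞)
# from every tweet shown on the timeline, using MeCab via natto.
require 'natto'

Earthquake.init do
  # Create one MeCab instance and reuse it, instead of one per tweet.
  mecab = Natto::MeCab.new

  output_filter do |item|
    next if item.nil? || item["text"].nil?

    # MeCab prints one "surface\tfeature" line per token, then "EOS".
    lines = mecab.parse(item["text"]).split(/\r*\n/)
    item["text"] = lines.inject("") { |s, line|
      surface, feature = line.split(/\t/)
      # The part of speech is the first comma-separated feature field,
      # so anchor the match there instead of searching the whole feature.
      s << surface unless surface.nil? || surface == 'EOS' ||
                          /\A(助詞|接続詞)/ =~ feature
      s
    }
    true
  end
end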
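The token-dropping step can be tried on its own, without MeCab, natto, or Earthquake installed. The sketch below is a minimal illustration, not the plugin itself: `strip_pos` is a hypothetical helper that takes text in MeCab's default output format ("surface\tfeature" per line, ending with `EOS`) and drops tokens whose first feature field is in the `drop` list. The sample input is hand-written in IPADIC style for the sentence "しかし今日は雨だ"; it is not captured MeCab output. It assumes Ruby 2.3 or later (for the `<<~` heredoc).

```ruby
# Hypothetical helper: drop tokens whose part of speech (the first
# comma-separated feature field) is listed in `drop`.
def strip_pos(mecab_output, drop = %w[助詞 接続詞])
  mecab_output.split(/\r*\n/).each_with_object(+"") do |line, out|
    surface, feature = line.split("\t", 2)
    # Skip blank lines and the end-of-sentence marker.
    next if surface.nil? || surface == "EOS"
    pos = feature.to_s.split(",").first
    out << surface unless drop.include?(pos)
  end
end

# Hand-written IPADIC-style sample (assumption, not real MeCab output).
sample = <<~MECAB
  しかし\t接続詞,*,*,*,*,*,しかし,シカシ,シカシ
  今日\t名詞,副詞可能,*,*,*,*,今日,キョウ,キョー
  は\t助詞,係助詞,*,*,*,*,は,ハ,ワ
  雨\t名詞,一般,*,*,*,*,雨,アメ,アメ
  だ\t助動詞,*,*,*,特殊・ダ,基本形,だ,ダ,ダ
  EOS
MECAB

puts strip_pos(sample) # prints 今日雨だ
```

Note that the auxiliary verb だ (助動詞) survives: comparing the whole first field, rather than searching the full feature string, keeps the filter to exactly the parts of speech listed.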