@pocke
Created November 21, 2016 11:51
Finding the words you tweet most often
$ ruby -rcgi -rcsv -e '
# Strip @mentions and URLs from the tweet text in row[2],
# then decode HTML entities such as &amp;.
CSV.parse($stdin.read) { |row|
  next if row[2].nil?  # skip blank or short rows
  puts CGI.unescapeHTML(row[2].gsub(/@[a-zA-Z0-9_]+/, "").gsub(%r!https?://\S+!, ""))
}' < p_ck_161121.csv |
head -12000 |
docker run --rm -i docker-mecab-neologd:latest |
grep '\s名詞,' | grep -v '語幹' | grep -v '非自立' | grep -v -E '^ー+\s' | cut -f 1 |
ruby -pe '$_.downcase!' | sort | grep -v '^.$' |
uniq -c |
sort -nr | head -101 | ruby -pale '$_ = "#{$F[1]} #{$F[0]}"  # swap to "word count"'
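The filtering and counting stages after MeCab can also be written in plain Ruby, which makes it easy to check the filters without Docker. This is a minimal sketch: `top_words` is a hypothetical helper, not part of the original gist, and it assumes MeCab's default IPA-style output (one token per line as `surface<TAB>POS,POS-detail,...`, with an `EOS` line after each sentence). It only approximates the grep/cut/sort stages: the `語幹` and `非自立` checks look at the feature field rather than the whole line, and ties are broken alphabetically.

```ruby
# Hypothetical helper (not from the original gist): count noun surface forms
# in MeCab output, approximating the grep/cut/sort stages of the pipeline.
# Each line is assumed to look like "surface\tPOS,POS-detail1,...", with
# "EOS" lines marking sentence ends.
def top_words(mecab_lines, limit = 100)
  counts = Hash.new(0)
  mecab_lines.each do |line|
    surface, features = line.chomp.split("\t", 2)
    next if features.nil?                        # "EOS", blank, or malformed line
    next unless features.start_with?("名詞,")     # nouns only
    next if features.include?("語幹")             # stems, e.g. 形容動詞語幹
    next if features.include?("非自立")           # non-independent nouns such as こと
    next if surface.match?(/\Aー+\z/)             # tokens made only of long-vowel marks
    word = surface.downcase
    next if word.length == 1                     # single characters
    counts[word] += 1
  end
  # Highest count first; ties broken alphabetically (an assumption).
  counts.sort_by { |word, n| [-n, word] }.first(limit)
end
```

To use it on real MeCab output, one could read standard input, e.g. `top_words($stdin.each_line).each { |word, n| puts "#{word} #{n}" }`, which prints the same "word count" layout as the shell version. Counting in a Hash replaces the sort-then-count step, at the cost of holding every distinct word in memory.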