@yswallow
Created February 4, 2016 13:03
Compute the median tweet length from a tweet archive
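require 'json'

# Total number of characters taken up by user mentions and URLs in a tweet.
def entities_length(tweet)
  indices = []
  tweet['entities']['user_mentions'].each do |item|
    indices << item['indices']
  end
  tweet['entities']['urls'].each do |item|
    indices << item['indices']
  end
  # Each entry is a [start, end] pair; sum the spans (0 when there are none).
  indices.map { |i, j| j - i }.inject(0, :+)
end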

# Histogram of tweet lengths: lengthes[n] is the number of tweets of length n.
lengthes = [0] * 141 # lengthes[0] through lengthes[140]

Dir.glob('./tweets/*.js') do |path|
# Dir.glob('./tweets/2015_06.js') do |path|
  puts path
  File.open(path) do |io|
    io.gets # discard the first line so the rest parses as JSON
    tweets = JSON.parse(io.read)
    tweets.each do |tweet|
      next if tweet['retweeted_status'] # skip retweets
      # next unless tweet['entities']['urls'].empty?
      lengthes[tweet['text'].length - entities_length(tweet)] += 1
    end
  end
end

# Print the histogram, one "length: count" line per length.
lengthes.each_with_index do |count, length|
  puts "#{length}: #{count}"
end
10.times { puts }
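
tweet_count = lengthes.inject(&:+)
# 1-based rank of the (lower) median. Plain tweet_count / 2 lands one rank
# too low whenever the number of tweets is odd.
half = (tweet_count + 1) / 2
under_val = 0 # number of tweets of length <= i
lengthes.size.times do |i|
  under_val += lengthes[i]
  if under_val >= half
    puts i
    break
  end
end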
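
The median lookup at the end of the script can also be checked separately from the archive files. Below is a minimal sketch of that step as a standalone method. `histogram_median` is a hypothetical name that does not appear in the gist, and the sample histogram is made up for illustration. It assumes `histogram[v]` holds the number of occurrences of the value `v`, and it returns the lower median when the count is even.

```ruby
# Hypothetical helper (not part of the original gist): returns the lower
# median of the values described by a histogram, where histogram[v] is the
# number of occurrences of the value v. Returns nil for an empty histogram.
def histogram_median(histogram)
  total = histogram.inject(0, :+)
  return nil if total.zero?

  rank = (total + 1) / 2 # 1-based rank of the lower median
  cumulative = 0
  histogram.each_with_index do |count, value|
    cumulative += count
    return value if cumulative >= rank
  end
end

# Made-up sample: the values 1, 1, 2, 4, 4, 4
sample = [0, 2, 1, 0, 3]
puts histogram_median(sample) # prints 2
```

Walking the cumulative count over the histogram avoids sorting or storing every individual tweet length, which is why the script only needs a 141-element array.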