@tily
Created February 13, 2012 14:03
Extracts bracketed prefixes such as 【相変わらず女の子からも人気の定番アイテム!】 ("A classic item, as popular as ever with girls too!") from the body text of a web page
#!/usr/bin/env ruby
# coding: utf-8
require 'open-uri'
require 'nokogiri'

# Usage: ruby extract_prefix.rb http://shop.menz-style.com/
#        ruby extract_prefix.rb http://hayabusa.2ch.net/news4vip/subback.html

# Returns the unique 【...】-bracketed strings found in the text of the page at url.
def get_prefix_list(url)
  # URI.open is needed on Ruby 3.0+, where Kernel#open no longer fetches URLs.
  doc = Nokogiri::HTML.parse(URI.open(url).read)
  # Non-greedy match, so each 【...】 pair is captured separately.
  doc.text.scan(/【.+?】/u).uniq
end

puts get_prefix_list(ARGV[0])
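
The matching step can be tried without fetching a page. Below is a minimal sketch that applies the same regex and `uniq` to a plain string. The helper name `extract_prefixes` and the sample text are assumptions made for illustration and are not part of the original script.

```ruby
# coding: utf-8

# Hypothetical helper (not in the original script): the same scan + uniq
# step as get_prefix_list, applied to a string instead of a fetched page.
def extract_prefixes(text)
  # Non-greedy match captures each 【...】 pair on its own;
  # uniq drops repeated prefixes while keeping first-seen order.
  text.scan(/【.+?】/).uniq
end

# Made-up sample text in which one prefix appears twice.
sample = "【新着】春物ジャケット 【人気】定番デニム 【新着】ストライプシャツ"
p extract_prefixes(sample)
# => ["【新着】", "【人気】"]
```

Because `.` does not match a newline without the `m` flag, a bracket pair that spans a line break is not picked up by this pattern.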