@tobynet
Created November 24, 2011 00:31
Lists the titles of time-loop stories, for ScraperWiki https://scraperwiki.com/scrapers/time-loop-story-lang_ja/
# -*- encoding: utf-8 -*-
%w|rubygems mechanize open-uri|.each{|x| require x}
target = "http://ja.wikipedia.org/wiki/%E3%83%AB%E3%83%BC%E3%83%97%E3%82%82%E3%81%AE"
# scrape
doc = Mechanize.new{|a|a.user_agent_alias = "Windows Mozilla"}.get(target)
# get stories only: work titles are quoted in 『』 on the page; keep each title once
stories = doc.search('#bodyContent .mw-content-ltr').text.scan(/『(.+?)』/).map(&:first).uniq
# save to DB, one row per title, keyed on :name
ScraperWiki.save_sqlite([:name], stories.map{|x| {:name => x} })
puts stories
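The title-extraction step can be tried offline, without Mechanize or ScraperWiki. The following is only a minimal sketch: the helper name `extract_titles` and the sample string are hypothetical (the placeholder titles are not taken from the Wikipedia article), and it assumes that titles are always wrapped in 『』 as in the scraper above.

```ruby
# -*- encoding: utf-8 -*-
# Hypothetical helper wrapping the same expression the scraper uses.
def extract_titles(text)
  # Non-greedy capture between 『 and 』; scan returns one-element arrays,
  # so take the first element of each and drop repeated titles.
  text.scan(/『(.+?)』/).map(&:first).uniq
end

# Placeholder sample text (assumption, not real page content).
sample = "『作品A』と『作品B』、そして再び『作品A』。"
puts extract_titles(sample)
```

Because `uniq` keeps the first occurrence of each element, the titles come out in the order they first appear in the text.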
tobynet commented Nov 24, 2011

I wonder whether ScraperWiki can integrate with Gist or GitHub.

tobynet commented Nov 24, 2011
