@iansrc0811
Last active June 25, 2019 07:53
A crawler that scrapes book titles from Books.com.tw (博客來)
require 'rubygems'
require 'nokogiri'
require 'mechanize'

class Books_Crawler
  # Returns the titles of the books on the first page of search results.
  def get_book_names(book_name)
    start_crawler(book_name).map { |item| item['title'] }
  end

  # Returns the links of the books on the first page of search results.
  def get_book_links(book_name)
    start_crawler(book_name).map { |item| item['href'] }
  end

  # Submits the search form on the home page and returns the matching <a> nodes.
  def start_crawler(book_name)
    agent = Mechanize.new
    page = agent.get('http://www.books.com.tw/')
    # Fill in the form named "search"; its "key" field holds the query string.
    books_form = page.form('search')
    books_form.key = book_name
    result_page = agent.submit(books_form)
    # Mechanize has already parsed the result page with Nokogiri, so query it
    # directly instead of fetching the URL a second time through open-uri.
    result_page.parser.xpath("//li[@class='item']/a[@rel='mid_image']")
  end
end

iansrc0811 commented Jan 9, 2017

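Simulates a search on Books.com.tw (http://www.books.com.tw/). Given a search string, the crawler returns the book titles or the book links from the first page of search results. The example below searches for "ruby" and retrieves the book titles.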
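
Titles and links come from the same result nodes, so they can also be paired up after a single crawl instead of calling both methods. The following is only a minimal sketch: `book_entries` is a hypothetical helper that is not part of the gist, the sample `href` values are made-up placeholders, and it assumes that links may be relative and should be resolved against the site root.

```ruby
require 'uri'

# Hypothetical helper, not part of the original gist.
# `nodes` can be any collection whose elements answer ['title'] and ['href'],
# such as the node set returned by start_crawler, or plain hashes.
def book_entries(nodes, base_url = 'http://www.books.com.tw/')
  nodes.map do |node|
    {
      title: node['title'],
      # URI.join resolves relative or protocol-relative hrefs against base_url
      link: URI.join(base_url, node['href']).to_s
    }
  end
end

# Stand-in data so the sketch runs without network access.
# The href values are placeholders, not real product URLs.
sample = [
  { 'title' => 'Ruby 學習手冊', 'href' => '/products/example-1' },
  { 'title' => 'Ruby 程式設計', 'href' => '//search.books.com.tw/example-2' }
]

book_entries(sample).each do |entry|
  puts "#{entry[:title]} -> #{entry[:link]}"
end
```

With the crawler loaded, the same helper could presumably be fed the live nodes via `book_entries(Books_Crawler.new.start_crawler("ruby"))`.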

Usage:

  • Install the gems: nokogiri and mechanize
  • Start irb:

$ irb
2.2.1 :001 > require './books_crawler.rb'
=> true
2.2.1 :003 > Books_Crawler.new.get_book_names("ruby")
=> ["職業駭客的告白II部曲:Python和Ruby啓發式程式語言的秘密", "10天學會 Ruby on Rails:Web 2.0 網站架設速成(暢銷回饋版)", "Ruby on Rails 自習手冊:邁向鐵道工人之路", "Ruby物件導向設計實踐:敏捷入門", "還在寫PHP?大師才用輕量級Ruby、JavaScript開發Web", "Effective Ruby中文版:寫出良好Ruby程式的48個具體做法", "當SketchUp遇見Ruby:邁向程式化建模之路", "Ruby 學習手冊", "10天學會 Ruby on Rails:Web 2.0 網站架設速成", "Ruby 程式設計", "Ruby程式設計密技268", "Ruby錦囊妙技", "Ruby on Rails網路應用程式開發與建置(附光碟)", "Ruby on Rails:建置與執行", "偷窺公關女王的人脈筆記 終極版活用寶典(兩冊合購不分售)", "【Afternoon Tea】17’年A5原創機能行事曆 RUBY藍", "What’s the Big Deal About First Ladies?", "The Baker’s Tale: Ruby Spriggs and the Legacy of Charles Dickens", "Pokemon Omega Ruby Alpha Sapphire 2", "Voulkos: The Breakthrough Years"]
2.2.1 :004 >
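
The same call can be run from a script rather than from irb. Here is a rough sketch that prints the returned titles as a numbered list; `format_titles` is a hypothetical helper that is not part of the gist, and the live call is left as a comment because it needs network access and the two gems.

```ruby
# Hypothetical formatter, not part of the original gist.
# Turns an array of titles into numbered lines.
def format_titles(titles)
  titles.each_with_index.map { |title, index| "#{index + 1}. #{title}" }
end

# With the crawler available, the live call would look like this:
#   require './books_crawler.rb'
#   puts format_titles(Books_Crawler.new.get_book_names('ruby'))

# Stand-in data: two titles copied from the irb session output
sample_titles = ['Ruby 學習手冊', 'Ruby 程式設計']
puts format_titles(sample_titles)
```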
