Last active
March 5, 2023 07:57
Use Vessel as an alternative to Kimurai: https://github.com/rubycdp/vessel
require 'bundler/inline'

gemfile do
  source 'https://rubygems.org'
  gem 'vessel', github: 'rubycdp/vessel' # commit 3097da3daeb2b0f06182b2d4faa7693d82407538
end

class FirstMiddleware < Vessel::Middleware
  def call(item, _)
    item[:h1] += '!'
    puts item[:h1]
    item
  end
end

class SecondMiddleware < Vessel::Middleware
  def call(item, _)
    puts item[:h1] + '!'
    item
  end
end

class ApplicationCrawler < Vessel::Cargo
  delay 1
  threads max: 1
  middleware 'FirstMiddleware', 'SecondMiddleware'
end

class ExampleCrawler < ApplicationCrawler
  start_urls 'https://example.com/'

  def parse
    yield({ h1: at_css('h1').text })
  end

  def parse2
    p data
  end
end

# run (kimurai: crawl!)
ExampleCrawler.run

# parse (kimurai: parse!)
engine = Vessel::Engine.new(ExampleCrawler)
engine.parse 'https://example.com/2', :parse2, { foo: :bar }
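
The two middleware classes above appear to rely on each one receiving the item returned by the previous one. The sketch below shows that chaining idea in plain Ruby, without Vessel. `FirstStep`, `SecondStep`, and `run_chain` are hypothetical names introduced for illustration, not part of the Vessel API, and the hardcoded item stands in for a scraped page. It assumes ordered, pass-the-result-along behaviour, which may differ from how Vessel wires middleware internally.

```ruby
# Hypothetical stand-in for a middleware that modifies the item.
class FirstStep
  def call(item)
    # Return a modified copy so the caller's hash is left untouched.
    item.merge(h1: item[:h1] + '!')
  end
end

# Hypothetical stand-in for a middleware that only reads the item.
class SecondStep
  def call(item)
    puts item[:h1] + '!'
    item
  end
end

# Feed the item through each step in order; each step sees the
# value returned by the one before it.
def run_chain(steps, item)
  steps.reduce(item) { |current, step| step.call(current) }
end

# Hardcoded item used in place of a real crawl result.
result = run_chain([FirstStep.new, SecondStep.new], { h1: 'Example Domain' })
p result
```

Under this assumption the second step sees the `!` appended by the first, so the order in which the steps are listed matters.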