@acro5piano
Created February 10, 2022 03:46
simple scraping in terminal

Now you can scrape a website with a simple pipe: feed the HTML to scrape.rb on standard input and pass a CSS selector as the argument.

$ curl -Ss 'https://moncargo.io' | scrape.rb meta

<meta name="viewport" content="width=device-width">
<meta charset="utf-8">
<meta name="twitter:title" content="MonCargo - Track your cargo online">
<meta property="og:title" content="MonCargo - Track your cargo online">
<meta name="description" content="MonCargo monitors container shipment status and sends email notifications if a schedule changes.">
<meta name="twitter:card" content="summary_large_image">
<meta property="og:image" content="https://user-images.githubusercontent.com/10719495/132133995-134cd804-bda4-4ae2-8135-04d1e8425606.png">
<meta name="msapplication-TileColor" content="#da532c">
<meta name="theme-color" content="#ffffff">
<meta name="next-head-count" content="21">
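#!/usr/bin/env ruby
# Usage: <html on stdin> | scrape.rb [-t target] [-g group] selector...
require 'nokogiri'
require 'optparse'

config = {}
opts = OptionParser.new
opts.on('-g group') { |v| config[:g] = v }   # CSS selector matching each group (row)
opts.on('-t target') { |v| config[:t] = v }  # 'text' or an attribute name
opts.on('-u url') { |v| config[:u] = v }     # parsed, but not used below
opts.parse!(ARGV)                            # removes the flags; selectors remain in ARGV

document = Nokogiri::HTML(STDIN)

if config[:g].nil?
  # Single-selector mode: print every node matching the first argument.
  docs = document.css(ARGV[0])
  if config[:t].nil?
    docs.each { |e| puts e }                   # full element markup
  elsif config[:t] == 'text'
    docs.each { |e| puts e.text }              # text content only
  else
    docs.each { |e| puts e.attr(config[:t]) }  # value of the named attribute
  end
else
  # Group mode: one tab-separated line per group, one column per selector.
  document.css(config[:g]).each do |doc|
    puts ARGV.map { |arg| doc.css(arg).text }.join("\t")
  end
end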
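
The script relies on opts.parse! removing the flags from ARGV so that only the CSS selectors remain. The sketch below isolates that step using only the standard library. The helper name split_args and the sample selectors are made up for illustration and are not part of the gist. It uses the non-destructive parse in place of parse! so that the caller's array is left untouched.

```ruby
require 'optparse'

# Hypothetical helper (not in the gist): declares the same three flags as
# scrape.rb and returns the parsed flags plus the leftover positional args.
def split_args(argv)
  config = {}
  opts = OptionParser.new
  opts.on('-g group')  { |v| config[:g] = v }
  opts.on('-t target') { |v| config[:t] = v }
  opts.on('-u url')    { |v| config[:u] = v }
  rest = opts.parse(argv) # like parse!, but works on a copy of argv
  [config, rest]
end

# Single-selector mode: -t picks the attribute, the selector is left over.
config, selectors = split_args(['-t', 'href', 'a.nav'])
puts "target=#{config[:t]} selectors=#{selectors.join(',')}"

# Group mode: -g names the row selector, the remaining args become columns.
config, selectors = split_args(['-g', 'li.item', '.name', '.price'])
puts "group=#{config[:g]} selectors=#{selectors.join(',')}"
```

With the real script, the first case would correspond to something like `scrape.rb -t href a.nav`, and the second to `scrape.rb -g li.item .name .price`, each reading HTML from standard input.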