@namnv609
Created October 24, 2018 01:15
inxpress360.com crawler
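# encoding: UTF-8
# Crawls the post office table for one province from inxpress360.com,
# saves it as <province>.csv, and prints it as a terminal table.
# Usage: ruby <this script> <province>
require "nokogiri"
require "open-uri"
require "terminal-table"
require "csv"

# Column names as shown on the site: No., Post office code, Post office name,
# Post office level, Address, Phone.
table_headings = ["STT", "Mã BC", "Tên Bưu cục", "BC cấp", "Địa chỉ", "Điện thoại"]

# The province is the part of the page URL that follows "ma-buu-dien-".
province = ARGV[0]
abort "Usage: ruby #{$PROGRAM_NAME} <province>" if province.nil? || province.empty?

# Kernel#open no longer opens URLs as of Ruby 3.0; use URI.open from open-uri.
document = Nokogiri::HTML(URI.open("https://inxpress360.com/ma-buu-dien-#{province}/"))

# Collect the text of every cell, one array per table row.
table_rows = document.css("article table.tve_table tbody tr").map do |tr_elm|
  tr_elm.css("td").map { |td_elm| td_elm.content }
end

# Write the headings and rows to <province>.csv.
CSV.open("#{province}.csv", "w:UTF-8") do |csv|
  csv << table_headings
  table_rows.each { |row| csv << row }
end

terminal_table = Terminal::Table.new headings: table_headings, rows: table_rows
puts "#{terminal_table}\n"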
# Lists the provinces and their postal codes from inxpress360.com
# as a terminal table.
require "nokogiri"
require "open-uri"
require "terminal-table"

table_headings = %w(No. Province Code)

# Kernel#open no longer opens URLs as of Ruby 3.0; use URI.open from open-uri.
document = Nokogiri::HTML(URI.open("https://inxpress360.com/ma-buu-dien/"))

# Collect the text of every cell, one array per table row.
table_rows = document.css("article .thrv_wrapper table.tve_table tbody tr").map do |tr_elm|
  tr_elm.css("td").map { |td_elm| td_elm.content }
end

terminal_table = Terminal::Table.new headings: table_headings, rows: table_rows
puts "#{terminal_table}\n"
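The CSV file written by the per-province script can be read back with Ruby's standard `csv` library. The following is a minimal sketch, not part of the original gist: the helper name `load_post_offices` is hypothetical, and it assumes the file starts with the heading row the crawler writes.

```ruby
require "csv"

# Hypothetical helper (not from the original gist): reads a CSV produced by
# the crawler and returns one hash per data row, keyed by the heading row.
def load_post_offices(path)
  CSV.read(path, headers: true, encoding: "UTF-8").map(&:to_h)
end
```

For example, `load_post_offices("some-province.csv").each { |office| puts office["Mã BC"] }` would print the post office code column, assuming a file of that name was generated earlier.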