Last active
July 6, 2018 14:31
Tiny Ruby/Nokogiri script that extracts event data for Halle's Long Night of the Sciences from saved HTML pages into a CSV file, a more usable form than the one offered on the event website
#!/usr/bin/env ruby

require 'nokogiri'
require 'csv'

# Append one CSV row per event found in the saved pages under html/.
CSV.open('LNDW.csv', 'a+') do |csv|
  Dir.glob('html/*.html') do |filename|
    puts "Processing #{filename}..."
    doc = Nokogiri::HTML(File.read(filename))
    doc.css('div.the-artist-horizontal').each do |event|
      host = event.search('div.host span').text
      title = event.search('.text-slider3 h3').text.strip.gsub(/\n/, '')
      time = event.search('.text-slider3 h5 strong').text
      info = event.search('.text-slider3 p')
      # Drop nested <span> elements so only the description text remains.
      info.search('span').each { |span| span.remove }
      info = info.text.strip
      # The venue is the last line of the description; empty if there is no line break.
      place = (info.scan(/\n.*/).last || '').strip.gsub(/\s+/, ' ')
      # Collapse the description (venue included) onto a single line.
      info = info.gsub(/\n/, '').gsub(/\s+/, ' ')
      csv << [time, place, host, title, info]
    end
  end
end
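
The venue and description cleanup is the least obvious step of the script, so here is a minimal standalone sketch of it that runs without Nokogiri. The helper name `split_info` and the sample text are hypothetical; the real pages may be laid out differently. The sketch falls back to an empty venue when the text contains no line break.

```ruby
# Hypothetical helper: mirrors the script's string handling for one event's text.
def split_info(raw)
  text = raw.strip
  # Treat the last "\n..." chunk as the venue; use '' when there is no line break.
  place = (text.scan(/\n.*/).last || '').strip.gsub(/\s+/, ' ')
  # Collapse the whole text (venue included) onto one line.
  info = text.gsub(/\n/, '').gsub(/\s+/, ' ')
  [place, info]
end

# Made-up sample of what the paragraph text might look like after extraction.
sample = "  Talk about   black holes.\n   Lecture Hall 1,  Main Building  "
place, info = split_info(sample)
puts place
puts info
```

Note that `info` still contains the venue text at its end, because only whitespace is normalized and nothing is cut out of the description.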