@pmgreen
Created August 4, 2020 19:17
# frozen_string_literal: true
require 'nokogiri'
require 'csv'
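# Extract the data enrichment settings of an Alma publishing job.
# Save the publishing profile's data enrichment page as HTML
# ('Publishing Profile Details.html') and run this script in the same directory.
# The label/value pairs are written to publishing_job_text.txt.
# TODO: clean this up and refine it if it is needed again.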
doc = File.open('Publishing Profile Details.html') { |f| Nokogiri::HTML(f) }

# The block form of CSV.open closes (and flushes) the output file when done.
CSV.open('publishing_job_text.txt', 'w') do |output|
  doc.search('form').each do |form|
    # These positional XPaths are tied to the layout of the downloaded page.
    profile_name_label = form.xpath('./div[3]/div/div[3]/div/div[2]/div/div[1]/div/label/span')
    profile_name = profile_name_label.xpath('../../div').text
    profileid_label = form.xpath('./div[3]/div/div[3]/div/div[2]/div/div[2]/div/label/span[1]')
    profileid = profileid_label.xpath("../../span[@id='pageBeanpublishingProfileid']").text

    # Skip forms that have no profile name label.
    profile_name_text = profile_name_label.text.chomp(' ')
    next if profile_name_text.empty?

    output << [profile_name_text, profile_name.strip]
    output << [profileid_label.text, profileid]

    form.xpath(".//div[@class='marSidesAuto panel panel-default section ']").each do |section|
      # Write each section title as a banner between dashed rules.
      title = section.xpath(".//span[@class='sectionTitle']")
      output << ['-' * 35]
      output << [title.text.upcase]
      output << ['-' * 35]

      # Write one row per labelled input: the label text and the input's value.
      section.xpath('.//label/span').each do |label|
        value = label.xpath('../../div/input/@value').text
        label_text = label.text.strip
        next if label_text.empty?

        output << [label_text, value.strip]
      end
    end
  end
end