@kardeiz
Created August 21, 2012 21:42
Convert some Dublin Core XML from DigiTool to CSV with Ruby
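#!/usr/bin/env ruby
# encoding: utf-8
#
# Usage: pass the directory holding the DigiTool XML files as the first
# argument; the CSV is written to standard output.
require 'active_support'           # load before core_ext on recent ActiveSupport versions
require 'active_support/core_ext'  # provides Hash.from_xml
require 'csv'
require 'nokogiri'

# CSV columns: the 15 Dublin Core elements plus "medium".
dc_terms = [ :contributor, :coverage, :creator, :date,
             :description, :format, :identifier, :language,
             :publisher, :relation, :rights, :source,
             :subject, :title, :type, :medium ]

abort "Usage: #{$PROGRAM_NAME} DIRECTORY" unless ARGV[0] && File.directory?(ARGV[0])

# Absolute paths of the regular files directly inside the given directory.
my_files = Dir.chdir(ARGV[0]) do
  Dir.glob("./*").map { |x| File.expand_path(x) }.select { |x| File.file?(x) }
end

csv_string = CSV.generate do |csv|
  csv << dc_terms.map(&:to_s) # header row
  my_files.each do |x|
    # Use the block form so the file handle is closed after parsing.
    my_file = File.open(x) { |f| Nokogiri::XML(f) }
    # The DC record is stored as escaped XML text in <md><type>dc</type><value>.
    dc_value = my_file.at_xpath("//md[type='dc']/value")
    next unless dc_value
    my_xml = Nokogiri::XML(dc_value.text)
    record = Hash.from_xml(my_xml.to_xml)["record"]
    next unless record.is_a?(Hash) # skip files without a usable <record> root
    # Repeated elements arrive as arrays; join them with "|" into one cell.
    csv << dc_terms.map { |dct| [*record[dct.to_s]].join("|") }
  end
end

puts csv_string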
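
The script builds each CSV row by flattening every Dublin Core field to a single cell. A repeated element comes back from `Hash.from_xml` as an array, a single element as a string, and a missing one as `nil`. The minimal sketch below isolates that step in plain Ruby. The helper names `flatten_field` and `record_to_row`, the shortened term list, and the sample record are all hypothetical, made up for illustration; the script above inlines this logic.

```ruby
# Shortened, hypothetical column list; the real script uses 16 terms.
DC_TERMS = %w[creator subject title].freeze

# Turn nil, a single value, or an array of values into one delimited string.
# Array(nil) is [], so a missing field becomes an empty cell.
def flatten_field(value, separator = "|")
  Array(value).join(separator)
end

# Build one CSV row (an array of cells) in the column order given by terms.
def record_to_row(record, terms = DC_TERMS)
  terms.map { |term| flatten_field(record[term]) }
end

# Made-up sample record in the shape Hash.from_xml(...)["record"] would have.
record = { "title" => "Campus map", "subject" => ["Maps", "Universities"] }
row = record_to_row(record)
# row is ["", "Maps|Universities", "Campus map"]
```

Because `|` is used as the in-cell separator, a source value that itself contains `|` cannot be told apart from a repeated field after export; a different separator can be passed as the second argument if that matters for the data.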