@ktym
Created October 31, 2012 07:51
MIRIAM XML to RDF
#!/usr/bin/env ruby
#
# Convert MIRIAM Registry XML file (http://www.ebi.ac.uk/miriam/main/export/) to RDF
#
# Copyright (C) 2012 Toshiaki Katayama <[email protected]>
#
# Prerequisites and usage:
# % curl http://www.ebi.ac.uk/miriam/main/export/xml/ > miriam.xml
# % gem install nokogiri
# % miriam_xml2rdf.rb miriam.xml > miriam.rdf
#
require 'rubygems'
require 'nokogiri'
require 'erb'
$turtle_backward_compatible = true
xml = Nokogiri::XML(ARGF)
erb_template = DATA.read
ns = xml.namespaces
meta = xml.xpath('//xmlns:miriam', ns).first
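# Collapse whitespace and escape backslashes and double quotes for a Turtle string literal.
def clean(str)
  str.gsub(/\s+/, ' ').strip.gsub('\\') { '\\\\' }.gsub('"') { '\\"' }
end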
puts <<HEADER
@prefix : <http://identifiers.org/dataset/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix void: <http://rdfs.org/ns/void#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix miriam: <#{ns['xmlns']}> .
<>
    dcterms:date "#{meta['date']}" ;
    dcterms:hasVersion "#{meta['data-version']}" .
HEADER
# Build the template once; it is rendered per datatype with the loop's local binding.
erb = ERB.new(erb_template)

xml.xpath('//xmlns:datatype', ns).each do |datatype|
  next if datatype["obsolete"] == "true"

  # Collect the non-deprecated URN and URL forms of the datatype's URIs.
  deprecated = datatype.xpath('.//xmlns:uris/xmlns:uri[@deprecated="true"]', ns).collect(&:text)
  urns = datatype.xpath('.//xmlns:uris/xmlns:uri[@type="URN"]', ns).collect(&:text) - deprecated
  urls = datatype.xpath('.//xmlns:uris/xmlns:uri[@type="URL"]', ns).collect(&:text) - deprecated

  # Derive a CamelCase local name from the namespace, e.g. "kegg.compound" -> "KeggCompound".
  namespace = datatype.at('namespace').content
  classname = namespace.split(/\.|\-/).map{|x| x.capitalize}.join
  # Older Turtle grammars do not allow a local name to start with a digit, so prefix it.
  if classname[/^\d/] and $turtle_backward_compatible
    classname = "DB" + classname
  end

  rdfs_label = clean( datatype.at('name').content )
  rdfs_comment = clean( datatype.at('definition').content )
  datatype_id = datatype["id"]
  urn = urns.first
  url = urls.first
  puts erb.result(binding)
end
__END__
:<%= classname %>
    rdf:type void:Dataset ;
    rdfs:label "<%= rdfs_label %>" ;
    rdfs:comment "<%= rdfs_comment %>" ;
    miriam:datatype "<%= datatype_id %>" ;
    miriam:urn <<%= urn %>> ;
    miriam:url <<%= url %>> ;
    miriam:namespace "<%= namespace %>" .
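
The two string transformations in the script above (deriving a Turtle local name from a MIRIAM namespace, and cleaning text for a double-quoted literal) can be tried in isolation without Nokogiri or a MIRIAM export. The following is a minimal sketch, not part of the original gist: the method names `class_name_for` and `turtle_literal` are hypothetical, the sample namespaces are illustrative inputs only, and the backslash escaping is an assumption about what a Turtle literal needs rather than something the original `clean` does.

```ruby
# Hypothetical helpers for illustration; the script inlines this logic instead.

# Turn a namespace such as "kegg.compound" into a CamelCase local name.
# When prefix_digits is true, names starting with a digit get a "DB" prefix,
# mirroring the script's $turtle_backward_compatible switch.
def class_name_for(namespace, prefix_digits = true)
  name = namespace.split(/[.\-]/).map(&:capitalize).join
  name = "DB" + name if prefix_digits && name =~ /\A\d/
  name
end

# Collapse runs of whitespace, trim, then escape backslashes and double quotes
# so the result can sit inside a double-quoted Turtle literal.
# The block form of gsub avoids backreference handling in the replacement.
def turtle_literal(str)
  str.gsub(/\s+/, ' ').strip.gsub('\\') { '\\\\' }.gsub('"') { '\\"' }
end

puts class_name_for("kegg.compound")   # => KeggCompound
puts class_name_for("ec-code")         # => EcCode
puts class_name_for("3dmet")           # => DB3dmet
puts class_name_for("3dmet", false)    # => 3dmet
puts turtle_literal(%Q(  a "quoted"\n   value  ))
```

Keeping the digit check behind a flag follows the script's own design: newer Turtle parsers accept local names that begin with a digit, so the prefix is only needed for older tooling.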