Skip to content

Instantly share code, notes, and snippets.

@MagnusEnger
Last active December 29, 2015 10:39
Show Gist options
  • Save MagnusEnger/7658143 to your computer and use it in GitHub Desktop.
Save MagnusEnger/7658143 to your computer and use it in GitHub Desktop.
Converting MARC to RDF, using Catmandu.
#!/usr/bin/perl
use Catmandu::Importer::MARC;
use Catmandu::Exporter::RDF;
use Modern::Perl;
my $records = Catmandu::Importer::MARC->new(
file => 'sample_records.marcxml',
type => 'XML',
);
my $exporter = Catmandu::Exporter::RDF->new(
type => 'Turtle',
fix => 'marc2rdf.fix'
);
$exporter->add_many( $records );
$exporter->commit;
marc_map( '100a','dc:creator' );
marc_map( '245a','dc:title' );
remove_field( 'record' );
prepend("_id","http://example.org/id/");
$ perl catmandu_marc2rdf.pl
<http://example.org/id/960778950> <http://purl.org/dc/elements/1.1/creator> "Wall, Larry" ;
<http://purl.org/dc/elements/1.1/title> "Programming Perl" .
<http://example.org/id/11447981x> <http://purl.org/dc/elements/1.1/creator> "DuCharme, Bob" ;
<http://purl.org/dc/elements/1.1/title> "Learning SPARQL" .
<collection xmlns="http://www.loc.gov/MARC21/slim">
<record>
<leader>01039nam a2200397 a 4500</leader>
<controlfield tag="001">960778950</controlfield>
<controlfield tag="008"> r eng </controlfield>
<datafield tag="020" ind1=" " ind2=" ">
<subfield code="a">1565921496</subfield>
<subfield code="b">h.</subfield>
</datafield>
<datafield tag="060" ind1="k" ind2=" ">
<subfield code="a">QA 76</subfield>
</datafield>
<datafield tag="060" ind1="a" ind2=" ">
<subfield code="a">QA 75.1</subfield>
</datafield>
<datafield tag="080" ind1="u" ind2="k">
<subfield code="a">004.43</subfield>
</datafield>
<datafield tag="080" ind1="u" ind2="b">
<subfield code="a">681.3.04</subfield>
</datafield>
<datafield tag="080" ind1="u" ind2="x">
<subfield code="a">681.3.04</subfield>
</datafield>
<datafield tag="080" ind1="y" ind2="f">
<subfield code="a">681.3.04Perl</subfield>
</datafield>
<datafield tag="080" ind1="a" ind2=" ">
<subfield code="a">681.3.04PERL</subfield>
</datafield>
<datafield tag="082" ind1="u" ind2="p">
<subfield code="a">005.43</subfield>
</datafield>
<datafield tag="082" ind1="d" ind2=" ">
<subfield code="a">005.133</subfield>
</datafield>
<datafield tag="082" ind1="u" ind2="a">
<subfield code="a">005.133</subfield>
</datafield>
<datafield tag="082" ind1="x" ind2="d">
<subfield code="a">005.133</subfield>
</datafield>
<datafield tag="082" ind1="u" ind2="f">
<subfield code="a">005.133</subfield>
</datafield>
<datafield tag="082" ind1="u" ind2="k">
<subfield code="a">005.133</subfield>
</datafield>
<datafield tag="082" ind1="x" ind2="h">
<subfield code="a">005.133</subfield>
</datafield>
<datafield tag="082" ind1="u" ind2="x">
<subfield code="a">005.133</subfield>
</datafield>
<datafield tag="082" ind1="v" ind2="l">
<subfield code="a">005.133</subfield>
</datafield>
<datafield tag="082" ind1="c" ind2=" ">
<subfield code="a">005.71</subfield>
</datafield>
<datafield tag="100" ind1=" " ind2=" ">
<subfield code="a">Wall, Larry</subfield>
</datafield>
<datafield tag="245" ind1=" " ind2=" ">
<subfield code="a">Programming Perl</subfield>
<subfield code="c">Larry Wall, Tom Christiansen and Randal L. Schwartz ; with Stephen Potter</subfield>
</datafield>
<datafield tag="250" ind1=" " ind2=" ">
<subfield code="a">2nd ed.</subfield>
</datafield>
<datafield tag="260" ind1=" " ind2=" ">
<subfield code="b">O&apos;Reilly</subfield>
<subfield code="c">1996</subfield>
<subfield code="a">Bonn</subfield>
</datafield>
<datafield tag="300" ind1=" " ind2=" ">
<subfield code="a">XXI, 646 s.</subfield>
</datafield>
<datafield tag="500" ind1=" " ind2=" ">
<subfield code="a">&quot;Covers Perl 5&quot; - Omslaget</subfield>
</datafield>
<datafield tag="700" ind1=" " ind2=" ">
<subfield code="a">Christiansen, Tom</subfield>
</datafield>
<datafield tag="700" ind1=" " ind2=" ">
<subfield code="a">Schwartz, Randal L.</subfield>
</datafield>
<datafield tag="942" ind1=" " ind2=" ">
<subfield code="c">BK</subfield>
</datafield>
<datafield tag="999" ind1=" " ind2=" ">
<subfield code="c">1</subfield>
<subfield code="d">1</subfield>
</datafield>
<datafield tag="952" ind1=" " ind2=" ">
<subfield code="w">2011-08-23</subfield>
<subfield code="r">2011-08-23</subfield>
<subfield code="4">0</subfield>
<subfield code="0">0</subfield>
<subfield code="6">005_430000000000000</subfield>
<subfield code="9">1</subfield>
<subfield code="b">BIB</subfield>
<subfield code="1">0</subfield>
<subfield code="o">005.43</subfield>
<subfield code="d">2011-08-23</subfield>
<subfield code="7">0</subfield>
<subfield code="2">ddc</subfield>
<subfield code="y">BK</subfield>
<subfield code="a">BIB</subfield>
</datafield>
</record>
<record>
<leader>00580nam a22001692a 4500</leader>
<controlfield tag="001">11447981x</controlfield>
<controlfield tag="008"> r eng </controlfield>
<datafield tag="020" ind1=" " ind2=" ">
<subfield code="a">9781449306595</subfield>
<subfield code="b">h.</subfield>
</datafield>
<datafield tag="100" ind1=" " ind2=" ">
<subfield code="a">DuCharme, Bob</subfield>
</datafield>
<datafield tag="245" ind1=" " ind2=" ">
<subfield code="a">Learning SPARQL</subfield>
<subfield code="b">querying and updating with SPARQL 1.1</subfield>
<subfield code="c">Bob DuCharme</subfield>
</datafield>
<datafield tag="260" ind1=" " ind2=" ">
<subfield code="a">Beijing</subfield>
<subfield code="b">O&apos;Reilly Media</subfield>
<subfield code="c">2011</subfield>
</datafield>
<datafield tag="300" ind1=" " ind2=" ">
<subfield code="a">xv, 235 s.</subfield>
<subfield code="b">ill.</subfield>
</datafield>
<datafield tag="650" ind1=" " ind2=" ">
<subfield code="a">Semantic web</subfield>
</datafield>
<datafield tag="942" ind1=" " ind2=" ">
<subfield code="c">BK</subfield>
</datafield>
<datafield tag="999" ind1=" " ind2=" ">
<subfield code="c">2</subfield>
<subfield code="d">2</subfield>
</datafield>
<datafield tag="952" ind1=" " ind2=" ">
<subfield code="w">2011-10-26</subfield>
<subfield code="r">2011-10-26</subfield>
<subfield code="4">0</subfield>
<subfield code="0">0</subfield>
<subfield code="9">2</subfield>
<subfield code="b">BIB</subfield>
<subfield code="1">0</subfield>
<subfield code="d">2011-10-26</subfield>
<subfield code="7">0</subfield>
<subfield code="c">STAFF</subfield>
<subfield code="2">ddc</subfield>
<subfield code="y">BK</subfield>
<subfield code="a">BIB</subfield>
</datafield>
<datafield tag="952" ind1=" " ind2=" ">
<subfield code="w">2011-11-22</subfield>
<subfield code="r">2011-11-22</subfield>
<subfield code="4">0</subfield>
<subfield code="0">0</subfield>
<subfield code="9">8</subfield>
<subfield code="b">BIB</subfield>
<subfield code="1">0</subfield>
<subfield code="d">2011-11-22</subfield>
<subfield code="7">0</subfield>
<subfield code="c">HD</subfield>
<subfield code="2">ddc</subfield>
<subfield code="y">EBK</subfield>
<subfield code="a">BIB</subfield>
</datafield>
</record>
</collection>
@MagnusEnger
Copy link
Author

If I run this as written above, I get:

$ perl marc2rdf.pl
Can't use string ("Wall, Larry") as a HASH ref while "strict refs" in use at /usr/local/share/perl/5.14.2/RDF/aREF.pm line 184, line 2.

If I replace "my $exporter = Catmandu::Exporter::RDF->new(" with "my $exporter = Catmandu::Exporter::CSV->new(" it seems to work:

$ perl marc2rdf.pl
dc:creator,dc:title
"Wall, Larry","Programming Perl"
"DuCharme, Bob","Learning SPARQL"

@phochste
Copy link

Hi Magnus,

There are two things:

  1. You probably use a previous version of Jakob's Catmandu::RDF code which requires an '@id' field to be available ( in the most recent version you need an '_id' field). So replace in the fix:

remove_field('_id');

by

move_field('_id','@id');

  1. To see any output you need to commit all the changes in the exporter

$exporter->commit;

at the end of your perl script

Cheers
Patrick

@MagnusEnger
Copy link
Author

I'm on Catmandu::RDF 0.10, and with the edited code above it works! I think (part of) the problem was that I was removing the _id, when I should be keeping it? Here is the output from the above code:

$ perl catmandu_marc2rdf.pl 
<960778950> <http://purl.org/dc/elements/1.1/creator> "Wall, Larry" ;
    <http://purl.org/dc/elements/1.1/title> "Programming Perl" .
<11447981x> <http://purl.org/dc/elements/1.1/creator> "DuCharme, Bob" ;
    <http://purl.org/dc/elements/1.1/title> "Learning SPARQL" .

Thanks for the input!
Magnus

@phochste
Copy link

Yup :). The _id contains the subject of your triples. I'm very happy with works for you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment