Skip to content

Instantly share code, notes, and snippets.

@vmdowney
Last active February 3, 2020 22:42
Show Gist options
  • Save vmdowney/063c656ec3a53f28c7f8f045f8af050d to your computer and use it in GitHub Desktop.
Save vmdowney/063c656ec3a53f28c7f8f045f8af050d to your computer and use it in GitHub Desktop.
GREL for Library Data

GREL for Library Data

GREL recipes for working with bibliographic and archival data in OpenRefine. This Gist is mostly for my own use but is shared in case other catalogers/library metadataists are puzzling over similar problems. More general recipes are on OpenRefine's Recipes page.

Working with Harvard LibraryCloud Data

Example MODS XML query: https://api.lib.harvard.edu/v2/items?q=goats&repository=Houghton%20OR%20Theatre

Parsing MODS XML

In all examples, value is a column containing individual MODS records (the mods:mods XML element) as rows.

This expression, followed by Split multi-valued cells by separator [five backticks] `````, creates the column of MODS records from the LibraryCloud API results:

forEach(value.parseXml().select("mods|mods"), v, v).join('`````')

Any separator that does not appear in the MODS XML data can be used as the separator value in .join('[separator value]').

mods:recordIdentifier

value.parseXml().select("mods|recordInfo")[0].select("mods|recordIdentifier")[0].xmlText()

This also works:

value.parseXml().select("mods|recordInfo > mods|recordIdentifier")[0].xmlText()

mods:titleInfo

First title from mods:titleInfo:

value.parseXml().select("mods|titleInfo")[0].xmlText()

First uniform title from mods:titleInfo:

value.parseXml().select("mods|titleInfo[type=uniform]")[0].xmlText()

A list of alternative titles separated with line breaks:

forEach(value.parseXml().select("mods|titleInfo[type=alternative]"), v, v.xmlText()).join("\n")

Reformatting MARC Data from Cognos Reports

Remove all subfield delimiters and tags and any subfield 5

forEach(value.split('$$'), v, if(v.startsWith('5'), null, v.slice(1,length(v)))).join(' ')

Keep only the value of subfield 5

Change the subfield letter/number to select another subfield. Add .chomp('.'), etc., to v.slice() to remove end of subfield punctuation.

forEach(value.split('$$'), v, if(v.startsWith('5'),v.slice(1,length(v)),null)).join(' ')

Fill down value within a record

For Cognos reports grouped by bib number in Excel with merged rows. Filter by blanks on first (bib number) column. Fill down other columns by using the following GREL, entering the column name in place of "COLUMN_NAME". Fill down the bib number last to keep the OpenRefine record structure.

row.record.cells['COLUMN_NAME'].value[0]
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment