GREL recipes for working with bibliographic and archival data in OpenRefine. This Gist is mostly for my own use but is shared in case other catalogers/library metadataists are puzzling over similar problems. More general recipes are on OpenRefine's Recipes page.
Example MODS XML query: https://api.lib.harvard.edu/v2/items?q=goats&repository=Houghton%20OR%20Theatre
In all examples, value
is a column containing individual MODS records (the mods:mods
XML element) as rows.
This expression, followed by Split multi-valued cells
by separator [five backticks] `````,
creates the column of MODS records from the LibraryCloud API results:
forEach(value.parseXml().select("mods|mods"), v, v).join('`````')
Any separator that does not appear in the MODS XML data can be used as the separator value in .join('[separator value]')
.
value.parseXml().select("mods|recordInfo")[0].select("mods|recordIdentifier")[0].xmlText()
This also works:
value.parseXml().select("mods|recordInfo > mods|recordIdentifier")[0].xmlText()
First title from mods:titleInfo:
value.parseXml().select("mods|titleInfo")[0].xmlText()
First uniform title from mods:titleInfo:
value.parseXml().select("mods|titleInfo[type=uniform]")[0].xmlText()
A list of alternative titles separated with line breaks:
forEach(value.parseXml().select("mods|titleInfo[type=alternative]"), v, v.xmlText()).join("\n")
Remove all subfield delimiters and tags and any subfield 5
forEach(value.split('$$'), v, if(v.startsWith('5'), null, v.slice(1,length(v)))).join(' ')
Keep only the value of subfield 5
Change the subfield letter/number to select another subfield. Add .chomp('.')
, etc., to v.slice()
to remove end of subfield punctuation.
forEach(value.split('$$'), v, if(v.startsWith('5'),v.slice(1,length(v)),null)).join(' ')
Fill down value within a record
For Cognos reports grouped by bib number in Excel with merged rows. Filter by blanks on first (bib number) column. Fill down other columns by using the following GREL, entering the column name in place of "COLUMN_NAME". Fill down the bib number last to keep the OpenRefine record structure.
row.record.cells['COLUMN_NAME'].value[0]