A previous post How to Add Museum IDs to Wikidata explained how to use SPARQL to find missing data on Wikidata (Getty Museum IDs), how to create such values (from museum webpage URLs) and how to format them properly for QuickStatements.
Here I explain how to use Google sheets to manage a more advanced task. The sheet AAT-Wikidata matches about 3k AAT concepts to Wikipedia, WordNet30 and BabelNet (it restored an old mapping to Wordnet, retrieved it from BabelNet, mapped to Wikipedia).
- For each row, it uses the following Google sheet formula (column C) to query the Wikipedia API and get the corresponding Wikidata ID (wikibase_item); split on two lines for readability:
=ImportXml(concat("https://en.wikipedia.org/w/api.php?action=query&prop=pageprops&
ppprop=wikibase_item&redirects=1&format=xml&titles=",G1),"//@wikibase_item")
- This formula asks for results in XML format, and the part
//@wikibase_item
is a XPath that fetches the WD ID from the resulting XML. - Making 3k API calls is slow, so Google sheet initially shows "Loading…" for all rows, and gradually "materializes" the WD IDs (Qnnnn) as they come in. I have periodically sorted the column and used "Edit> Paste special> Values only" for the "materialized" IDs in order to fix them and not cause re-fetching next time when I open the google sheet.
- Columns C,D,E are specially formatted to produce the required QuickStatements tab-delimited format. E.g. for row 62 AAT "patrol_wagons" corresponding to WP "Police_van":
Q1023646 P1014 "300212831"
The benefit of Google sheets is that they allow easy addition of columns and convenient facilities for manual tasks:
- Collaborative editing by several people at once.
- Column A for tracking which rows are checked, which are already inserted to WD, etc
- Using filters to find rows of interest. E.g. check=1 means rows that are manually checked and ready for insertion to QuickStatements. After insertion, I change it to check=2 to mark it as already inserted.
- Column B for tracking already existing WD IDs
- Conditional formatting to colour existing WD IDs (column B) that differ from my idea what is the matching WD ID (column C) and therefore must be checked.