17 January 2017: I got this nice reminder from Wikidata user Pasleim:
Unused properties
This is a kind reminder that the following properties were created more than six months ago: Artsy gene (P2411), J. Paul Getty Museum object id (P2582), dataset distribution (P2702). As of today, these properties are used on less than five items. As the proposer of these properties you probably want to change the unfortunate situation by adding a few statements to items.
I went ahead and added a few Getty Museum IDs by hand. Then I looked more closely at the available data and it hit me: let's do some SPARQL queries to generate the missing data.
First, there are 471 objects in collection J. Paul Getty Museum (JPGM):
SELECT (count(*) as ?c) {
?item wdt:P195 wd:Q731126
}
Don't be scared by the P and Q numbers: the Wikidata query service has two nice features to explain them:
- hovering the mouse over the P and Q identifiers shows what they mean
- the "query explanation" box on the right displays "collection J. Paul Getty Museum"
Note: Consider that JPGM has some 800k objects: 471 in Wikidata is very few, so a Data Donation from JPGM would be very much appreciated. See WikiProject_Cultural_heritage and Partnerships_and_data_imports.
Most of these objects have a "described at URL" property that includes the object ID I seek, eg: http://www.getty.edu/art/collection/objects/103928/adelaide-labille-guiard-head-of-a-young-woman-french-1779/.
I can get the ID with the following query:
SELECT ?item ?id {
?item wdt:P195 wd:Q731126; wdt:P973 ?url.
filter not exists {?item wdt:P2582 ?objId}
bind(strbefore(strafter(str(?url),"http://www.getty.edu/art/collection/objects/"),"/") as ?id)
}
This can be decoded as follows:
- find ?item in collection J. Paul Getty Museum
- fetch its "described at" ?url
- strafter() discards the common website prefix http://www.getty.edu/art/collection/objects/
- strbefore() discards the human-readable page suffix, eg /adelaide-labille-guiard-head-of-a-young-woman-french-1779/
- which leaves only the ?id I need: 103928
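To check the string logic without running the query, here is a minimal Python sketch of the same two steps. The helper names strafter and strbefore are my own stand-ins that mimic the SPARQL functions (including returning an empty string when the separator is missing); they are not part of any library.

```python
def strafter(s: str, sep: str) -> str:
    """Mimic SPARQL STRAFTER: text after the first sep, or "" if sep is absent."""
    if sep == "":
        return s
    _, found, tail = s.partition(sep)
    return tail if found else ""


def strbefore(s: str, sep: str) -> str:
    """Mimic SPARQL STRBEFORE: text before the first sep, or "" if sep is absent."""
    if sep == "":
        return ""
    head, found, _ = s.partition(sep)
    return head if found else ""


PREFIX = "http://www.getty.edu/art/collection/objects/"
url = (
    "http://www.getty.edu/art/collection/objects/103928/"
    "adelaide-labille-guiard-head-of-a-young-woman-french-1779/"
)

# Drop the website prefix first, then everything from the next slash onward.
object_id = strbefore(strafter(url, PREFIX), "/")
print(object_id)  # 103928
```

A URL that does not contain the prefix yields an empty string, which is the behaviour the query relies on.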
I further noticed that:
- some artwork URLs are like http://www.getty.edu/art/gettyguide/artObjectDetails?artobj=310277
- some objects (eg https://www.wikidata.org/wiki/Q18627387 Seascape with Sailing Ship and Tugboat) have been at both the Getty and the Musée d'Orsay, so they have two "described at URL" values (eg the second one is http://www.musee-orsay.fr/en/collections/catalogue-des-oeuvres/notice.html?nnumid=7335)
I'll use the Quick Statements tool by Magnus "the magnificent" Manske. I need to create a simple text file in the format item <tab> prop <tab> value. Since P2582 is a string property but the values are purely numeric, I need to surround them with quotes to indicate a string, eg:
Q20181112 P2582 "103928"
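As a small illustration of that line format, the following Python sketch builds one such row. The function name quickstatements_line is hypothetical, and the sketch assumes the value contains no double quotes or tabs that would need escaping.

```python
def quickstatements_line(item: str, prop: str, value: str) -> str:
    """Join item, property and a double-quoted string value with tabs."""
    return "\t".join([item, prop, f'"{value}"'])


# The quotes mark the numeric-looking ID as a string value.
print(quickstatements_line("Q20181112", "P2582", "103928"))
```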
So I modify the query as follows:
SELECT ?itm ?prop ?idStr {
?item wdt:P195 wd:Q731126; wdt:P973 ?url.
filter not exists {?item wdt:P2582 ?objId}
bind(strafter(str(?item),"http://www.wikidata.org/entity/") as ?itm)
bind("P2582" as ?prop)
bind(strbefore(strafter(str(?url),"http://www.getty.edu/art/collection/objects/"),"/") as ?id1)
bind(strafter(str(?url),"http://www.getty.edu/art/gettyguide/artObjectDetails?artobj=") as ?id2)
bind(if(?id1="",?id2,?id1) as ?id)
filter(?id != "")
bind(concat('"',?id,'"') as ?idStr)
}
- remove the common prefix http://www.wikidata.org/entity/ of ?item
- return the constant string P2582 as property name
- match/remove one form of Getty URLs as ?id1, and the other form as ?id2
- "coalesce" either ?id1 or ?id2 (the first one that is non-empty) into ?id. (There is a SPARQL function coalesce() but it checks whether the var is bound, not if it's empty)
- remove rows where ?id is empty (these rows correspond to URLs that match neither Getty pattern)
- surround the numeric ?id in quotes to indicate it's a string: ?idStr
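The "first non-empty" step is the part most easily gotten wrong, so here is a rough Python sketch of it. The function getty_id and the constant names are my own; the item ID "Q_HYPOTHETICAL" is a made-up placeholder, because I don't list here which item carries the gettyguide URL.

```python
OBJECTS_PREFIX = "http://www.getty.edu/art/collection/objects/"
GUIDE_PREFIX = "http://www.getty.edu/art/gettyguide/artObjectDetails?artobj="


def getty_id(url: str) -> str:
    """Return the Getty object ID from either URL form, or "" if neither matches."""
    # Form 1: .../objects/<id>/<slug>/ -> keep what sits before the next slash.
    _, _, rest = url.partition(OBJECTS_PREFIX)
    id1 = rest.partition("/")[0] if "/" in rest else ""
    # Form 2: ...artObjectDetails?artobj=<id> -> keep everything after the prefix.
    _, _, id2 = url.partition(GUIDE_PREFIX)
    # Same as if(?id1="", ?id2, ?id1): an empty string counts as "missing".
    return id1 or id2


rows = [
    ("Q20181112", OBJECTS_PREFIX + "103928/adelaide-labille-guiard-head-of-a-young-woman-french-1779/"),
    ("Q_HYPOTHETICAL", GUIDE_PREFIX + "310277"),  # placeholder item ID
    ("Q18627387", "http://www.musee-orsay.fr/en/collections/catalogue-des-oeuvres/notice.html?nnumid=7335"),
]

for item, url in rows:
    object_id = getty_id(url)
    if object_id != "":  # skip URLs that match neither Getty pattern
        print("\t".join([item, "P2582", f'"{object_id}"']))
```

Python's `or` treats the empty string as false, which matches the emptiness test in the query rather than the bound/unbound test that coalesce() performs.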
Download the result as Simple TSV (I don't want each data cell surrounded by extra quotes).
All Wikidata updates are recorded with the user who made them, no matter what tool they used. So you need to register with WiDaR if you haven't yet. Finally, I go to Quick Statements, paste all but the header line and press Do It.
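If you'd rather not trim the header by hand, a few lines of Python can do it and sanity-check the rows at the same time. This is only a sketch: the function name quickstatements_input is hypothetical, and the header in the sample text is my assumption about what the downloaded file looks like.

```python
def quickstatements_input(tsv_text: str) -> str:
    """Drop the header line and check that each remaining row has three cells."""
    rows = tsv_text.splitlines()[1:]
    for row in rows:
        if len(row.split("\t")) != 3:
            raise ValueError(f"unexpected row: {row!r}")
    return "\n".join(rows)


# Assumed shape of the download: one header line, then item, property, quoted ID.
sample = 'itm\tprop\tidStr\nQ20178204\tP2582\t"559"\nQ20178199\tP2582\t"535"\n'
print(quickstatements_input(sample))
```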
And voila! I've added 438 "J. Paul Getty Museum object IDs" to Wikidata.
15:10, 21 January 2017 (diff | hist) . . (+344) . . Landscape with Classical Ruins and Figures (Q20178204)
(Created claim: J. Paul Getty Museum object id (P2582): 559, #quickstatements) (current) (Tag: Widar [1.4])
15:10, 21 January 2017 (diff | hist) . . (+345) . . Still Life with Game, Vegetables, Fruit and a Cockatoo (Q20178199)
(Created claim: J. Paul Getty Museum object id (P2582): 535, #quickstatements) (current) (Tag: Widar [1.4])
For a more detailed explanation of a similar technique, see Editing_Data_in_Spreadsheet_Mode