Update: my blog post "The lie of the API" details the issues with current APIs.
Background: I'm a researcher in semantic hypermedia, at the moment comparing different APIs for accessing metadata for human and machine consumption.
Story: I am browsing a cultural website and want to retrieve the metadata of the object I'm looking at in a machine-readable format. The steps below are the actual steps that I've undertaken on different sites.
I'm looking at the object http://collection.cooperhewitt.org/objects/35460799/.
- To retrieve this in JSON, I just copy that URL and do:
$ curl -H "Accept: application/json" http://collection.cooperhewitt.org/objects/35460799/
I'm looking at the person http://dbpedia.org/resource/Arthur_Rimbaud
- To retrieve this in JSON, I just copy that URL and do:
$ curl -L -H "Accept: application/json" http://dbpedia.org/resource/Arthur_Rimbaud
There's even RDF if I need it (same URL):
$ curl -L -H "Accept: text/turtle" http://dbpedia.org/resource/Arthur_Rimbaud
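The same content negotiation can be done from a script. Below is a minimal sketch using Python's standard `urllib`; the helper names `build_request` and `fetch` are made up for illustration, and the only thing that matters is the `Accept` header on an otherwise ordinary GET of the page URL.

```python
import urllib.request


def build_request(url: str, media_type: str) -> urllib.request.Request:
    """Build a GET request that asks for `url` in the given media type."""
    return urllib.request.Request(url, headers={"Accept": media_type})


def fetch(url: str, media_type: str):
    """Send the request and return (Content-Type, body bytes).

    urlopen follows redirects by default, much like `curl -L`.
    """
    with urllib.request.urlopen(build_request(url, media_type)) as response:
        return response.headers.get("Content-Type", ""), response.read()


# Example calls (need network access, so they are left commented out):
# fetch("http://dbpedia.org/resource/Arthur_Rimbaud", "application/json")
# fetch("http://dbpedia.org/resource/Arthur_Rimbaud", "text/turtle")
```

The URL passed in is the one from the browser's address bar; nothing else has to be looked up.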
I'm looking at the object http://www.europeana.eu/portal/record/92037/_http___www_bl_uk_onlinegallery_onlineex_apac_addorimss_s_019addor0000002u00000000_html.html?start=1&query=david+ochterlony+hookah&startPage=1&rows=24
- To retrieve JSON, I try
$ curl -H "Accept: application/json" http://www.europeana.eu/portal/record/92037/_http___www_bl_uk_onlinegallery_onlineex_apac_addorimss_s_019addor0000002u00000000_html.html
- I try to make sense of the following output:
<html><head><title>Apache Tomcat/6.0.24 - Error report</title><style><!--H1 {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;font-size:22px;} H2 {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;font-size:16px;} H3 {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;font-size:14px;} BODY {font-family:Tahoma,Arial,sans-serif;color:black;background-color:white;} B {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;} P {font-family:Tahoma,Arial,sans-serif;background:white;color:black;font-size:12px;}A {color : black;}A.name {color : black;}HR {color : #525D76;}--></style> </head><body><h1>HTTP Status 406 - </h1><HR size="1" noshade="noshade"><p><b>type</b> Status report</p><p><b>message</b> <u></u></p><p><b>description</b> <u>The resource identified by this request is only capable of generating responses with characteristics not acceptable according to the request "accept" headers ().</u></p><HR size="1" noshade="noshade"><h3>Apache Tomcat/6.0.24</h3></body></html>
- I search for the documentation.
- I end up on this page and click "API documentation".
- I end up on the Introduction page, where I see that I have to register.
- On the registration page, I enter my e-mail address.
- I receive an e-mail and click the link.
- I receive my API key.
- I click through to Working with the API and take a mental note about a field named `apikey`.
- I go to Sample code. No, that's not it.
- I go to API methods and see that `record.json` (is it a method or a file?) looks like what I need, so I click it.
- I am informed that I need to use the URL template `http://europeana.eu/api/v2/record/[recordID].json`. This URL template has the parameters `recordID`, `callback`, and `profile`. I only understand the second one without reading, but I don't need it (not using JSON-P).
- Hoping to find the Record ID, I go back to the page I opened in the beginning. I look through the whole page and find nothing called "Record ID", but I find a field "Identifier" with string `019ADDOR0000002U00000000`.
- I now feel ready to make my first API call and try
$ curl http://europeana.eu/api/v2/record/019ADDOR0000002U00000000.json?apikey=xxxxxxxxx
where xxxxxxxxx is my actual API key, using the `apikey` field name I found earlier.
- I try to make sense of the following output:
<html><head><title>Apache Tomcat/6.0.24 - Error report</title><style><!--H1 {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;font-size:22px;} H2 {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;font-size:16px;} H3 {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;font-size:14px;} BODY {font-family:Tahoma,Arial,sans-serif;color:black;background-color:white;} B {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;} P {font-family:Tahoma,Arial,sans-serif;background:white;color:black;font-size:12px;}A {color : black;}A.name {color : black;}HR {color : #525D76;}--></style> </head><body><h1>HTTP Status 404 - /api/v2/record/019ADDOR0000002U00000000.json</h1><HR size="1" noshade="noshade"><p><b>type</b> Status report</p><p><b>message</b> <u>/api/v2/record/019ADDOR0000002U00000000.json</u></p><p><b>description</b> <u>The requested resource (/api/v2/record/019ADDOR0000002U00000000.json) is not available.</u></p><HR size="1" noshade="noshade"><h3>Apache Tomcat/6.0.24</h3></body></html>
- Thinking I might not have used the API key properly, I go back to Working with the API and now see something about a `wskey` parameter. So the field is called `apikey` but the parameter is called `wskey`. I assume this is a URL query string parameter.
- I try the request again:
$ curl http://europeana.eu/api/v2/record/019ADDOR0000002U00000000.json?wskey=xxxxxxxxx
- I visually check whether the error output is the same:
<html><head><title>Apache Tomcat/6.0.24 - Error report</title><style><!--H1 {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;font-size:22px;} H2 {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;font-size:16px;} H3 {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;font-size:14px;} BODY {font-family:Tahoma,Arial,sans-serif;color:black;background-color:white;} B {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;} P {font-family:Tahoma,Arial,sans-serif;background:white;color:black;font-size:12px;}A {color : black;}A.name {color : black;}HR {color : #525D76;}--></style> </head><body><h1>HTTP Status 404 - /api/v2/record/019ADDOR0000002U00000000.json</h1><HR size="1" noshade="noshade"><p><b>type</b> Status report</p><p><b>message</b> <u>/api/v2/record/019ADDOR0000002U00000000.json</u></p><p><b>description</b> <u>The requested resource (/api/v2/record/019ADDOR0000002U00000000.json) is not available.</u></p><HR size="1" noshade="noshade"><h3>Apache Tomcat/6.0.24</h3></body></html>
- I suspect I might have gotten the identifier wrong. I go back to the original page and start looking through the source code to see whether I can find an identifier. I only find `019ADDOR0000002U00000000`, which I have tried already.
- I go back to the Working with the API page and click the link Europeana ID next to the `recordID` field, where I read the following explanation: "Digital records delivered to Europeana are assigned a unique identifier, Europeana ID, that serves to further identify the records when using the API. Usually, this identifier is based on the original metadata that are provided for the record and internal Europeana identifiers of the provider and the dataset containing the record. For example, a Europeana ID of an object can look as follows: /09102/_GNM_1234 where 091 is the identifier of the provider, 02 is the id of the dataset and GNM_1234 is derived from the unique identifier of the record in the context of the provider."
- I inspect the URL to see whether I can find such an identifier: http://www.europeana.eu/portal/record/92037/_http___www_bl_uk_onlinegallery_onlineex_apac_addorimss_s_019addor0000002u00000000_html.html?start=1&query=david+ochterlony+hookah&startPage=1&rows=24. Indeed, there is a part "92037/", but the thing that follows it does not look like the example. I find this strange, but try it anyway:
$ curl http://europeana.eu/api/v2/record/92037/_http___www_bl_uk_onlinegallery_onlineex_apac_addorimss_s_019addor0000002u00000000_html?apikey=xxxxxxxxx
- I get the error message
<html><head><title>Apache Tomcat/6.0.24 - Error report</title><style><!--H1 {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;font-size:22px;} H2 {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;font-size:16px;} H3 {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;font-size:14px;} BODY {font-family:Tahoma,Arial,sans-serif;color:black;background-color:white;} B {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;} P {font-family:Tahoma,Arial,sans-serif;background:white;color:black;font-size:12px;}A {color : black;}A.name {color : black;}HR {color : #525D76;}--></style> </head><body><h1>HTTP Status 404 - /api/v2/record/92037/_http___www_bl_uk_onlinegallery_onlineex_apac_addorimss_s_019addor0000002u00000000_html</h1><HR size="1" noshade="noshade"><p><b>type</b> Status report</p><p><b>message</b> <u>/api/v2/record/92037/_http___www_bl_uk_onlinegallery_onlineex_apac_addorimss_s_019addor0000002u00000000_html</u></p><p><b>description</b> <u>The requested resource (/api/v2/record/92037/_http___www_bl_uk_onlinegallery_onlineex_apac_addorimss_s_019addor0000002u00000000_html) is not available.</u></p><HR size="1" noshade="noshade"><h3>Apache Tomcat/6.0.24</h3></body></html>
- I try to Google for "http://europeana.eu/api/v2/record" to see if anybody else got the API working.
- I arrive at the npm package registry and find a JSON fragment that mentions the link `http://europeana.eu/api/v2/record/08501/03F4577D418DC84979C4E2EE36F99FECED4C7B11.json?wskey=abc123`.
- I add my own API key to test whether I can retrieve this random object:
$ curl http://europeana.eu/api/v2/record/08501/03F4577D418DC84979C4E2EE36F99FECED4C7B11.json?wskey=xxxxxxxxx
- This works, but it's not the object that I wanted. Now let's try replacing the object identifier by `92037/_http___www_bl_uk_onlinegallery_onlineex_apac_addorimss_s_019addor0000002u00000000_html`:
$ curl http://europeana.eu/api/v2/record/92037/_http___www_bl_uk_onlinegallery_onlineex_apac_addorimss_s_019addor0000002u00000000_html.json?wskey=xxxxxxxxx
This works.
- I wonder why it didn't work on my earlier attempt with this identifier (step 21), only to find out that I had not added the extension `.json`. I also wonder if there is any other way of getting the object ID instead of copying it from the URL.
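The copy-and-paste from the last steps can be sketched in code. This is only a guess generalized from the single record above: it assumes the record ID is whatever follows `/portal/record/` in the portal URL, minus the trailing `.html`, and that the key goes in the `wskey` query parameter. The function names are hypothetical and not part of any Europeana client.

```python
from urllib.parse import urlencode, urlsplit

# URL template as quoted from the API documentation in the steps above.
API_TEMPLATE = "http://europeana.eu/api/v2/record/{record_id}.json"
# Assumption: portal pages for records always live under this path prefix.
PORTAL_PREFIX = "/portal/record/"


def record_id_from_portal_url(portal_url: str) -> str:
    """Extract the record ID from a portal page URL, ignoring the query string."""
    path = urlsplit(portal_url).path
    if not path.startswith(PORTAL_PREFIX):
        raise ValueError("not a portal record URL: " + portal_url)
    record_id = path[len(PORTAL_PREFIX):]
    # The portal URL ends in ".html"; the API wants ".json" in its place.
    if record_id.endswith(".html"):
        record_id = record_id[: -len(".html")]
    return record_id


def api_url(portal_url: str, wskey: str) -> str:
    """Build the record.json URL for the object shown at `portal_url`."""
    record_id = record_id_from_portal_url(portal_url)
    return API_TEMPLATE.format(record_id=record_id) + "?" + urlencode({"wskey": wskey})
```

For the portal URL above and the key `xxxxxxxxx`, `api_url` produces the same URL as the final, working curl call.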
Hi David,
Thanks for getting back on this, I appreciate it and would be glad to check out the improved version.
Documentation is indeed one part (and often neglected, but you obviously invested a lot in it). The assumption you mention is an important one. It indicates that people had an RPC-style scenario in mind: first call this, then call that.
For me, a huge collection is all about the resources and how they interlink, and not so much about a sequence of operations performed on those resources. And this, unfortunately, is about API design as well and I know that is much more difficult to change than documentation. I'm afraid that the major issue here is that users have to read the documentation before they can get started. With the Cooper-Hewitt and DBpedia APIs above, I didn't have to read documentation: the identifier of each object is the URL, and this URL allows me to retrieve the object both manually and programmatically. I have a hard time understanding the technical necessity to make APIs more complex than that. (Of course, there are other necessities.)
But even if there are non-technical reasons to have a separate API, the website and the API should correspond to each other. The Object ID problem illustrates this: I had to manually find a part of the path in my URL and then paste this into another URL. It would be a good idea for the HTML version to list this ID, and a good idea for the JSON version to link to the HTML version by its URL. Another problem is that I cannot share the JSON version: it is impossible for me to link to it, as the URL includes my private key. Even worse, I cannot share the JSON body as-is, because it also contains my API key.
Furthermore, I'm also afraid that this API key makes it impossible to develop AJAX applications that use the Europeana API. Suppose I have a museum website: how can my pages retrieve objects from Europeana in a dynamic way? It's impossible to add this to the client-side code, because this would mean disclosing my API key. This means it has to happen on the server side, but that was possible before anyway. I'm not saying I assume every API must be open; it's just that having one representation open (HTML) and another closed (JSON) is strange. The information is not shielded off; the representation is: we could just as well scrape the HTML pages and extract the JSON information out of them, as they all follow a structured template. In that regard, it doesn't make sense to provide different affordances to human and machine clients. Limiting access per IP address is far more effective to combat misuse, given that API access is free. Plus, a keyless API would allow use in Web applications (where IP addresses are distributed across clients, so blocking is not an issue).
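To make the per-IP idea concrete, here is a minimal sketch of a sliding-window limiter. It says nothing about how Europeana works internally; the class name, the limit and the window length are all made up for illustration, and a real deployment would also have to decide how to treat proxies and shared addresses.

```python
import time
from collections import defaultdict, deque
from typing import Optional


class PerIpRateLimiter:
    """Allow at most `limit` requests per `window` seconds for each client IP."""

    def __init__(self, limit: int, window: float) -> None:
        self.limit = limit
        self.window = window
        # Maps an IP address to the timestamps of its recent accepted requests.
        self._hits = defaultdict(deque)

    def allow(self, ip: str, now: Optional[float] = None) -> bool:
        """Return True if a request from `ip` may proceed at time `now`."""
        now = time.monotonic() if now is None else now
        hits = self._hits[ip]
        # Forget requests that have fallen out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True
```

No key is involved: the same check applies whether the client asks for HTML or JSON.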
Finally, perhaps even more important than the documentation is improving the error messages. They are not helpful at all, and they are sometimes HTML and sometimes JSON. Couldn't they just include links to example calls I can make with my API key? Or if I got the key wrong, indicate how I need to add it? Self-descriptiveness is important here.
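As an illustration of what I mean by self-descriptive, here is a sketch of an error body for a request that lacks a key. Everything in it is hypothetical: the function name, the field names and the 401 status are my own choices, not what the Europeana API returns.

```python
import json


def missing_key_error(requested_url: str) -> str:
    """Return a JSON error body that tells the client how to repair the request."""
    separator = "&" if "?" in requested_url else "?"
    body = {
        "status": 401,  # assumed status code for a missing key
        "error": "No API key was supplied.",
        "hint": "Add your key as the 'wskey' query parameter.",
        # A corrected version of the very URL the client just tried.
        "example": requested_url + separator + "wskey=YOUR_API_KEY",
    }
    return json.dumps(body, indent=2)
```

A client that receives this can fix its request without opening the documentation.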
Fortunately, APIs are living things, and I'm sure that your improvement work will change Europeana for the better!
Cheers,
Ruben