Making the W3CLove API RESTful
W3CLove is a project started by a student at Mendicant University. They asked for some feedback about the API, and while it’s perfectly fine, it’s not exactly RESTful. So in this post, I’m going to cover the process of transforming it from its current RPC style into an actual RESTful one.
Step 1: Evaluate processes
Let’s look at what the API actually does. What workflows do we need to support with this API?
Let’s check out the W3CLove API page. There are two basic functions: Evaluate an entire site, or evaluate a page.
With an API this simple, you might wonder: how can it possibly improve? We have two API calls, each accepting one parameter. What’s the matter with this design?
The problem, as with most software, is hidden coupling.
First, we have coupling of our URLs to our implementation: if we change our URLs, the API breaks. That sucks. We’ve tied clients quite tightly to our implementation.
Secondly, we use out-of-band information when returning our results. This couples our documentation of the responses to our implementation code as well. If we change some fields in our responses, the URL doesn’t change, and the media type doesn’t change, so clients have no idea anything is wrong. Then stuff breaks. Sad. We need to include information about how our responses are formed in the response itself.
We’ll deal with both of these kinds of coupling in turn.
Step 2: Create state machine
We need to create a state machine for our two processes. Let’s talk about decoupling these URLs. How do we get to our two processes without knowing what the URLs are? By linking, of course!
The basic idea is this: we only want our entry point URL to be published. We’ll make sure it’s always available, but after that, clients discover the URLs they need to do their processing. Let’s ignore how they decide which URL is which, and focus on what URLs we need.
Well, we’ll need two resources, one for each kind of computation: our sitemap API and our webpage API. In order to process the computation, though, we need to pass in a parameter. Forms are a method we can use to parameterize GET URLs, so we’ll need two things from our API: first, we request a form that tells us how to set up our computation; then we process that form and get some sort of useful information back. A workflow version of this might look like this:
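Sketched out as a text diagram, with arrows standing for links followed or forms submitted:

```
root --(sitemap-form link)--> sitemap form --(submit)--> response
root --(website-form link)--> website form --(submit)--> response
```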
But really, we’re forgetting a resource here. Displaying a sitemap response is different from viewing a webpage response, so we really need something like this:
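A distinct response resource per process, again sketched as a text diagram:

```
root --(sitemap-form link)--> sitemap form --(submit)--> sitemap response --(root link)--> root
root --(website-form link)--> website form --(submit)--> website response --(root link)--> root
```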
You can see that we’re doing nearly the same thing in each step, which makes our design even easier!
You’ll also notice that I have a link back to the root in each of these displays. After performing one computation, you’ll probably want to do another, so I’ve added a link back for convenience. It’s not strictly necessary, but I prefer to wrap back around to the start. The flow is nicer; you can see what the users of your API will be doing.
Now that our workflows are settled, let’s evaluate what media types we need.
Step 3: Evaluate Media Types
Media type design is almost as much art as it is science, and so I’m going to be a little bit terse here. It’s really important to get media types right, as once they’re out there, they need to be forward-compatible. So think about it hard!
Since JSON is all the rage with the kids these days, let’s create a media type based on JSON. Why can’t we use stock JSON? Well, we’ve already established that we need a form to template our URL, and we need to have a link back to our origin. JSON does not include semantics for links and forms! That doesn’t mean the structure of JSON is bad, but we need to add those semantics on top of it, and that means minting a new type.
Making a new media type has a few different steps, but it’s kinda outside of the scope of this post. Normally, we’d put up some documentation at a stable URL, but let’s do this quick and dirty for now. We just want to talk at a high level here.
Step 4: Create Media Types
These media types are going to be 99% compatible with the current types that are being returned, but with a few changes. I’ll talk about the changes afterwards. Let’s discuss this new type:
The site validation+json media type
We’re going to call this type ‘validation+json.’ For now, since we haven’t registered it with anyone, we should give it the name vnd.w3clove.validation+json. This is a vendor-specific media type, since we’re the only ones who use it.
The vnd.w3clove.validation+json type will conform to JSON structurally, but have these additional semantics:
Elements
A response MAY contain created_at, scraped_at, scraping_success, updated_at, url, web_pages_count, validation_errors_count, validation_warnings_count, and pending_count elements. All of these elements contain exactly what you’d think. In a real type declaration, I’d explain them further and individually.
It MAY contain a web_pages key, which holds an array of objects. These objects will have these keys: created_at, updated_at, url, validated_at, validation_errors_count, validation_warnings_count, and w3c_validation_success. Same deal here: in real documentation, I’d explain these fully.
A response MAY include a links element, which MUST be an array of objects. Each of these MUST have two elements: href and rel.
A response MAY include a forms element, which is an array of objects. Each MUST have three elements: href, rel, and data. data will be an array of objects that MUST have two keys: name and value.
Rels
Rels are included both in our form elements as well as in our links element. These names provide semantic meaning in any of these places. Here are the ones for validation+json:
- sitemap-form: Following a link with this rel will lead you to a resource with a form for generating a Sitemap API request.
- sitemap: Processing a form with this rel will lead you to a resource that gives you validation information about a sitemap.
- website-form: Following a link with this rel will lead you to a resource with a form for generating a Website API request.
- website: Processing a form with this rel will lead you to a resource that gives you validation information about a website.
- root: A link with this rel always leads back to the site root.

And with that, we’re done with the media type! The big changes from the existing type are:
- Adding forms and links portions to the responses. This is the hypermedia we were missing!
- Adding link relations. We need these to know which links to follow.
- Even single-site responses are returned in a web_pages array of one element. This simplifies our need from two different responses to one. Why define two types when you can make it all work in one type?

Step 5: Implementation!
Let’s pretend the site exposes this API at http://w3clove.com/api/. Here’s a sample curl session:
```
$ curl -H "Accept: application/vnd.w3clove.validation+json" http://w3clove.com/api/

{ "links": [
    {"rel":"website-form", "href":"http://w3clove.com/api/..."},
    {"rel":"sitemap-form", "href":"http://w3clove.com/api/..."}
] }
```
I haven’t even filled in the URLs. They shouldn’t matter. So I’m not gonna tell you what they are. :p
We parse this as JSON, and follow the (in Ruby notation) link:
```
$ curl -H "Accept: application/vnd.w3clove.validation+json" response["links"].find{|l| l["rel"] == "sitemap-form"}["href"]

{ "forms": [
    {"href": "http://w3clove.com/api/...",
     "rel": "sitemap",
     "data": [
       {"name": "check",
        "value": ""}]}
] }
```
That’s not valid shell, of course; the point is that I’m using the Ruby notation to emphasize that we follow the link, not calculate the URL.
Anyway, now we want to make this request:

```
$ curl -H "Accept: application/vnd.w3clove.validation+json" response["forms"].find{|f| f["rel"] == "sitemap"}["href"] + "?" + response["forms"].find{|f| f["rel"] == "sitemap"}["data"][0]["name"] + "=" + "http://www.zeldman.com"
```
Okay, so that calculation was awkward. You’d do it in code. Anyway, we get a response back:
```
{
"created_at": "2012-01-30T01:17:04Z",
"scraped_at": "2012-01-30T01:17:10Z",
"scraping_success": true,
"updated_at": "2012-01-30T01:17:10Z",
"url": "http://www.zeldman.com",
"web_pages_count": 57,
"validation_errors_count": 2951,
"validation_warnings_count": 8,
"pending_count": 0,
"web_pages": [{
"created_at": "2012-01-30T01:17:09Z",
"updated_at": "2012-01-30T01:17:23Z",
"url": "http://www.zeldman.com/",
"validated_at": "2012-01-30T01:17:23Z",
"validation_errors_count": 0,
"validation_warnings_count": 0,
"w3c_validation_success": false
}, {
"created_at": "2012-01-30T01:17:10Z",
"updated_at": "2012-01-30T01:21:14Z",
"url": "http://www.zeldman.com/2011/12/21/the-big-web-show-no-61-khoi-vinh-of-mixel-and-nytimes-com/",
"validated_at": "2012-01-30T01:21:14Z",
"validation_errors_count": 7,
"validation_warnings_count": 0,
"w3c_validation_success": true
}, {
"created_at": "2012-01-30T01:17:10Z",
"updated_at": "2012-01-30T01:21:09Z",
"url": "http://www.zeldman.com/2011/12/22/migrate-if-you-like-but-touristeye-is-not-a-gowalla-partner/",
"validated_at": "2012-01-30T01:21:09Z",
"validation_errors_count": 8,
"validation_warnings_count": 1,
"w3c_validation_success": true
}, {
"created_at": "2012-01-30T01:17:10Z",
"updated_at": "2012-01-30T01:21:08Z",
"url": "http://www.zeldman.com/2011/12/23/hitler-reacts-to-sopa/",
"validated_at": "2012-01-30T01:21:08Z",
"validation_errors_count": 6,
"validation_warnings_count": 0,
"w3c_validation_success": true
}],
"links":[
{"rel":"root", "href":"http://w3clove.com/api"}
]
}
```
Bam! We’ve got all of our data. You can imagine how this would work for the other process, too.
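That awkward string concatenation from the curl session is trivial in actual code. Here’s a sketch in Ruby, using the form response from earlier (the href is still elided, as before):

```ruby
require "json"
require "uri"

# The form response we got back from the sitemap-form resource.
response = JSON.parse(<<~JSON)
  { "forms": [
      { "href": "http://w3clove.com/api/...",
        "rel": "sitemap",
        "data": [ { "name": "check", "value": "" } ] }
  ] }
JSON

# Find the form by its rel, then build the parameterized GET URL.
form  = response["forms"].find { |f| f["rel"] == "sitemap" }
field = form["data"].first
query = URI.encode_www_form([[field["name"], "http://www.zeldman.com"]])
url   = "#{form["href"]}?#{query}"
# url => "http://w3clove.com/api/...?check=http%3A%2F%2Fwww.zeldman.com"
```

Note that the client never constructs the path itself; it only fills in the form it was handed.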
Improvements
We can do a few things that might help performance. First, some client-side caching would help a lot, especially on our root page: it probably doesn’t change very often.
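A minimal sketch of what that client-side caching could look like, using ETag revalidation. The fetch lambda here is a stub standing in for a real HTTP GET; a real client would send an If-None-Match header and honor a 304 Not Modified response:

```ruby
# A tiny client-side cache keyed by URL, revalidated with ETags.
class CachingClient
  def initialize(&fetch)
    @fetch = fetch   # called as fetch.(url, cached_etag) -> [status, etag, body]
    @cache = {}      # url => [etag, body]
  end

  def get(url)
    etag, body = @cache[url]
    status, new_etag, new_body = @fetch.call(url, etag)
    return body if status == 304            # cached copy is still valid
    @cache[url] = [new_etag, new_body]
    new_body
  end
end

# Fake server: the root representation never changes, so revalidation
# always answers 304 once the client holds the current ETag.
requests = 0
client = CachingClient.new do |url, etag|
  requests += 1
  etag == "v1" ? [304, "v1", nil] : [200, "v1", "{\"links\": []}"]
end

client.get("http://w3clove.com/api/")          # full fetch
body = client.get("http://w3clove.com/api/")   # served from cache after a 304
```

Since our root page rarely changes, most of those revalidations would be cheap 304s rather than full responses.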
Secondly, we can just embed the forms into our root as well, if we’d like: since they probably won’t change often either, that might make sense, and then we wouldn’t need to make as many requests. It all depends!
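For instance, the entry point might carry the forms right alongside its links, something like this sketch (hrefs elided as before):

```
{
  "links": [
    {"rel": "root", "href": "http://w3clove.com/api"}
  ],
  "forms": [
    {"rel": "sitemap", "href": "http://w3clove.com/api/...",
     "data": [{"name": "check", "value": ""}]},
    {"rel": "website", "href": "http://w3clove.com/api/...",
     "data": [{"name": "check", "value": ""}]}
  ]
}
```

A client could then fill in and process a form straight from the root, skipping the intermediate form requests entirely.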