@mstade
Last active December 30, 2015 17:59
Programmable Hypermedia Client Blues

Programmable Hypermedia Client Blues (Part I)

In RESTful Web APIs it is argued that one of the major problems with APIs (arguably regardless of whether they are RESTful) is that they don't convey enough semantics for a client unfamiliar with implementation details (media type, URL structures, HTTP methods etc.) to still make use of them. On the "human web" this isn't really a problem, as the book rightly points out, because humans are much better at making decisions despite considerable semantic gaps. For example, if a site contains a link with the text:

Click here to buy our most popular product!

It's easy for us to understand that we can click on it to purchase an item; a computer however would just see some additional markup and realize it's a link, but not where it points to, why or what it is:

<a href='/buy/12345'>Click here to buy our most popular product!</a>

Maybe the client can be coded to understand this particular language, but unless it had fairly sophisticated language recognition it'd break if the language changes.

The solution, the book says, is that you could change the markup to include relevant semantics that would give enough hints to the client so it can make informed decisions. For instance, adding the IANA link relation payment might be enough to guide the client to a resource that describes how to purchase the relevant item. Here's how that might look:

<a href='/buy/most-popular' rel='payment'>Click here to buy our most popular product!</a>

Now, if the client is programmed to understand the payment link relation (as well as the <a> tag, of course), it would understand what to do regardless of changes to the text or even the URL. Of course, this also assumes that this is the only payment link on the site, which may or may not be true. Let's say for the sake of argument that right now, that's how it is.

This is all well and good, and makes sense from a theoretical point of view. However once you try to implement this in an actual client, it fairly quickly becomes a nuisance. Here's how you might do it using jQuery to interact with the markup:

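$.get('http://www.example.com/', function(html) {
  // Parse the response into a document so the selector can search the whole page
  var API = new DOMParser().parseFromString(html, 'text/html')
  var paymentURL = $('a[rel=payment]', API).attr('href')

  $.get(paymentURL, function(paymentInstructions) {
    var page = new DOMParser().parseFromString(paymentInstructions, 'text/html')
    // replaceWith takes the new content as its only argument
    $('body').replaceWith($('body', page))
  })
})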

Not too bad, but what does the paymentInstructions page look like? How might the client interact with it? Let's have the client specify not just the media type it wants, but also a profile that explains the process of paying for an item. The client will make sure to tell the server about its preferences. Let's also say that unlike before, there are more payment links on the site, so the client can't assume that any old link will be the correct one. So the client will make sure it also looks for a more specific link, the semantics of which are described in yet another profile. Finally, let's say these profiles are kind of an industry standard, so they're not specific to our domain. To follow along with the code, here are the representations served for the different URLs:

  • http://www.example.com/
<!DOCTYPE html>
<html>
  <head>
    <meta charset="utf-8">
    <link rel="profile" href="http://www.fake-standards.org/shop"/>
  </head>
  <body>
    <p>We sell the best Danish milk in the world! It's amazeballs!</p>

    <a id="most-popular" href='/buy/most-popular' rel='payment'>Click here to buy our most popular product!</a>
    <a id="least-popular" href='/buy/least-popular' rel='payment'>Click here to buy our least popular product!</a>
  </body>
</html>
  • http://www.example.com/buy/most-popular
<!DOCTYPE html>
<html>
  <head>
    <meta charset="utf-8">
    <link rel="profile" href="http://www.fake-standards.org/shop/payment-instructions"/>
    <style>
      label { display: block; }
    </style>
  </head>
  <body>
    <p>To purchase <a href="http://www.youtube.com/watch?v=s-mOy8VUEBk">1000 liters of milk</a>, please submit the following form.</p>

    <form name="checkout" action="/buy/most-popular" method="post">
      <label>Credit card number:<input name="credit-card-number" type="text"></label>
      <label>Expiration date:<input name="credit-card-expiration" type="text"></label>
      <label>Security code:<input name="credit-card-security-code" type="text"></label>
      <button type="submit">Go go go!</button>
    </form>
  </body>
</html>

The client code might look something like this:

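$.ajax('http://www.example.com/', { headers: { Accept: 'text/html;profile="http://www.fake-standards.org/shop"' } })
  .done(function(html) {
    // Parse each response into a document so selectors can search the whole page
    var API = new DOMParser().parseFromString(html, 'text/html')
    var paymentURL = $('a#most-popular[rel=payment]', API).attr('href')

    // Fetching the payment instructions is a plain GET; only the form submission below is a POST
    $.ajax(paymentURL, { headers: { Accept: 'text/html;profile="http://www.fake-standards.org/shop/payment-instructions"' } })
      .done(function(html) {
        var paymentInstructions = new DOMParser().parseFromString(html, 'text/html')
        var checkoutForm = $('form[name=checkout]', paymentInstructions)

        $('[name=credit-card-number]', checkoutForm)
          .val('1234-5678-9012-3456')
        $('[name=credit-card-expiration]', checkoutForm)
          .val('01/16')
        $('[name=credit-card-security-code]', checkoutForm)
          .val('987')

        $.post(checkoutForm.attr('action'), checkoutForm.serializeArray(), function(done) {
          // replaceWith takes the new content as its only argument
          $('body').replaceWith($('body', new DOMParser().parseFromString(done, 'text/html')))
        })
      })
  })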

There may well be better ways of doing this with jQuery – surely there are sophisticated plugins if nothing else – but it's probably not a far cry from what an average developer like myself might come up with. And while we're all nice and hypermedia driven, there are a few problems I see with this approach:

  • It's too tightly coupled. While we may not be directly concerned with URLs, we're assuming that we'll be working with anchor and form tags. We did specifically ask for HTML, but what if the profile doesn't dictate the tags to use for the markup, and the server decides to change the anchor tag to a form? Or change the form's method from the assumed POST to PUT? (I know PUT isn't valid HTML, but for the sake of argument let's ignore this.) Sure, the profile could be more stringent, but should it really have to? What if it's a profile that's designed not for HTML but just generically defines relations and semantic descriptors and states that the mechanics should be dictated by the media type? For example, it might say that checkout describes an unsafe transition, but doesn't dictate whether this means POST, PUT or perhaps not even an HTTP method at all but some other protocol.

  • It's verbose. While it may be useful to distinguish between a semantic descriptor such as .credit-card-number and a link relation, it doesn't have to get as verbose as [rel=payment]. Arguably, the application semantics get muddied with all sorts of uninteresting implementation details.

  • It's asynchronous. This code is fairly dumbed down so it can more easily be grokked, but that's also part of the point. Asynchronous programming is hard, particularly when the language doesn't provide convenient syntax for pausing execution until an asynchronous call yields, but rather just keeps on keeping on. Callback hell is a real thing, and people come up with all sorts of workarounds to the problem.

  • It assumes that resources aren't embedded. Performance-wise this is a very important point. The client assumes that additional requests are needed, which means that even if the server embeds the checkout form in the first representation, the client will still make the request to get that resource, which is unnecessary work for everyone involved. Jon Moore, in his most excellent presentation on Hypermedia APIs, demonstrated a client smart enough to understand whether the server already included the necessary resources in its responses, or whether additional requests are necessary. He then recently followed this up with an equally epic presentation providing some more detailed insight on HTML5 microdata.
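To make the last point concrete, here is a minimal sketch of a client that only makes a request when it has to. It is not taken from the presentations mentioned above; the function findCheckoutForm, the plain-object representations and the stubFetch transport are all hypothetical stand-ins for real parsing and networking.

```javascript
// Hypothetical sketch: representations are plain objects, not parsed HTML.
// `forms` lists the forms embedded in a representation, `links` maps rel -> URL.
function findCheckoutForm(representation, fetchRepresentation) {
  var embedded = representation.forms.filter(function(form) {
    return form.name === 'checkout'
  })[0]

  // The server already embedded the form, so no extra request is needed
  if (embedded) return embedded

  // Otherwise follow the payment link and look in the new representation
  var next = fetchRepresentation(representation.links.payment)
  return next.forms.filter(function(form) {
    return form.name === 'checkout'
  })[0]
}

// Usage with a stubbed transport that counts requests
var requests = 0
function stubFetch(url) {
  requests += 1
  return { forms: [{ name: 'checkout', action: url }], links: {} }
}

var linkedPage   = { forms: [], links: { payment: '/buy/most-popular' } }
var embeddedPage = { forms: [{ name: 'checkout', action: '/buy/most-popular' }], links: {} }

var viaLink = findCheckoutForm(linkedPage, stubFetch)    // makes one request
var inPlace = findCheckoutForm(embeddedPage, stubFetch)  // makes no request
```

The calling code is identical in both cases; only the number of requests differs.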

Finding a middle ground

I chose to base this article on a jQuery sample for a reason: it revolutionized how we work with the HTML DOM. Not just in a browser client context, but in a server context as well with the advent of JavaScript-on-the-server and portable libraries. Perhaps we can do the same with programmable clients that make use of (more or less RESTful) APIs? Let's set up some requirements on our client, to fix some of the problems previously mentioned.

The client should:

  • Be hypermedia driven
  • Handle embedded resources
  • Be as synchronous as possible
  • Be concise
  • Be somewhat configurable

The client should not:

  • Be protocol specific
  • Be media type specific
  • Be profile specific
  • Be domain specific
  • Render the response body

Like a cheesy cooking show, I've mocked up some syntax to show how such a client might be used, rewriting the example above. The language is JavaScript, but the API is completely fake and an implementation doesn't exist. The point here is to show what the syntax of such a client might look like. So without further ado:

api('http://www.example.com/')
  .profiles(
    'http://www.fake-standards.org/shop'
  , 'http://www.fake-standards.org/shop/payment-instructions'
  )
  .to('most-popular')
  .to('payment')
  .set(
    { 'credit-card-number'        : '1234-5678-9012-3456'
    , 'credit-card-expiration'    : '01/16'
    , 'credit-card-security-code' : '987'
    }
  )
  .to('checkout')
  .on('200', function(response) {
    $('body').replaceWith($('body', response.body))
  })
  .go()

Before we explore how this adds up to our feature list, let's go through the example and see what it's actually doing. Let's start with the very first line:

api('http://www.example.com/')

The idea here is that we always start with an entry point, which is a URL. It's hardcoded in this mockup, but let's ignore that. Let's also ignore where we got it from in the first place and just assume that it's a nice, stable URL where one can go to find more information about an API.

The entry point URL is fed to the api function. Because this is just a mockup I didn't want to come up with a name that sounds like a product, since there's no implementation available (yet). This function returns a request object, a monad which will allow you to describe what you want to do. Every method of this object will return such an object, a clear nod to the fluent style of jQuery.

Moving on, we see the second line defining a profile:

  .profiles
    ( 'http://www.fake-standards.org/shop'
    , 'http://www.fake-standards.org/shop/payment-instructions'
    )

My thinking here is that the profile serves two purposes. First, it should inform servers of what profile we'd like, by sending an Accept header that includes the profile parameter for any media type that supports it. Second, if responses don't include hypermedia affordances or profiling information, this property should inform our client how to deal with those anyway. If we didn't do this, our programmable hypermedia client wouldn't be able to handle representations that just describe themselves as application/json.

This function supports multiple profiles, so they can be combined and thus make the client smarter. However, it should be noted that I think this should be a guide to the client in case the server neglects to include this information. If the responses already include profiles, that information should probably take precedence over these values. I'll expand more on this later.
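Both purposes can be sketched with two small helpers. The names acceptHeader and effectiveProfiles are hypothetical, and joining several profile URLs with spaces inside one quoted parameter is an assumption made for illustration, not something the mockup specifies.

```javascript
// Hypothetical helpers; neither is part of any real library.

// Build an Accept header value carrying the profile parameter.
// Assumption: multiple profiles are space-separated inside one quoted string.
function acceptHeader(mediaType, profiles) {
  if (profiles.length === 0) return mediaType
  return mediaType + ';profile="' + profiles.join(' ') + '"'
}

// Profiles named by the response win; the declared ones are only a fallback
function effectiveProfiles(responseProfiles, declaredProfiles) {
  return responseProfiles.length > 0 ? responseProfiles : declaredProfiles
}

var declared = ['http://www.fake-standards.org/shop']
var header   = acceptHeader('text/html', declared)
var fallback = effectiveProfiles([], declared)
var fromResponse = effectiveProfiles(['http://www.fake-standards.org/shop/payment-instructions'], declared)
```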

At this point, we have an API object that knows where to go to find a representation of an API, and how to profile it. However, we've not actually done anything yet. On the next few lines, the meat of our process is defined:

  .to('most-popular')
  .to('payment')
  .set(
    { 'credit-card-number'        : '1234-5678-9012-3456'
    , 'credit-card-expiration'    : '01/16'
    , 'credit-card-security-code' : '987'
    }
  )
  .to('checkout')

A few things are going on here. First, we find the most popular product, as defined by our profile:

  .to('most-popular')

Then, we transition to the payment state:

  .to('payment')

How we transition to this state – whether it's through activating a link control or something else – is defined by the media type and/or profiles. It could be that we're already there! As far as the client's concerned that's a low level concern. As consumers of the API it doesn't really matter, all we care about is to transition to that state. To be a bit more concrete, let's look at our HTML again. In the above example, it looked something like this:

<a id="most-popular" href='/buy/most-popular' rel='payment'>Click here to buy our most popular product!</a>

This means that in order to transition to the payment state, we have to GET the representation at the other end of that URL. However, what if the original representation didn't actually link to a different state, but rather a different part of our current state? It could look something like this:

<a id="most-popular" href='#checkout' rel='payment'>Click here to buy our most popular product!</a>

<!-- Let's say there's a bunch of stuff in between here. -->

<form name="checkout" action="/buy/most-popular" method="post">
  <label>Credit card number:<input name="credit-card-number" type="text"></label>
  <label>Expiration date:<input name="credit-card-expiration" type="text"></label>
  <label>Security code:<input name="credit-card-security-code" type="text"></label>
  <button type="submit">Go go go!</button>
</form>

In this case, the payment transition actually just means "look over there in another part of this representation." Let's do another example, where we're not using an anchor tag at all:

<div id="most-popular">
  <p>Our most popular product is 1000 liters of milk. To purchase, please submit this form.</p>

  <form name="checkout" class="payment" action="/buy/most-popular" method="post">
    <label>Credit card number:<input name="credit-card-number" type="text"></label>
    <label>Expiration date:<input name="credit-card-expiration" type="text"></label>
    <label>Security code:<input name="credit-card-security-code" type="text"></label>
    <button type="submit">Go go go!</button>
  </form>
</div>

In this case, there's no link at all! Instead, the form has a class attribute to dictate that this is how you pay. To cover all these cases, the profile would have to clearly state that payment may refer to a link relation or a part of the representation itself. If this is described, our client should be able to understand what to do.

Next, we parameterize the request, by setting values for the properties our media type or profile mentions:

  .set(
    { 'credit-card-number'        : '1234-5678-9012-3456'
    , 'credit-card-expiration'    : '01/16'
    , 'credit-card-security-code' : '987'
    }
  )

And finally, we tell the client to transition to the checkout state; in this case, it's the name of the form letting the client know that it should go ahead and submit it according to the rules specified in the media type and/or the profile.

So far, we've defined what the client wants to do and how it intends to go about it. Let's now explore what should happen once it's done:

  .on('200', function(response) {
    $('body').replaceWith($('body', response.body))
  })

This is fairly straightforward. The on method lets you define what response you're interested in, and when such a response happens the provided function should be called. The body of the callback is not particularly important I think, so I'll leave that be.

While on in this case refers to the 200 OK HTTP response code, I think the client should allow this string to be fairly freeform, and depend on whatever the underlying protocol used supports. It could in fact even be a query; for instance, let's say we want to do the same thing for any response in the 2xx range, it could look something like this:

  .on('2xx', function(response) {
    $('body').replaceWith($('body', response.body))
  })

This way, the client would do the same thing regardless of whether the response is a 200, 201 or other code in that bracket. Likewise, maybe it'd like to handle errors the exact same way as successful requests; it could look like this:

  .on('2xx, 4xx-5xx', function(response) {
    $('body').replaceWith($('body', response.body))
  })

Obviously syntax is always up for debate, but I think the mockups here show adequately that there's ample room for flexibility. As well, these are HTTP status codes, but the events could just as well be dictated by another protocol. For instance, for CoAP it might look something like this:

  .on('2.01', function(response) {
    // Handle 'created' response
  })
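The event queries shown so far could be matched with something like the sketch below. The function matchesEvent and its query grammar (comma-separated terms, x as a digit wildcard, a dash for ranges, * for anything) are assumptions drawn from the examples above, not a defined format.

```javascript
// Hypothetical matcher for event queries such as '2xx', '2xx, 4xx-5xx' or '*'.
// Single terms are compared as strings, so codes like CoAP's '2.01' also work.
function matchesEvent(query, code) {
  return query.split(',').some(function(term) {
    term = term.trim()
    if (term === '*') return true

    var bounds = term.split('-')
    if (bounds.length === 2) {
      // A range: 'x' means the lowest digit on the left and the highest on the right
      var low  = Number(bounds[0].replace(/x/g, '0'))
      var high = Number(bounds[1].replace(/x/g, '9'))
      return Number(code) >= low && Number(code) <= high
    }

    // A single term: each 'x' stands for any one digit
    var escaped = term.replace(/[.]/g, '\\.').replace(/x/g, '\\d')
    return new RegExp('^' + escaped + '$').test(code)
  })
}

var ok       = matchesEvent('2xx', '200')
var notFound = matchesEvent('2xx, 4xx-5xx', '404')
var redirect = matchesEvent('2xx, 4xx-5xx', '302')
var created  = matchesEvent('2.01', '2.01')
```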

Finally the last line is effectively a terminator, a way to tell our library we're ready to go ahead and actually do some work now:

  .go()

When called, this method kicks off all processes necessary to reach the end result which is to eventually call the event handlers; if no handlers are defined it just goes ahead without having anything to call when done.

API Reference

This is a mockup, but fairly concrete in both scope and utility. Let's recap the functions discussed, and define their syntax and function more formally.

api (function; safe, idempotent)

First off, of course, is the library's entry point. It only takes one argument, which is the entry point of an API. This entry point can be any URL; it doesn't have to be the root of a domain or anything predefined. In fact, the entry point may actually be a URL that was retrieved in the middle of some other flow. Point is, it doesn't really matter.

This function should be safe, because it should never do anything beyond getting the API representation at the other end of the URL.

Interface
function api(URL) { return request }

The URL should be absolute, and refer to a resource that can be represented in some way.

Request (type)

The request object is a monad that describes our process. As such, it's actually not necessarily just one request, but depending on the responses from the server may be multiple. Thus, 'request' might not actually be an appropriate term to use. Still, that's what I'll call it for now.

Interface
Request.profiles(url, ...urls) { return request }
Request.to(name) { return request }
Request.set(object) { return request }
Request.set(key, value) { return request }
Request.on(event, callback) { return request }
Request.go() { return request }

All functions belonging to the request object should return a request object; whether it's the same instance or a new one I don't know, but an argument for creating a new one would be that it's easier to ensure state isn't altered elsewhere in the program. Regardless, I consider that an implementation detail rather than a specification detail.
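As an illustration of the "new instance" option, the sketch below records each call as a step and returns a fresh, frozen object every time. Nothing here performs requests, and createRequest, METHODS and the shape of the recorded steps are hypothetical choices made for this example only.

```javascript
// Hypothetical sketch of the fluent interface: every method only records a
// step and returns a new frozen request, leaving the original untouched.
var METHODS = ['profiles', 'to', 'set', 'on', 'go']

function createRequest(entryPoint, steps) {
  var request = { entryPoint: entryPoint, steps: steps }

  METHODS.forEach(function(method) {
    request[method] = function() {
      var step = { method: method, args: Array.prototype.slice.call(arguments) }
      return createRequest(entryPoint, steps.concat([step]))
    }
  })

  return Object.freeze(request)
}

function api(url) {
  return createRequest(url, [])
}

var base     = api('http://www.example.com/')
var checkout = base.to('most-popular').to('payment')
```

Because base is never modified, it can be reused as the starting point for several different flows.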

profiles (function; safe, idempotent)

Profiles are a way to define additional application semantics, on top of a media type. Ideally, one chooses a fairly generic media type, so that many clients are able to consume the API. But whatever problems the API might be trying to solve may not be very generic, so there will be semantics within the representations that describe what things are and where to go next. These should be described in a profile.

This function is a way to declare profiles that are assumed in the flow. This means that if the server neglects to add profile information – i.e. it doesn't include a Link header with rel="profile", the media type doesn't include such information, or both – the client can still make sense of the response by assuming the profiles provided to this function should apply.

Interface
function profiles(url, ...urls) { return request }

This function should be called with one or more URLs, one for each profile that should apply. The order is important and if there's a conflict between profiles, then the one declared first should win.

to (function; potentially unsafe, potentially non-idempotent)

The to function is special, because it effectively changes the way it operates based on the underlying protocol, media type and profile (if available). It's used to transition to a given state. It does this either by finding that state in the representation the client has retrieved already, or if that state refers to a hypermedia control it will activate the control and consider the response as the new context. This function is arguably the most complex of all in this mockup, and is probably best described using an example.

In my previous example, to is used three times:

  .to('most-popular')
  .to('payment')
  .set(
    { 'credit-card-number'        : '1234-5678-9012-3456'
    , 'credit-card-expiration'    : '01/16'
    , 'credit-card-security-code' : '987'
    }
  )
  .to('checkout')

For the first transition – .to('most-popular') – the media type or profile has dictated that the context is now changed to look at the element with the id most-popular. In order for the client to know this, these rules must somehow be defined in the media type or profile semantics. Here's the relevant markup again, so we know what we're talking about:

<a id="most-popular" href='/buy/most-popular' rel='payment'>Click here to buy our most popular product!</a>

For the next transition – .to('payment') – the link is actually activated and the resource it refers to (/buy/most-popular) is retrieved through a simple GET. But why is the link activated with this particular transition, but not the previous one? The way I see it, this is essentially an implementation detail. I reckon whatever mechanism is used to understand HTML, or the relevant profiles here, should specify what should happen in different transitions. For the sake of this mockup, I've simply decided that our imaginary (well, with regards to the profiles at least) specifications declare that fragment identifiers shouldn't activate hypermedia controls, whereas link relations should.

In other words, despite the fragment being an anchor it isn't activated just because the first transition refers to it by id. The other transition does activate the link, because it refers to the link relation. The way I see it, link relations are the very definition of transitions and as such the hypermedia control it's associated with should be activated and the context should be set to the response of that request.

Because the last transition refers to a form name, again I've decided this should effectively work the same way as link relations. That is, because the transition refers to a form name, it should go ahead and submit that form. Had it referred to an identifier however, it wouldn't submit the form but rather just "go to" that part of the representation. The values of the form are specified by the call to the set function, before the last transition. This may seem counterintuitive; try reading it like this:

  1. Set some relevant values
  2. Transition to checkout

Essentially we're just saying that before checking out, we're providing our payment details so that our order is complete. Obviously, the HTML specification doesn't know what a "checkout" form is, so this would be something that's specified in the profile. I'll discuss set in further detail in the next section.

Because this function doesn't actually dictate how to form requests, it may be both unsafe and non-idempotent. For instance, the media type might define a POST request which is both unsafe and non-idempotent, but could just as well define a simple GET request which is both safe and idempotent. The user of this library doesn't and shouldn't have to know because that would then tie them to the server implementation. Rather, the library should deduce this from the profile(s), media type and/or protocol. If it can't, then there's no way the transition can happen anyway and an error should be raised – for HTTP it might be 400 Bad Request and ideally with an error document further explaining the problem.
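The rules used in this example (an identifier only moves the context, a link relation follows the link, a form name submits the form, anything else is an error) can be sketched as follows. The function resolveTransition and the flat element objects are hypothetical; a real client would derive this from the media type and profiles rather than hardcode it.

```javascript
// Hypothetical model: a representation is a list of elements described by
// plain objects with optional id, rel, href, form, action and method fields.
function resolveTransition(elements, name) {
  for (var i = 0; i < elements.length; i++) {
    var el = elements[i]

    // A link relation activates the link
    if (el.rel === name && el.href) return { action: 'follow', url: el.href }

    // A form name submits the form, using whatever method the form declares
    if (el.form === name) return { action: 'submit', url: el.action, method: el.method }

    // An identifier only moves the context; nothing is activated
    if (el.id === name) return { action: 'focus', element: el }
  }

  throw new Error('Cannot transition to "' + name + '"')
}

var page = [
  { id: 'most-popular',  rel: 'payment', href: '/buy/most-popular' }
, { id: 'least-popular', rel: 'payment', href: '/buy/least-popular' }
]
var instructions = [
  { form: 'checkout', action: '/buy/most-popular', method: 'post' }
]

var focused   = resolveTransition(page, 'most-popular')
var followed  = resolveTransition([focused.element], 'payment')
var submitted = resolveTransition(instructions, 'checkout')
```

Note that the caller never mentions GET or POST; the method comes from the form description.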

Interface
function to(name) { return request }

The name parameter is some piece of application semantics, likely defined either in the media type or profile, but could also be defined in the protocol (consider the Location header, for instance). What transitioning to a given resource actually means in terms of mechanics must be defined by whatever specification defines the name. If the name is defined in multiple specifications, there's a priority order:

  1. Profile(s) (in order of declaration)
  2. Media type
  3. Protocol
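A minimal sketch of that priority order, assuming each specification can be reduced to an object that maps names to a description of the transition. The function lookup and the example specification objects are hypothetical.

```javascript
// Hypothetical: each specification is an object mapping names to what the
// transition means. None of these structures come from a real library.
function lookup(name, profiles, mediaType, protocol) {
  // Profiles first (in order of declaration), then media type, then protocol
  var sources = profiles.concat([mediaType, protocol])

  for (var i = 0; i < sources.length; i++) {
    if (Object.prototype.hasOwnProperty.call(sources[i], name)) return sources[i][name]
  }
  return undefined
}

var shopProfile = { payment: 'defined by the shop profile' }
var html        = { payment: 'defined by the media type', stylesheet: 'defined by the media type' }
var http        = { Location: 'defined by the protocol' }

var payment    = lookup('payment', [shopProfile], html, http)
var stylesheet = lookup('stylesheet', [shopProfile], html, http)
var location   = lookup('Location', [shopProfile], html, http)
```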

Potentially, just specifying the name of a transition might not be enough, particularly if the name actually defines an embedded resource. Maybe a more sophisticated language is necessary; perhaps akin to how CSS selectors work. I don't know, which is why I chose to keep it simple.

set (function)

This function parameterizes the next state transition. This function itself doesn't actually ever make any requests, but simply stores data for later use. This is evident in the example above, where credit card data is stored for use with the checkout transition later. Data should be stored in simple key/value form, and the profile(s), media type or protocol should decide how it is serialized into a form that can then be transferred to a server. However, it's worth noting that the data stored – both key and value – may need to adhere to some format dictated by one of the aforementioned specifications. This won't be apparent however until the next state transition is made. Once a state transition happens, any data stored using this function should probably be cleared.

Interface
function set(object) { return request }
function set(key, value) { return request }

This function can be called in two ways; either with a key and a value, or with an object literal where the properties of the object define the key/value pairs.
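Both call forms, and the idea that stored data is cleared once a transition consumes it, can be sketched like this. createParameters and its take method are hypothetical names; serialization is deliberately left out, since that belongs to the profile, media type or protocol.

```javascript
// Hypothetical parameter store for the next state transition
function createParameters() {
  var data = {}

  return {
    // Accepts either set(key, value) or set({ key: value, ... })
    set: function(key, value) {
      if (typeof key === 'object' && key !== null) {
        Object.keys(key).forEach(function(k) { data[k] = key[k] })
      } else {
        data[key] = value
      }
      return this
    }
    // Hand the stored data to a transition and clear it afterwards
  , take: function() {
      var taken = data
      data = {}
      return taken
    }
  }
}

var params = createParameters()
params
  .set('credit-card-number', '1234-5678-9012-3456')
  .set({ 'credit-card-expiration': '01/16', 'credit-card-security-code': '987' })

var forCheckout = params.take()  // all three values
var afterwards  = params.take()  // empty again
```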

on (function)

A client may want to handle different results, which is done using the on method. This is effectively an event handling mechanism, describing what to do when something happens. It should be possible to call on multiple times during a flow, and the callback should be invoked with the most recent response. This means the response passed to the callback comes from the most recent api or to call, prior to the on call, that involved a request/response cycle.

Interface
function on(event, callback) { return request }

The event parameter describes the event to care about. I think this should be a query, such that a set of events can be handled using the same logic. This is exemplified above by using the string '2xx' for instance, which would handle all successful HTTP responses. The format of the string should probably be dictated by whatever implements the given protocol, and I won't go into much further detail on that here. However, the special value '*' should indicate that the specified callback should handle any event.

The callback parameter is the function that should be called when the event occurs. The signature of this function is:

function(response) {}

The response parameter is an object representing the response. Its signature is:

Response.url      // String
Response.code     // String
Response.status   // String
Response.headers  // Object; key is header name, value is header value
Response.body     // Resource representation; Content-Type header defines type

Response.url can be used to create a new API instance using the api function. This might be useful when chaining flows.

If there are multiple handlers specified for the same set of events, they will each be called in the order they were added.
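The ordering rule and the '*' wildcard can be sketched with a small registry. createHandlers and dispatch are hypothetical names, and for brevity this sketch only compares events to Response.code exactly rather than supporting full queries.

```javascript
// Hypothetical handler registry: handlers run in the order they were added
function createHandlers() {
  var handlers = []

  return {
    on: function(event, callback) {
      handlers.push({ event: event, callback: callback })
      return this
    }
    // Invoke every handler whose event is '*' or equals the response code
  , dispatch: function(response) {
      handlers.forEach(function(handler) {
        if (handler.event === '*' || handler.event === response.code) handler.callback(response)
      })
    }
  }
}

var calls = []
var registry = createHandlers()
registry
  .on('200', function(response) { calls.push('first') })
  .on('404', function(response) { calls.push('never') })
  .on('*',   function(response) { calls.push('any ' + response.code) })
  .on('200', function(response) { calls.push('second') })

registry.dispatch({ url: 'http://www.example.com/', code: '200', status: 'OK', headers: {}, body: '' })
```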

go (function)

The go function acts as a terminator, a way for the library to know that right about now is a good time to go ahead and perform whatever work needs to be done to reach the end result. This may include making any number of requests.

Interface
function go() { return request }
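To show how a recorded flow might be carried out, here is a deliberately synchronous sketch. The free-standing go function, the step objects and the transition callback are all hypothetical; transition stands in for everything the profiles, media type and protocol would decide, and real requests would of course be asynchronous.

```javascript
// Hypothetical interpreter for a recorded flow. `steps` is a list such as
// [{ type: 'to', name: 'payment' }, { type: 'set', data: {...} }, ...] and
// `transition(context, name, params)` returns the new context.
function go(steps, entryContext, transition) {
  var context = entryContext
  var params  = {}

  steps.forEach(function(step) {
    if (step.type === 'set') Object.assign(params, step.data)
    if (step.type === 'to') {
      context = transition(context, step.name, params)
      params  = {}  // stored data only applies to the next transition
    }
    // Handlers see the most recent context at the point where on was declared
    if (step.type === 'on') step.callback(context)
  })

  return context
}

var seen = []
var result = go(
  [ { type: 'to',  name: 'payment' }
  , { type: 'set', data: { 'credit-card-number': '1234-5678-9012-3456' } }
  , { type: 'to',  name: 'checkout' }
  , { type: 'on',  event: '*', callback: function(response) { seen.push(response) } }
  ]
, 'entry'
, function(context, name, params) {
    return context + ' > ' + name + '(' + Object.keys(params).length + ')'
  }
)
```

The stub transition just builds a string, which makes it easy to see which parameters reached which step.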

Review

So how does all of this hold up to the list of features declared above? Let's go through each one and discuss.

Be hypermedia driven

The client only accepts URLs for two functions: api and profiles. The first function is the only required one, as it's the API entry point. It's not defined how this URL was retrieved; it may be well known or passed on from elsewhere. For profiles, the same thing applies; they might be well known URLs or passed on from elsewhere.

The client can't construct URLs, even if it wants to. Neither is it tied to any given protocol; even if the entry point is an http: URL, the connection may be upgraded to a different protocol. Obviously the library implements support for some protocols, but my point here is that the API should work for most any RESTful application protocol.

The client transitions from one state to another through the to method, but even then the user only provides semantic names to tell the client where to go. The actual mechanics of the transitions are defined by the profiles, media type and/or protocol. This way the client is fully parameterized, and shouldn't break even if the underlying formats or protocol change – provided that the semantics remain intact, that is!

Handle embedded resources

The client handles embedded resources using the to function, which can transition between representations as well as within representations. This means the server is free to embed resources as it wishes, so long as the application semantics are intact.

Be as synchronous as possible

This design goal is all about syntax. Of course, no call should block so that other unrelated processing has to wait. But callbacks lead to suffering, so they should be reduced to as few spots as possible. This mockup is JavaScript, so unfortunately there's no getting around having callbacks if you wish to inspect an actual response. If you don't, it'll look synchronous but isn't, which may very well be confusing.

Other languages however may very well have features to better support coroutines and could possibly be used to implement nice, easy to follow synchronous looking code.

Be concise

The mockups are considerably more concise than the jQuery examples. That said, there may well be improvements that could be made. For instance, take the example of multiple subsequent transitions:

  .to('most-popular')
  .to('payment')

This could likely be elaborated further into a more concise syntax, perhaps something like the following:

  .to('most-popular -> payment')

However, concise doesn't necessarily mean the least amount of characters, but rather being brief while still being comprehensive. Syntax like the above may prove to be more confusing than helpful.
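If such a shorthand were adopted, expanding it back into individual transitions would be trivial. The helper expandPath and the '->' separator are hypothetical, taken only from the example string above.

```javascript
// Hypothetical: split a shorthand path into individual transition names
function expandPath(path) {
  return path.split('->').map(function(name) {
    return name.trim()
  })
}

var names = expandPath('most-popular -> payment')
```

Each resulting name would then be passed to to in order, so the shorthand adds no new semantics.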

Be somewhat configurable

The client is configurable in a number of ways. The api function itself lets you configure the entry point, and the profiles function lets you configure default profiles to fall back on. The media types and profiles configure application semantics to let the client know where it can go next.

There's another kind of configurability though, which isn't being discussed enough. In various places, I mention that the client could use different protocols, or different media types etc. But how does it do this? Is it just a monolithic implementation of a number of more or less standard protocols, media types and profiles and that's it?

Well, no.

I think there's a need to somehow configure the library. For instance, it may well come with HTTP support included, but not CoAP. At the time of this writing, that's still a draft protocol and while development seems active it's still not an established standard. That shouldn't stop users from being able to implement support for it and add it to the client.

The exact details and design of the client implementation are out of scope for this article, but suffice it to say that extensibility is key to making this a successful implementation of a generic programmable hypermedia client.
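One possible shape for that extensibility is a registry of protocol handlers keyed by URL scheme. This is only a sketch under that assumption; createRegistry, register and handlerFor are hypothetical names, and the handlers here are empty placeholders rather than working protocol implementations.

```javascript
// Hypothetical plugin registry keyed by URL scheme
function createRegistry() {
  var protocols = {}

  return {
    register: function(scheme, handler) {
      protocols[scheme] = handler
      return this
    }
  , handlerFor: function(url) {
      var scheme = url.slice(0, url.indexOf(':'))
      if (!Object.prototype.hasOwnProperty.call(protocols, scheme)) {
        throw new Error('No protocol handler registered for "' + scheme + '"')
      }
      return protocols[scheme]
    }
  }
}

var httpHandler = { name: 'http' }  // placeholder for bundled HTTP support
var coapHandler = { name: 'coap' }  // placeholder for user-supplied CoAP support

var protocolRegistry = createRegistry().register('http', httpHandler)
var bundled = protocolRegistry.handlerFor('http://www.example.com/')

protocolRegistry.register('coap', coapHandler)
var added = protocolRegistry.handlerFor('coap://www.example.com/')
```

Media types and profiles could plausibly be plugged in the same way, keyed by content type or profile URL instead of scheme.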

Don't be protocol/media type/profile specific

This is mostly discussed above under the hypermedia heading. Again, the client implementation should be designed such that protocols, media types and profiles are implemented in a modular fashion, ideally in a way that lets parts be configured. Popular server side libraries such as express use a kind of middleware design pattern that lets the user configure the way the server works. There are pros and cons to any approach of course, but again I think that's out of scope for this article.

However, what's important to note here is that the API of the client is not tied to any given protocol, media type or profile.
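For what it's worth, a middleware-style design for media type handling might look roughly like the sketch below. It's in the spirit of express but not copied from it; createPipeline, use, run and the JSON body shape are all assumptions made for the example:

```javascript
// Hypothetical middleware runner: each middleware receives the response
// and a next() callback, and may decorate the response before passing it on.
function createPipeline() {
  var stack = []

  function run(response, done) {
    var index = 0
    function next(err) {
      if (err) return done(err)
      var middleware = stack[index++]
      if (!middleware) return done(null, response)
      middleware(response, next)
    }
    next()
  }

  return {
    use: function(middleware) {
      stack.push(middleware)
      return this
    },
    run: run
  }
}

// Usage: a stand-in media type handler that extracts link relations
// from a made-up JSON body.
var pipeline = createPipeline().use(function(response, next) {
  if (response.type === 'application/json') {
    response.links = JSON.parse(response.body).links
  }
  next()
})
```

The user facing API stays the same no matter which middleware is installed, which is the point: media type knowledge lives in the pipeline, not in the flow.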

Don't be domain specific

As a generic programmable hypermedia client, it shouldn't need changes in order to use any specific API that prides itself on being RESTful. These APIs may not adhere to all of the REST constraints, but to the extent that they do this client should be able to understand them.

That said this client could very well be used as the internal mechanism of a domain specific library, and still provide significant value. Consider a store client library that has a function like this:

function purchaseMostPopularProduct(callback) {
  api('http://www.example.com/')
    // ... etc. etc.
    .on('*', callback)
}

Internally, it may actually be implemented like our case study above. Even if representations and URLs change, the client should still work. Yet, because this would be a library specific to a company or some such there's room to provide higher level syntax. Obviously, it will still break if application semantics change – such is life. Try to be nice to people and don't change application semantics too often.

In this sense I think a client like this sits somewhere in between low level libraries, that give you greater control over details such as crafting responses, and high level libraries that are essentially tailored to a specific domain.

Don't render the response body

I added this goal because I wanted to highlight again that this is an exploration of a generic library. As such, it shouldn't make assumptions of what you would like to do with the responses, beyond doing state transitions. You can see this in the case study above, where a whole flow is described but not until the very end of the flow do we do anything with the response body.

Of course during the flow there may be requests and responses sent back and forth, but these should not be rendered or otherwise have any assumptions made as to what they should be used for. If a user wants to do something with the response that isn't a state transition, they should use on to handle a response in whatever way they want. This hasn't been explored earlier, but on returns the request object so it could be used in the middle of flows as well. Let's rewrite our flow to do some processing of a request in the middle of the flow:

api('http://www.example.com/')
  .profiles(
    'http://www.fake-standards.org/shop'
  , 'http://www.fake-standards.org/shop/payment-instructions'
  )
  .to('most-popular')
  .on('200', function(response) {
    console.log('The most popular product is described at', response.url)
  })
  .to('payment')
  .set(
    { 'credit-card-number'        : '1234-5678-9012-3456'
    , 'credit-card-expiration'    : '01/16'
    , 'credit-card-security-code' : '987'
    }
  )
  .to('checkout')
  .on('200', function(response) {
    $('body').replaceWith($('body', response.body))
  })
  .go()

Without interrupting the flow, we're able to process the request and log some information to the console, before proceeding to the payment state. The response object in this case came from the initial api call, since the subsequent to call didn't trigger a request/response cycle.
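The reason on can sit in the middle of a flow is simply that it returns the same chainable object that to does. Here's a minimal sketch of that idea, assuming a hypothetical createFlow that only records steps and performs no requests at all:

```javascript
// Hypothetical flow recorder: every method records a step and returns the
// same object, which is what lets on() appear anywhere in a chain.
function createFlow(entryPoint) {
  var steps = [{ type: 'api', value: entryPoint }]
  var flow = {
    to: function(rel) {
      steps.push({ type: 'to', value: rel })
      return flow
    },
    on: function(status, handler) {
      steps.push({ type: 'on', value: status, handler: handler })
      return flow
    },
    // Returns a copy of the recorded steps, in the order they were added.
    steps: function() {
      return steps.slice()
    }
  }
  return flow
}
```

A real client would play these steps back when go is called; the sketch only shows why the chain doesn't break.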

Closing remarks

I think part of the reason why hypermedia APIs aren't quite taking off yet is that they are much too difficult to consume. There's no really nice client side tooling to enable their usage, while still keeping enough distance so as not to break when media types change, URLs change or even the protocol changes. Until SkyNet becomes self aware we won't have clients that can just make sense of any kind of API and never break, but I think we can get a lot smarter.

Right now I feel the only tooling available is either too high level, usually company specific (e.g. Stripe); or too low level, such as jQuery's ajax functionality or even lower with XHR. I'm not knocking these things – they are really very good at what they do – but I feel there's a middle ground that's not being adequately explored and that may be part of the reason why we're not seeing widespread adoption of hypermedia enabled APIs.

There's another aspect of this which I haven't explored at all, but plan to do so in a follow up article, namely implementing servers that publish hypermedia APIs. While I think tooling on the server side is much better than on the client, I think there are definitely improvements to be made still.

Eventually, I hope to also conclude all of this by arguing that clients and servers don't actually have to be different machines on a network at all, but could very well just be different modules within the same application. In that sense, I think REST is an architectural style that can be useful in so many more contexts than "just" the internet.

To be continued.

@mstade
Author

mstade commented Dec 9, 2013

Note to self: Make sure to straighten up some of the terminology in this document, it needs some editing.

  • Client: The programmable hypermedia client module.
  • Server: Some code somewhere that responds to requests.
  • User: The user of the programmable hypermedia client module, i.e. a use case.

@mstade
Author

mstade commented Dec 11, 2013

The client should probably have some capability for branching the path it takes. For instance, if it supports multiple vocabularies there may be a case where it could go one of two ways. The above example deals with payment using a credit card, but the user might prefer to pay with bitcoins and has programmed the client accordingly. Some thought needs to go into how this might work, because it's very likely that there are multiple paths to get from start to finish in a flow (even loops!) and it probably makes sense to have the ability to program for such eventualities.

@mstade
Author

mstade commented Dec 11, 2013

Another use case for branching is feature flags. It's becoming more and more popular to use feature flags, rather than try to implement wholesale versioning (through URLs or other mechanisms) which we used to do with offline installable applications. Even Fielding says you shouldn't version REST APIs, but rather employ a mechanism like feature flags. Here's how that might look in code:

if (server.fancyFeature) {
  // Do stuff with the new feature
  server.fancyFeature()
} else {
  // Do stuff with the boring old feature
  server.oldFeature()
}

Maybe the new feature is in beta, so not everyone has access. Because of this, the client needs the ability to switch. An older client, that only knows of the boring old feature, will still work fine so long as the API hasn't changed. After some time you may wish to ditch the old feature altogether; maybe you've let clients know that it's deprecated and will change, or you have metrics that suggest clients aren't using the old feature anymore. That's great, you ditch it and no clients are broken, because the ones using the fancy new feature already know to look for it. As well, the feature is long since out of beta, and everyone has access. In that case, the code above is mostly useless. It could be reduced to this instead:

server.fancyFeature()

While I don't necessarily think there's anything wrong with this pattern, I do feel that maybe there's a bit too much cognitive overhead to keep track of all the branching. It'd be interesting to see whether it'd be possible to code this in a different way that doesn't make you have to add a bunch of conditionals. I don't have an immediate suggestion, but it's worth thinking about.
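One direction, purely as a sketch and not a considered design, would be to let the client state its preferences in order and have a helper pick the first one the server actually offers. The firstSupported helper and the stand-in server objects below are hypothetical:

```javascript
// Hypothetical helper: returns the first feature the server exposes,
// given the client's order of preference.
function firstSupported(server, preferences) {
  for (var i = 0; i < preferences.length; i++) {
    var feature = server[preferences[i]]
    if (typeof feature === 'function') {
      return feature.bind(server)
    }
  }
  throw new Error('None of the preferred features are available')
}

// Usage with a stand-in server that only has the old feature.
var oldServer = { oldFeature: function() { return 'old' } }
var feature = firstSupported(oldServer, ['fancyFeature', 'oldFeature'])
```

The conditional hasn't gone away, it has just moved into one place, and retiring the old feature means deleting an entry from a list rather than unpicking an if/else.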

@mstade
Author

mstade commented Dec 21, 2013

A thought:

api(
  'http://www.example.com/tv-guide'
  , set({ name: 'Breaking Bad' })
  , to('search')
  , to('first')
  , on('200', function(tvshow) {
      // The response came back including the profile http://schema.org/TVSeries
      // The data property of the response object is a way to get microdata.
      console.log(tvshow.data.actor.name)
    })
)

This allows for an API that is fully extensible, where the functions passed to it are played in sequence. A problem with this approach is that it makes it harder to re-use partial flows, i.e. store away the state as it is after to('search') and use it to create another flow. Needs further thought, but I kind of like the idea of composing things like this.
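As a sketch of how the re-use problem might be softened: if each step is a function from state to state, and api takes the steps as an array rather than as varargs (an assumption that differs from the example above), then a partial flow is just an array that can be stored and extended. The state shape is made up, and nothing here performs a request:

```javascript
// Hypothetical step constructors: each returns a function from state to state.
function to(rel) {
  return function(state) {
    return { url: state.url, params: state.params, path: state.path.concat(rel) }
  }
}

function set(params) {
  return function(state) {
    return { url: state.url, params: params, path: state.path }
  }
}

// Plays the steps in sequence against an initial state for the entry point.
function api(url, steps) {
  return steps.reduce(function(state, step) {
    return step(state)
  }, { url: url, params: {}, path: [] })
}

// A partial flow is just an array, so it can be stored and extended.
var search = [set({ name: 'Breaking Bad' }), to('search')]
var first = api('http://www.example.com/tv-guide', search.concat([to('first')]))
var last = api('http://www.example.com/tv-guide', search.concat([to('last')]))
```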

I don't know ClojureScript well enough, but I reckon this is about right?

(api "http://www.example.com/tv-guide"
  (set { :name "Breaking Bad" })
  (to "search")
  (to "first")
  (on 200 (fn [tvshow]
    ;; The response came back including the profile http://schema.org/TVSeries
    ;; The data property of the response object is a way to get microdata.
    (.log js/console (aget tvshow "data" "actor" "name")))))

@sammyt

sammyt commented Dec 21, 2013

So much good thinking here Marcus!

A quick note on functional composition, at least how I tend to think about
it (this could be awful advice).

In the above example api would have to be a very clever function that
knew how to take its varargs and execute them accordingly and at the right
time. It might be worth making it more stupid, then knitting the API together
with some functional tools.

Here is an example in clojurescript. Each element of the API is itself
totally stupid and mostly unaware of the rest of its brethren functions.
Used as-is (example one) they look pretty gritty! But add the threading
macro and Ta-Da you have an API.

;; some example functions that don't do anything
(defn api 
  ([root] (api root []))
  ([root profiles] {:root root :profiles profiles}))

(defn to 
  [state direction]
  {:to direction
   :from state})

(defn set 
  [state props]
  (assoc state :props props))

;; using those functions out of the box
(def one
  (to 
    (set 
      (to 
        (to 
          (api "http://www.ziazoo.co.uk" ["/shop" "/shop/pay"]) 
          "most-popular") 
        "payment") 
      {:credit-card-number  "1234-5678-1234-5678" 
       :credit-card-expires "01/16" 
       :credit-card-code    "987"}) 
    "checkout")) 


;; add a little magic sauce
(def two
  (-> (api "http://www.ziazoo.co.uk" ["/shop" "/shop/pay"])
      (to "most-popular")
      (to "payment")
      (set {:credit-card-number  "1234-5678-1234-5678"
            :credit-card-expires "01/16"
            :credit-card-code    "987"})
      (to "checkout")))


(print one)
(print two)
(print (= one two)) ;; true

Adding async operation would be equally trivial, using core.async.

Much of the same narrative can be achieved in javascript as well. As that
link you pinged me last night demos :)
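For illustration, here is one way the same threading could be sketched in plain JavaScript. The thread helper is hypothetical and only loosely mimics what the -> macro does; api, to and set mirror the ClojureScript functions above and still don't do anything useful:

```javascript
// Hypothetical stand-in for Clojure's -> macro: each step is
// [fn, arg1, arg2, ...] and receives the running value as its first argument.
function thread(value) {
  var steps = Array.prototype.slice.call(arguments, 1)
  return steps.reduce(function(acc, step) {
    return step[0].apply(null, [acc].concat(step.slice(1)))
  }, value)
}

// Plain functions mirroring the ClojureScript ones above.
function api(root, profiles) { return { root: root, profiles: profiles || [] } }
function to(state, direction) { return { to: direction, from: state } }
function set(state, props) {
  var copy = {}
  for (var key in state) copy[key] = state[key]
  copy.props = props
  return copy
}

var two = thread(
  api('http://www.ziazoo.co.uk', ['/shop', '/shop/pay'])
, [to, 'most-popular']
, [to, 'payment']
, [set, { 'credit-card-code': '987' }]
, [to, 'checkout']
)
```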

Hope my ramblings are helpful, sam

@mstade
Author

mstade commented Dec 21, 2013

Thanks @sammyt, invaluable feedback as always! Is the magic sauce in this case core.async?

I'm not sure api() has to be that magical or particularly smart. I figured it'd work something like task.js, but instead of a single task it's a list of transition functions. Maybe I'm just not biting the functional apple yet (very likely) but I don't see how this:

(-> (api "http://www.ziazoo.co.uk" ["/shop" "/shop/pay"])
      (to "most-popular")
      (to "payment")
      (set {:credit-card-number  "1234-5678-1234-5678"
            :credit-card-expires "01/16"
            :credit-card-code    "987"})
      (to "checkout"))

... is preferable to this:

(api "http://www.ziazoo.co.uk" ["/shop" "/shop/pay"]
      (to "most-popular")
      (to "payment")
      (set {:credit-card-number  "1234-5678-1234-5678"
            :credit-card-expires "01/16"
            :credit-card-code    "987"})
      (to "checkout"))

The point of api is to compose transitions within a certain context, i.e. the URL. Not sure why having this higher level of abstraction is less ideal than a generic function composer thingie. (I'm sure there is a very good reason, I'm just not smart enough to see it.)

@mstade
Author

mstade commented Dec 21, 2013

Today I found Traverson – a library for consuming hypermedia APIs, not unlike some of the ideas I've posted here. Worth keeping an eye on! Some examples are shown in these slides.

@pemrouz

pemrouz commented Jan 30, 2014

Great work @mstade! Looks like there's a lot going on here. Some random thoughts at 5AM :). I'll try to stick to the client aspect, and leave discussion of the rest (specs, formats, profiles, etc) for elsewhere.

Declarative / Imperative

This is a great example of what I meant by "linear stories". You touch on the branching issue above, discuss conditionals and inlining operations as parameters to a function. But really I think these are all the same. It's all fairly imperative, like jQuery. You tell it what to do line-by-line. This could probably be made more powerful by taking a leaf out of D3's book instead.

The key thing I think you are missing is perhaps a notion of universals. Right now your vocab only knows particulars:

api('http://www.example.com/film/Matrix')
  .to('actor')
  .set({rating: 5})

That'll only ever get the first actor. What if there were more than one? Let's say you did add an index offset to the .to() function:

api('http://www.example.com/film/Matrix')
  .to('actor', 1)
  .set({rating: 5})
  .to('rate')

api('http://www.example.com/film/Matrix')
  .to('actor', 2)
  .set({rating: 5})
  .to('rate')

api('http://www.example.com/film/Matrix')
  .to('actor', 3)
  .set({rating: 5})
  .to('rate')

You'd more likely put that in a loop, or at least cache the first lines:

var matrix = api('http://www.example.com/film/Matrix')
// actorCount is a hypothetical variable: the number of actors, assumed known
for (var i = 0; i < actorCount; i++) {
  matrix.to('actor', i).set({rating: 5}).to('rate')
}

That's leaning on the Vanilla side, but the processing abilities are still fundamentally limited. You'll be marching to the right nesting loops and juggling stored variables proportional to the complexity of the application. It's like playing twister, but growing an extra arm every time you resort to a call to .to(). Might sound good to begin with, but turns to spaghetti very quickly. Something like this might be better:

api('http://www.example.com/film/Matrix')
  .all('actor')
  .set({rating: 5})
  .to('rate')

Going a step further, you could do a lot with constants, but functions of data have a surprisingly higher power-to-weight ratio:

api('http://www.example.com/film/Matrix')
  .all('actor')
  .set("rating", function(d){ return d.filmsPlayed / d.yearsOfExperience })
  .to('rate')

The feature flags could also be achieved in a similar manner (i.e. reduction to a subset of the initial selection) so that conditionals aren't required or the method chaining broken.
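A rough sketch of what such a selection could look like, loosely modelled on D3 selections. The select function, its methods and the actor data are all made up for illustration, and no requests are involved:

```javascript
// Hypothetical selection over several resources at once.
function select(items) {
  return {
    // set(name, value): value may be a constant or a function of each datum.
    set: function(name, value) {
      return select(items.map(function(d, i) {
        var copy = {}
        for (var key in d) copy[key] = d[key]
        copy[name] = typeof value === 'function' ? value(d, i) : value
        return copy
      }))
    },
    // filter(predicate): reduce the selection to a subset, so no
    // conditionals are needed at the call site and the chain stays intact.
    filter: function(predicate) {
      return select(items.filter(predicate))
    },
    // data(): returns a copy of the items currently in the selection.
    data: function() {
      return items.slice()
    }
  }
}

var actors = select([
  { name: 'A', filmsPlayed: 20, yearsOfExperience: 10 }
, { name: 'B', filmsPlayed: 9, yearsOfExperience: 3 }
])

var rated = actors
  .set('rating', function(d) { return d.filmsPlayed / d.yearsOfExperience })
  .data()
```

A feature flag would then be a filter on whatever the server advertises: an empty selection simply does nothing, so the rest of the chain doesn't need to know whether the feature exists.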

Async Chaining

What happens if you want to keep "flow"-ing from an event handler? In your example:

api('http://www.example.com/tv-guide')
  .set({ name: 'Breaking Bad' })
  .to('search')
  .to('first')
  .on('200', function(tvshow) {
    // The response came back including the profile http://schema.org/TVSeries
    // The data property of the response object is a way to get microdata.
    console.log(tvshow.data.actor.name)

  /* api(tvshow.url).to...? */
  /* tvshow.to...? */
  })

Function Composer vs Abstraction

I'm illiterate in Clojure so I can't really comment on Sam's suggestion. But I think the gist is that abstraction can be a pain, generating a web of hard links between code units (read: spaghetti). Whereas if the parts are more loosely coupled (each one blind to other parts), it makes the whole thing a bit more scalable in the long-term.

Dot-Notation

If you want to maximise the synchronicity, would the dot-notation similar to Jon Moore's example be better for that?

Non-HTML Scope

How much scope do you think there is in making this an abstract library that transcends HTML? You mention CoAP, but is that good enough to warrant taking this a level higher?

Misc

  • HTML Profiles got taken out. Are they back in HTML5? (I know it doesn't directly affect this client..)
  • Have you seen the ALPS before?

Right. Let's see if I can go to sleep now..

@koddo

koddo commented Aug 31, 2015

Hello. Why do you separate the set and to functions? Why .set(params).to(rel) instead of .to(rel, params)?
