Skip to content

Instantly share code, notes, and snippets.

@seagreen
Forked from gelisam/exchange-formats.md
Created April 2, 2017 05:30
Show Gist options
  • Save seagreen/f68a5ccd5505bab6ae7457a8f50ba022 to your computer and use it in GitHub Desktop.
Save seagreen/f68a5ccd5505bab6ae7457a8f50ba022 to your computer and use it in GitHub Desktop.
A list of every data exchange formats I could find

At work, I just spent the last few weeks exploring and evaluating every format I could find, and my number one criteria was whether they supported sum types. I was especially interested in schema languages in which I could describe my types and then some standard specifies how to encode them using an on-the-wire format, usually JSON.

  1. Swagger represents sum types like Scala does, using subtyping. So you have a parent type EitherIntString with two subtypes Left and Right represented as {"discriminator": "Left", value: 42} and {"discriminator": "Right", value": "foo"}. Unfortunately, unlike in Scala in which the parent type is abstract and cannot be instantiated, in Swagger it looks like the parent type is concrete, so when you specify that your input is an EitherIntString, you might receive {"discriminator": "EitherIntString"} instead of one of its two subtypes.
  2. JSON-schema supports unions, which isn't quite the same thing as sum types because they are untagged: with Either String String you can distinguish between a Left and a Right string but with String | String you can't. Still, you can use JSON-schema to describe Left and Right as {"discriminator": "Left", value: 42} and {"discriminator": "Right", value": "foo"}, as above, and then define the type EitherIntString as the union of Left and Right.
  3. Avro supports tagged unions, but doesn't allow them to be named, so you could have a record with a field of type Either Int String but not a top-level type data MyEither = MyLeft Int | MyRight String.
  4. Thrift, Capt'n Proto, ASN.1, and CORBA all support tagged unions, but each constructor needs to have exactly one parameter, so you'd need to write your types in the form data MaybeInt = Nothing () | Just Int and data TreeInt = Leaf Int | Branch (TreeInt, TreeInt) instead of data MaybeInt = Nothing | Just Int and data TreeInt = Leaf Int | Branch TreeInt TreeInt.
  5. MSON and Protocol buffer allow you to list mutually-exclusive properties in an object, so you could allow {left: 42} and {right: "foo"} and disallow {left: 42, right: "foo"}, but since properties are optional, you'd have to also allow {}.
  6. RAML (which I ended up choosing for work) supports both tagged and untagged unions, but version 1.0 of the spec is still very new so I haven't found any tool support for it and we'll be rolling our own.
  7. edn, MessagePack and transit-format don't have a schema language, so while you could use (:left 42) and (:right "foo") to represent values of type Either Int String, you'd have to also allow (:flubber 1.0) and nil and every other valid edn value. You could of course roll your own schema language; it just so happens that all the existing schema languages I found are targeting JSON (or their own binary format) instead.
  8. AMQP's schema language supports records but not sum types. I expected most formats to fall into that category, but I was pleasantly surprised to see that this was not the case.

There were a few other formats I wanted to look at, but their specifications are very heavyweight and I gave up before finding out whether they support unions or not: RSDL, Open Data Protocol, WADL, WSDL, and XML-RPC.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment