@gelisam
Last active September 30, 2023 17:50
A list of every data exchange format I could find

At work, I just spent the last few weeks exploring and evaluating every format I could find, and my number one criterion was whether they supported sum types. I was especially interested in schema languages in which I could describe my types and for which some standard specifies how to encode them in an on-the-wire format, usually JSON.
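To make the goal concrete, here is a minimal Python sketch of a sum type, Either Int String, round-tripped through JSON with the constructor name stored in a tag field. The field names `discriminator` and `value` and the helper names `encode` and `decode` are assumptions chosen for illustration; they are not taken from any of the formats listed here.

```python
import json
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Left:
    value: int


@dataclass(frozen=True)
class Right:
    value: str


# Either Int String: a value is exactly one of the two constructors.
EitherIntString = Union[Left, Right]


def encode(e: EitherIntString) -> str:
    """Serialize to JSON, recording the constructor name in a tag field."""
    return json.dumps({"discriminator": type(e).__name__, "value": e.value})


def decode(text: str) -> EitherIntString:
    """Parse JSON and reject anything that is not a well-formed Left or Right."""
    obj = json.loads(text)
    if not isinstance(obj, dict) or set(obj) != {"discriminator", "value"}:
        raise ValueError("expected exactly the fields 'discriminator' and 'value'")
    tag, value = obj["discriminator"], obj["value"]
    # bool is a subclass of int in Python, so exclude it explicitly.
    if tag == "Left" and isinstance(value, int) and not isinstance(value, bool):
        return Left(value)
    if tag == "Right" and isinstance(value, str):
        return Right(value)
    raise ValueError(f"not a valid EitherIntString: {obj!r}")
```

A schema language with real sum type support would let a generic tool derive this kind of decoder from the type description, including the rejection of unknown tags, instead of requiring it to be written by hand.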

  1. Swagger represents sum types like Scala does, using subtyping. So you have a parent type EitherIntString with two subtypes Left and Right, represented as {"discriminator": "Left", "value": 42} and {"discriminator": "Right", "value": "foo"}. Unfortunately, unlike in Scala, where the parent type is abstract and cannot be instantiated, in Swagger it looks like the parent type is concrete, so when you specify that your input is an EitherIntString, you might receive {"discriminator": "EitherIntString"} instead of one of its two subtypes.
  2. JSON-schema supports unions, which aren't quite the same thing as sum types because they are untagged: with Either String String you can distinguish between a Left and a Right string, but with String | String you can't. Still, you can use JSON-schema to describe Left and Right as {"discriminator": "Left", "value": 42} and {"discriminator": "Right", "value": "foo"}, as above, and then define the type EitherIntString as the union of Left and Right.
  3. Avro supports tagged unions, but doesn't allow them to be named, so you could have a record with a field of type Either Int String but not a top-level type data MyEither = MyLeft Int | MyRight String.
  4. Thrift, Cap'n Proto, ASN.1, and CORBA all support tagged unions, but each constructor needs to have exactly one parameter, so you'd need to write your types in the form data MaybeInt = Nothing () | Just Int and data TreeInt = Leaf Int | Branch (TreeInt, TreeInt) instead of data MaybeInt = Nothing | Just Int and data TreeInt = Leaf Int | Branch TreeInt TreeInt.
  5. MSON and Protocol Buffers allow you to list mutually exclusive properties in an object, so you could allow {left: 42} and {right: "foo"} and disallow {left: 42, right: "foo"}, but since properties are optional, you'd have to also allow {}.
  6. RAML (which I ended up choosing for work) supports both tagged and untagged unions, but version 1.0 of the spec is still very new so I haven't found any tool support for it and we'll be rolling our own.
  7. edn, MessagePack and transit-format don't have a schema language, so while you could use (:left 42) and (:right "foo") to represent values of type Either Int String, you'd have to also allow (:flubber 1.0) and nil and every other valid edn value. You could of course roll your own schema language; it just so happens that all the existing schema languages I found are targeting JSON (or their own binary format) instead.
  8. AMQP, XML-RPC, and Bond support records but not sum types. I expected most formats to fall into that category, but I was pleasantly surprised to see that this was not the case.
  9. WADL and WSDL both use XSD to specify the XML schema. Sum types are supported via alternative; here is an example. RSDL supports both XSD and JSON-schema.
  10. extprot supports sum types directly.
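Two of the distinctions in the list above are easy to see in a few lines of Python. This is only a sketch: the function names are hypothetical, and `check_exclusive` is a hand-written stand-in for the "mutually exclusive optional properties" rule from item 5, not the behaviour of any particular MSON or Protocol Buffers implementation.

```python
def encode_untagged(side: str, value: str) -> str:
    """String | String: the side is dropped, so both alternatives look the same."""
    return value


def encode_tagged(side: str, value: str) -> dict:
    """Either String String: the constructor name travels with the value."""
    return {"discriminator": side, "value": value}


def check_exclusive(obj: dict) -> bool:
    """Accept objects where 'left' and 'right' are optional but never both present."""
    only_known_keys = set(obj) <= {"left", "right"}
    both_present = "left" in obj and "right" in obj
    return only_known_keys and not both_present


# Untagged: the receiver cannot tell which alternative was sent.
print(encode_untagged("Left", "foo") == encode_untagged("Right", "foo"))
# Tagged: the two encodings stay distinct.
print(encode_tagged("Left", "foo") == encode_tagged("Right", "foo"))
# Exclusive optional properties: the empty object slips through.
print(check_exclusive({}))
```

The last check is the weakness described in item 5: because every property is optional, the empty object passes validation even though it corresponds to neither constructor.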

There is one last format I wanted to look at, Open Data Protocol, but its specification is very heavyweight and I gave up before finding out whether it supports unions or not.

@seagreen

Microsoft's Bond is an interesting schema language (and it's partly written in Haskell!). As far as I can tell it doesn't support sum types though.

@gelisam
Author

gelisam commented May 14, 2018

Updated, thanks.

@timbod7

timbod7 commented Dec 23, 2019

The ADL schema language fully supports sum types, both monomorphic and generic.

@steshaw

steshaw commented Apr 4, 2020

The following are zero-overhead encode/decode designs somewhat similar to Cap’n Proto:

  • FlatBuffers — influenced by the author's experience with client/server computer game systems.
  • GHC-specific compact regions. Really sweet! Includes all Haskell data structures—including sum types—with some caveats.

To my mind, all exchange formats — particularly wire formats — should be zero-overhead encode/decode. Other formats (such as JSON or XML) can be generated for long term storage or other purposes.

@steshaw

steshaw commented Apr 4, 2020

There's also Typedefs, which supports sum types (and more). I don't know anything about the performance characteristics.

https://typedefs.com/
