At work, I just spent the last few weeks exploring and evaluating every format I could find, and my number one criterion was whether they supported sum types. I was especially interested in schema languages in which I could describe my types and have some standard specify how to encode them using an on-the-wire format, usually JSON.
- Swagger represents sum types like Scala does, using subtyping. So you have a parent type `EitherIntString` with two subtypes `Left` and `Right`, represented as `{"discriminator": "Left", "value": 42}` and `{"discriminator": "Right", "value": "foo"}`. Unfortunately, unlike in Scala, in which the parent type is abstract and cannot be instantiated, in Swagger it looks like the parent type is concrete, so when you specify that your input is an `EitherIntString`, you might receive `{"discriminator": "EitherIntString"}` instead of one of its two subtypes.
- JSON Schema supports unions, which aren't quite the same thing as sum types because they are untagged: with `Either String String` you can distinguish between a Left and a Right string, but with `String | String` you can't. Still, you can use JSON Schema to describe `Left` and `Right` as `{"discriminator": "Left", "value": 42}` and `{"discriminator": "Right", "value": "foo"}`, as above, and then define the type `EitherIntString` as the union of `Left` and `Right`.
- Avro supports tagged unions, but doesn't allow them to be named, so you could have a record with a field of type `Either Int String` but not a top-level type `data MyEither = MyLeft Int | MyRight String`.
- Thrift, Cap'n Proto, ASN.1, and CORBA all support tagged unions, but each constructor needs to have exactly one parameter, so you'd need to write your types in the form `data MaybeInt = Nothing () | Just Int` and `data TreeInt = Leaf Int | Branch (TreeInt, TreeInt)` instead of `data MaybeInt = Nothing | Just Int` and `data TreeInt = Leaf Int | Branch TreeInt TreeInt`.
- MSON and Protocol Buffers allow you to list mutually exclusive properties in an object, so you could allow `{left: 42}` and `{right: "foo"}` and disallow `{left: 42, right: "foo"}`, but since properties are optional, you'd have to also allow `{}`.
- RAML (which I ended up choosing for work) supports both tagged and untagged unions, but version 1.0 of the spec is still very new, so I haven't found any tool support for it and we'll be rolling our own.
- edn, MessagePack and transit-format don't have a schema language, so while you could use `(:left 42)` and `(:right "foo")` to represent values of type `Either Int String`, you'd have to also allow `(:flubber 1.0)` and `nil` and every other valid edn value. You could of course roll your own schema language; it just so happens that all the existing schema languages I found are targeting JSON (or their own binary format) instead.
- AMQP's schema language supports records but not sum types. I expected most formats to fall into that category, but I was pleasantly surprised to see that this was not the case.
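
To make the tagged versus untagged distinction from the list concrete, here is a minimal Haskell sketch of the `{"discriminator": ..., "value": ...}` encoding. Everything in it is my own illustration, not part of any of the specs above: the tiny `Json` type and the names `render`, `encodeTagged`, `decodeTagged`, `encodeUntagged` and `asString` are hypothetical, and a real project would presumably use a proper JSON library instead.

```haskell
import Data.List (intercalate)

-- A tiny JSON value type, just large enough for this sketch (hypothetical).
data Json
  = JNumber Int
  | JString String
  | JObject [(String, Json)]
  deriving (Eq, Show)

-- Render to text. `show` uses Haskell string escaping, which only
-- coincides with JSON escaping for simple ASCII text (an assumption here).
render :: Json -> String
render (JNumber n)   = show n
render (JString s)   = show s
render (JObject kvs) =
  "{" ++ intercalate ", " [show k ++ ": " ++ render v | (k, v) <- kvs] ++ "}"

-- Tagged encoding: the "discriminator" field records the constructor.
encodeTagged :: (a -> Json) -> (b -> Json) -> Either a b -> Json
encodeTagged f _ (Left a)  = JObject [("discriminator", JString "Left"),  ("value", f a)]
encodeTagged _ g (Right b) = JObject [("discriminator", JString "Right"), ("value", g b)]

-- Decoding looks at the discriminator first, then decodes the payload.
-- Unknown discriminators and non-object inputs are rejected with Nothing.
decodeTagged :: (Json -> Maybe a) -> (Json -> Maybe b) -> Json -> Maybe (Either a b)
decodeTagged f g (JObject fields) = do
  tag   <- lookup "discriminator" fields
  value <- lookup "value" fields
  case tag of
    JString "Left"  -> Left  <$> f value
    JString "Right" -> Right <$> g value
    _               -> Nothing
decodeTagged _ _ _ = Nothing

-- Untagged encoding: only the payload survives, so the constructor is lost.
encodeUntagged :: (a -> Json) -> (b -> Json) -> Either a b -> Json
encodeUntagged = either

asString :: Json -> Maybe String
asString (JString s) = Just s
asString _           = Nothing

main :: IO ()
main = do
  let l = Left "foo"  :: Either String String
      r = Right "foo" :: Either String String
  putStrLn (render (encodeTagged JString JString l))
  putStrLn (render (encodeTagged JString JString r))
  -- The tagged form round-trips back to the original constructor.
  print (decodeTagged asString asString (encodeTagged JString JString r) == Just r)
  -- The untagged forms of Left "foo" and Right "foo" are identical.
  print (encodeUntagged JString JString l == encodeUntagged JString JString r)
```

Running this prints the two tagged objects followed by `True` twice: the tagged encoding round-trips, while the untagged encodings of `Left "foo"` and `Right "foo"` collapse into the same JSON string, which is the `String | String` problem described above.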
There were a few other formats I wanted to look at, but their specifications are very heavyweight and I gave up before finding out whether they support unions or not: RSDL, Open Data Protocol, WADL, WSDL, and XML-RPC.