JSON Schema is a versatile tool for defining the structure of JSON data and validating it. As powerful as it is, complex scenarios can lead to paradoxical constraints, especially when the same schema also feeds code generation tools. In this article, we take a close look at one such paradox that emerged while defining message structures for various protocols, and discuss a practical solution.
Imagine a system where messages are passed using different protocols, such as AMQP, HTTP, MQTT, Kafka, and CloudEvents. Each protocol has a distinct message structure but shares certain common attributes. These shared attributes are consolidated in a base message definition, called `definition` in our JSON Schema.
The `definition` schema could look something like this:

"definition": {
  "type": "object",
  "description": "a message definition",
  "properties": {
    "schemaUrl": {
      "type": "string",
      "description": "A URL to the schema of the message's data.",
      "format": "uri-reference"
    },
    "schemaFormat": {
      "type": "string",
      "description": "Declares the schema format"
    },
    "format": {
      "type": "string",
      "description": "Specifies the `format` of this definition."
    },
    "metadata": {
      "type": "object"
    }
  },
  "required": [
    "format"
  ],
  ...
}
Protocol-specific definitions, such as `amqpDefinition`, `mqttDefinition`, etc., are created, inheriting properties from the base `definition` via the `allOf` keyword:

"amqpDefinition": {
  "type": "object",
  "properties": {
    "metadata": {
      "description": "AMQP message metadata",
      "$ref": "#/definitions/amqpMetadata"
    },
    "format": {
      "type": "string",
      "description": "Specifies the `format` of this definition.",
      "enum": ["AMQP", "AMQP/1.0"]
    }
  },
  "required": [
    "metadata", "format"
  ],
  "allOf": [
    {
      "$ref": "...#/definitions/definition"
    }
  ]
}
To manage a polymorphic dictionary of such definitions, a type with `additionalProperties` was defined. This `definitions` type is designed to accept additional properties that match either the base `definition` or any of the protocol-specific definitions.

"definitions": {
  "type": "object",
  "title": "definitions",
  "description": "A collection of Message Definitions.",
  "additionalProperties": {
    "oneOf": [
      {"$ref": "#/definitions/definition"},
      {"$ref": "...amqpDefinition"},
      {"$ref": "...mqttDefinition"},
      ...
    ]
  }
}
But this setup led to a paradox. If an instance matched one of the specific definitions, then, because of inheritance, it would also match the base `definition`. This violates the `oneOf` constraint, which requires that exactly one of the listed schemas match. At the same time, the base `definition` had to stay in the dictionary to cater to the requirements of certain code generation tools: tools like NSwag pick the first entry of a `oneOf` list to determine the element type of a collection.
Put differently, every specific definition includes the base `definition` through `allOf`, so the base is the less restrictive schema: the set of instances it accepts is a superset of the instances any specific definition accepts. Validation needs the base `definition` out of the `oneOf` list, while code generation needs it in. That is the conflict between the needs of data validation and code generation.
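To make the conflict concrete, here is a minimal Python sketch that models each schema as a predicate and `oneOf` as an exactly-one check. The function names and the reduced rules (only `format` and `metadata` are examined) are assumptions made for illustration; this is not the real schema and not any validator's API.

```python
from typing import Any, Callable, Sequence

Predicate = Callable[[Any], bool]


def matches_base(msg: Any) -> bool:
    """Simplified base `definition`: an object with a string `format`."""
    return isinstance(msg, dict) and isinstance(msg.get("format"), str)


def matches_amqp(msg: Any) -> bool:
    """Simplified `amqpDefinition`: everything the base requires (allOf),
    plus a required `metadata` and an enum restriction on `format`."""
    return (
        matches_base(msg)
        and "metadata" in msg
        and msg["format"] in ("AMQP", "AMQP/1.0")
    )


def one_of(instance: Any, branches: Sequence[Predicate]) -> bool:
    """oneOf semantics: valid only if exactly one branch accepts the instance."""
    return sum(1 for branch in branches if branch(instance)) == 1


amqp_message = {"format": "AMQP/1.0", "metadata": {}}
generic_message = {"format": "custom"}

# The AMQP message satisfies both branches, so the flat oneOf rejects it.
print(one_of(amqp_message, [matches_base, matches_amqp]))     # False
# A message that only fits the base satisfies exactly one branch.
print(one_of(generic_message, [matches_base, matches_amqp]))  # True
```

The well-formed AMQP message is rejected precisely because it is valid against two branches at once.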
To resolve this paradox, we restructured the JSON Schema so that an instance can never satisfy more than one branch of the `oneOf` used for dictionary entries. We created a new `concreteDefinitions` schema that lists all the specific definitions but not the base `definition`. Then we revised the `additionalProperties` of the `definitions` object to implement a two-level check. The first branch is an inner `oneOf` over the base `definition` and `concreteDefinitions`. Because every concrete instance also matches the base, this inner `oneOf` succeeds only when the instance matches the base `definition` and none of the concrete ones. The second branch is `concreteDefinitions` on its own. An instance of a concrete definition therefore fails the first branch and passes the second, while a base-only instance passes the first and fails the second. Either way exactly one branch matches, and the base `definition` is matched only if none of the `concreteDefinitions` matches.

"concreteDefinitions": {
  "oneOf": [
    {"$ref": "...amqpDefinition"},
    {"$ref": "...mqttDefinition"},
    ...
  ]
},
"definitions": {
  "type": "object",
  "additionalProperties": {
    "oneOf": [
      {
        "oneOf": [
          {"$ref": "#/definitions/definition"},
          {"$ref": "#/definitions/concreteDefinitions"}
        ]
      },
      {"$ref": "#/definitions/concreteDefinitions"}
    ]
  }
}
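The behavior of the nested `oneOf` can be checked with a toy evaluator. The sketch below is a hypothetical, hand-written interpreter for a small subset of JSON Schema keywords (`$ref`, `type: object`, `enum`, `required`, `properties`, `allOf`, `oneOf`); it is not a real validator library. The schemas are reduced stand-ins for the ones above, and the `mqttDefinition` body, including its `format` values, is an assumption made for illustration.

```python
from typing import Any

# Reduced stand-ins for the article's schemas (assumed, not the originals).
SCHEMAS = {
    "definition": {"type": "object", "required": ["format"]},
    "amqpDefinition": {
        "allOf": [{"$ref": "#/definitions/definition"}],
        "required": ["metadata", "format"],
        "properties": {"format": {"enum": ["AMQP", "AMQP/1.0"]}},
    },
    "mqttDefinition": {  # hypothetical body; the article elides it
        "allOf": [{"$ref": "#/definitions/definition"}],
        "required": ["metadata", "format"],
        "properties": {"format": {"enum": ["MQTT/3.1.1", "MQTT/5.0"]}},
    },
    "concreteDefinitions": {
        "oneOf": [
            {"$ref": "#/definitions/amqpDefinition"},
            {"$ref": "#/definitions/mqttDefinition"},
        ]
    },
}

# Original flat list versus the restructured two-level check.
FLAT_ENTRY = {
    "oneOf": [
        {"$ref": "#/definitions/definition"},
        {"$ref": "#/definitions/amqpDefinition"},
        {"$ref": "#/definitions/mqttDefinition"},
    ]
}
NESTED_ENTRY = {
    "oneOf": [
        {"oneOf": [
            {"$ref": "#/definitions/definition"},
            {"$ref": "#/definitions/concreteDefinitions"},
        ]},
        {"$ref": "#/definitions/concreteDefinitions"},
    ]
}


def is_valid(instance: Any, schema: dict) -> bool:
    """Evaluate a tiny subset of JSON Schema keywords against an instance."""
    if "$ref" in schema:  # resolve by the last path segment; siblings ignored
        return is_valid(instance, SCHEMAS[schema["$ref"].rsplit("/", 1)[-1]])
    if schema.get("type") == "object" and not isinstance(instance, dict):
        return False
    if "enum" in schema and instance not in schema["enum"]:
        return False
    if isinstance(instance, dict):
        if any(key not in instance for key in schema.get("required", [])):
            return False
        for key, sub in schema.get("properties", {}).items():
            if key in instance and not is_valid(instance[key], sub):
                return False
    if not all(is_valid(instance, sub) for sub in schema.get("allOf", [])):
        return False
    if "oneOf" in schema:  # exactly one subschema must accept the instance
        if sum(is_valid(instance, sub) for sub in schema["oneOf"]) != 1:
            return False
    return True


amqp_message = {"format": "AMQP/1.0", "metadata": {}}
generic_message = {"format": "custom"}

print(is_valid(amqp_message, FLAT_ENTRY))       # False: base and AMQP both match
print(is_valid(amqp_message, NESTED_ENTRY))     # True: only the second branch matches
print(is_valid(generic_message, NESTED_ENTRY))  # True: only the first branch matches
```

Under these assumptions, the flat list rejects a valid AMQP message, while the nested form accepts both concrete and base-only messages and still rejects an object without `format`.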
This case study underlines the flexibility and versatility of JSON Schema, while simultaneously highlighting the complexities that can emerge when a single schema is expected to serve multiple purposes. Although JSON Schema is a potent tool in defining and validating data structures, applying it in complex scenarios can be challenging. Hence, careful and strategic structuring of the schema is paramount to ensure