@ponelat
Last active November 29, 2024 07:36
JSON Pointer Query (JPQ)

JSON Pointer Query - RFC

A query language for identifying locations within a JSON/YAML document. It is inspired by JSON Pointer and heavily influenced by JSON Path, with which it shares maybe 80% of its syntax. It didn't start off sharing that much, but has grown towards JSON Path as problems were solved.

The design goals

  • Only identify locations in a JSON-like document.
  • Only aim for 80% of use cases and leave more powerful features to other query languages.
  • Be specified (ie: versioned and complete)
  • Do not rely on implementation details

Other query languages worth mentioning are JMESPath and jq. Both have different goals (as I've interpreted them): they aim for data transformation rather than identifying data locations.

A quick example

// Document
paths:
  /foo:
    get: 
      responses: { 400: {} }
    post: 
      responses: { 200: {} }

// Query
$.paths.*.($.responses.200)

// Output
#/paths/~1foo/post

JSON Pointer and segments

Initially JPQ wanted to extend the syntax of JSON Pointer, using slashes to separate segments, where each segment is an instruction on how to navigate the document.

JSON Pointer

// Document
one:
  two: 3
  
// Query
#/one/two

// Output
<The value 3>

Here # represents the current document. It could be prefixed with a URI to represent a canonical location, e.g. https://example.com/#/one/two, but for the purposes of this RFC we'll leave out canonical queries.

When we split up the JSON Pointer we get #, one and two. The segment one tells us to navigate some "pointer" to the value side of the key named one. This is now where we are currently pointing. We then look at the next segment, two, which tells us to navigate deeper, to the value side of the key named two, ultimately arriving at the value 3.

Each segment tells us how to navigate further. That is the gist of it. For the rest of the document we'll describe the output in terms of JSON Pointer paths, which are easier to refer to than values. Keep in mind that this document is interested in identifying locations, not providing the values at those locations.
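
To make the walk concrete, here is a minimal Python sketch of resolving a pointer one segment at a time. The name resolve_pointer is ours and purely illustrative; escape sequences and URI prefixes are left out for brevity.

```python
def resolve_pointer(document, pointer):
    """Follow a pointer such as '#/one/two' through nested dicts and lists."""
    if not pointer.startswith("#"):
        raise ValueError("this sketch only handles pointers that start with '#'")
    current = document
    # '/one/two' splits into ['', 'one', 'two']; the empty first item is dropped.
    for segment in pointer[1:].split("/")[1:]:
        if isinstance(current, list):
            current = current[int(segment)]  # array items are addressed by index
        else:
            current = current[segment]       # move to the value side of the key
    return current


document = {"one": {"two": 3}}
print(resolve_pointer(document, "#/one/two"))  # prints 3
```

A bare # has no segments, so the loop never runs and the whole document is returned.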

Segment separators

JSON Pointer uses / as a segment separator. When a literal / is needed, you can use the escape sequence ~1 to represent that slash.

// Document
paths:
  /foo:
    get: {}
	
// JSON Pointer
#/paths/~1foo/get

// Output
<the value of an empty object>

And if you need a literal ~ you can use ~0. These are the only escape sequences necessary for JSON Pointer, which makes it rather elegant.
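
The order in which the two escapes are applied matters, which is easy to get wrong. A small Python sketch (helper names are ours, not from any library):

```python
def escape_segment(key):
    # Escape '~' first; otherwise the '~' introduced by '~1' would be escaped again.
    return key.replace("~", "~0").replace("/", "~1")


def unescape_segment(segment):
    # Reverse order when decoding, so that '~01' becomes '~1' and not '/'.
    return segment.replace("~1", "/").replace("~0", "~")


def to_pointer(keys):
    """Join already-resolved keys into a '#/...' JSON Pointer."""
    return "#" + "".join("/" + escape_segment(str(key)) for key in keys)


print(to_pointer(["paths", "/foo", "get"]))  # prints #/paths/~1foo/get
```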

For JPQ we started by continuing to use / but realised that a lot of the queries would involve strings that included slashes in them (ie: OpenAPI paths). So we then changed to . which is what JSON Path uses.

one.two.three

Much like JSON Path we also used $ to represent the root of the document. So a query may look like this...

// Document
paths:
  /foo:
    get: {}
	
// Query
$.paths./foo.get

// Output
#/paths/~1foo/get

To escape a literal . we opted for the same strategy as JSON Pointer and used the tilde sequence ~1, which in JPQ stands for . rather than /.

So

// Document
paths:
  /swagger.json:
    get: {}
	
// Query
$.paths./swagger~1json.get

// Output
#/paths/~1swagger.json/get
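
As a rough sketch of how such a query could be turned into its output, the Python below splits a plain query (no wildcards or nesting) on . and re-encodes the keys as a JSON Pointer. The function names are hypothetical, and treating ~0 as a literal ~ in JPQ is our assumption, carried over from JSON Pointer.

```python
def split_query(query):
    """Split a simple dot-separated query into key names."""
    if not query.startswith("$"):
        raise ValueError("query must start with '$'")
    rest = query[1:]
    if rest == "":
        return []
    if not rest.startswith("."):
        raise ValueError("expected '.' after '$'")
    # A literal '.' inside a key is written '~1', so splitting on '.' is safe.
    # Assumption: '~0' stands for a literal '~', as in JSON Pointer.
    return [seg.replace("~1", ".").replace("~0", "~") for seg in rest[1:].split(".")]


def to_pointer(keys):
    """Encode keys with JSON Pointer escapes ('~0' for '~', '~1' for '/')."""
    return "#" + "".join("/" + k.replace("~", "~0").replace("/", "~1") for k in keys)


keys = split_query("$.paths./swagger~1json.get")
print(keys)              # prints ['paths', '/swagger.json', 'get']
print(to_pointer(keys))  # prints #/paths/~1swagger.json/get
```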

Wildcards

We now have support for escaping special characters (ie: tilde + single digit). Let's do some cool stuff with it.

The first thing that comes to mind is a wildcard character to represent any single-level traversal.

// Document
paths:
  /foo:
    get: {}
    post: {}
  /bar:
    get: {}

// Query
$.paths.*.*

// Output
#/paths/~1foo/get
#/paths/~1foo/post
#/paths/~1bar/get

That gives us a lot of power to select things we know about today, and to select new additions to the document in the future, without changing the query!
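
One way to picture the wildcard is as a set of "current locations" that fans out at every segment. The Python sketch below is our own illustration, not a reference implementation; it handles only plain keys and *, and ignores escape sequences and arrays.

```python
def select(document, query):
    """Return JSON Pointers for a query made of plain keys and '*' wildcards."""
    segments = query.split(".")[1:]  # drop the leading '$'
    matches = [([], document)]       # (keys walked so far, value at that location)
    for segment in segments:
        next_matches = []
        for keys, value in matches:
            if not isinstance(value, dict):
                continue  # nothing to descend into
            for key, child in value.items():
                if segment == "*" or segment == str(key):
                    next_matches.append((keys + [str(key)], child))
        matches = next_matches
    return [
        "#" + "".join("/" + k.replace("~", "~0").replace("/", "~1") for k in keys)
        for keys, _ in matches
    ]


document = {"paths": {"/foo": {"get": {}, "post": {}}, "/bar": {"get": {}}}}
for pointer in select(document, "$.paths.*.*"):
    print(pointer)
# prints:
# #/paths/~1foo/get
# #/paths/~1foo/post
# #/paths/~1bar/get
```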

Nested query (or look aheads)

When it comes to queries, we're often interested in the parent location where some children satisfy a nested query. Our approach is to wrap the nested query in parentheses (with escape sequences ~3 for ( and ~4 for )). A nested query is fully valid JPQ and applies to the value traversed up to that point. We've opted against separate characters for absolute and relative queries, which is worth noting because JSON Path uses @ for relative queries.

// Document
paths:
  /foo:
    get: 
      responses: { 400: {} }
    post: 
      responses: { 200: {} }

// Query
$.paths.*.($.responses.200)

// Output
#/paths/~1foo/post

Note that the path returned is not the one pointing to the 200 but instead the path that leads up to the key where the nested query segment is.

These three queries are equivalent for this document...

// Document
one:
  two: 
    three: 3
  
// Query
$.one.two
or
$.one.*
or 
$.one.($.three)

// Output
#/one/two

One mental model is to think of each segment as having to resolve to a key.
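
That mental model can be sketched in Python. To sidestep parsing, the query is given here as an already-parsed list, where a string is a key or *, and a nested list stands for a parenthesised nested query. This representation and the function names are assumptions for illustration only.

```python
def walk(value, segments):
    """Yield (keys, value) for every location reached from `value` via `segments`."""
    if not segments:
        yield [], value
        return
    head, rest = segments[0], segments[1:]
    if not isinstance(value, dict):
        return
    for key, child in value.items():
        if isinstance(head, list):
            # Nested query: keep this key only if the sub-query, rooted at
            # the child, finds at least one location.
            keep = any(True for _ in walk(child, head))
        else:
            keep = head == "*" or head == str(key)
        if keep:
            for keys, found in walk(child, rest):
                yield [str(key)] + keys, found


def to_pointer(keys):
    return "#" + "".join("/" + k.replace("~", "~0").replace("/", "~1") for k in keys)


document = {
    "paths": {
        "/foo": {
            "get": {"responses": {400: {}}},
            "post": {"responses": {200: {}}},
        }
    }
}
# Parsed form of $.paths.*.($.responses.200)
query = ["paths", "*", ["responses", "200"]]
print([to_pointer(keys) for keys, _ in walk(document, query)])
# prints ['#/paths/~1foo/post']
```

Every segment, nested or not, consumes exactly one key, which is why the returned path stops at the operation rather than at the 200.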

Array contains

We often need to query whether an array contains a value. We opted for the shorthand that looks most elegant, although time will tell whether it comes at a significant tradeoff.

// Document
paths:
  /foo:
    get: 
      tags: [dog]
    post:
      tags: [dog, cat, dog]
	  
// Query
$.paths.*.*.tags.[dog]

// Output
#/paths/~1foo/get/tags/0
#/paths/~1foo/post/tags/0
#/paths/~1foo/post/tags/2

You'll note that a path to each item in the array is returned.
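
A hand-written Python equivalent of that query, specific to this document shape, might look like the sketch below. Both function names are ours.

```python
def indexes_of(array, wanted):
    """Return the index of every item in `array` that equals `wanted`."""
    return [index for index, item in enumerate(array) if item == wanted]


def tag_pointers(document, tag):
    """Rough equivalent of $.paths.*.*.tags.[<tag>] for this document shape."""
    pointers = []
    for path, operations in document["paths"].items():
        escaped = path.replace("~", "~0").replace("/", "~1")
        for method, operation in operations.items():
            for index in indexes_of(operation.get("tags", []), tag):
                pointers.append(f"#/paths/{escaped}/{method}/tags/{index}")
    return pointers


document = {
    "paths": {
        "/foo": {
            "get": {"tags": ["dog"]},
            "post": {"tags": ["dog", "cat", "dog"]},
        }
    }
}
for pointer in tag_pointers(document, "dog"):
    print(pointer)
# prints:
# #/paths/~1foo/get/tags/0
# #/paths/~1foo/post/tags/0
# #/paths/~1foo/post/tags/2
```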

We can combine this with nested queries to solve for the case of finding an operation that has a given tag...

// Document
paths:
  /foo:
    get: 
      tags: [dog]
    post:
      tags: [cat]
	  
// Query
$.paths.*.($.tags.[dog])

// Output
#/paths/~1foo/get

Note to selves: I can see a common mistake being $.paths.*.*.($.tags.[dog]) (an extra wildcard) compared to the valid $.paths.*.($.tags.[dog]). Not sure if it's possible to make it more obvious.

Making the wildcard into a glob pattern

We've spoken of the wildcard *. For cases where we need string matching, we can extend it into a glob pattern, where you mix characters with * to search for prefixes and suffixes.

// Document

paths:
  /foo: 
    get:
      responses: { 400: {} }
    post:
      responses: { 200: {}, 201: {} }
	  
// Query
$.paths.*.*.responses.2*

// Output
#/paths/~1foo/post/responses/200
#/paths/~1foo/post/responses/201

At this point we only have * but we're interested in considering the glob pattern used in Linux/macOS/BSD systems, with characters such as ? and [.
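
For a feel of what shell-style globs would give us, Python's standard fnmatch module already implements *, ? and [ ]. The sketch below uses it to match keys; using fnmatch is our choice for illustration, not something JPQ specifies.

```python
from fnmatch import fnmatchcase


def matching_keys(mapping, pattern):
    """Return the keys of `mapping` whose string form matches the glob `pattern`."""
    # YAML parses 200 as a number, so keys are converted to strings first.
    return [key for key in mapping if fnmatchcase(str(key), pattern)]


responses = {200: {}, 201: {}, 400: {}}
print(matching_keys(responses, "2*"))   # prints [200, 201]
print(matching_keys(responses, "?00"))  # prints [200, 400]
```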

Logic combinators (and/or/not)

To create more powerful queries we can leverage logical combinators such as && (and), || (or) and ! (not). We considered prefix notation (like a Lisp) but instead opted for infix (like most programming languages) for its readability.

// Document
paths:
  /foo:
    get:
      responses: { 401: {} }
    post:
      responses: { 400: {}, 401: {} }
    put:
      responses: { 200: {} }
	  
// Query
$.paths.*.($.responses.4* && !($.responses.400))

// Output
#/paths/~1foo/get

The above query reads "It has a 4* response but not a 400".
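
One way to model the combinators is as functions that combine predicates. In the Python sketch below, has stands in for a nested query and both/negate for && and !; all of these names are hypothetical, and glob matching is borrowed from the standard fnmatch module.

```python
from fnmatch import fnmatchcase


def has(*patterns):
    """Predicate: following `patterns` key by key from a value reaches something."""
    def test(value):
        current = [value]
        for pattern in patterns:
            current = [
                child
                for node in current
                if isinstance(node, dict)
                for key, child in node.items()
                if fnmatchcase(str(key), pattern)
            ]
        return bool(current)
    return test


def both(left, right):
    return lambda value: left(value) and right(value)


def either(left, right):
    return lambda value: left(value) or right(value)


def negate(predicate):
    return lambda value: not predicate(value)


operations = {
    "get": {"responses": {401: {}}},
    "post": {"responses": {400: {}, 401: {}}},
    "put": {"responses": {200: {}}},
}
# ($.responses.4* && !($.responses.400))
predicate = both(has("responses", "4*"), negate(has("responses", "400")))
print([method for method, op in operations.items() if predicate(op)])  # prints ['get']
```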

(primitive) Value matching

There is a good case for testing whether a primitive value (string, bool, number) meets a criterion. For that we introduce predicate expressions, where we compare two values using C-style operators: =, >, <, etc.

This needs to be fleshed out more, but the gist should be clear.

// Document
components:
  schemas:
    Foo:
      type: string
    Bar:
      type: string
    Baz: 
      type: number

// Query
$.components.schemas.($.type = "string")

// Output
#/components/schemas/Foo
#/components/schemas/Bar

Note: Will only compare primitives, not objects or arrays.

Note: Will do no type conversions, so a string will never match a number.
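
The "no type conversions" rule takes a little care in some host languages. As a Python sketch of the = operator only (the function name is ours, and treating integers and floats as the same "number" type is an assumption):

```python
def primitive_equals(actual, expected):
    """Strict equality for strings, booleans and numbers."""
    # Objects and arrays never compare equal in this sketch.
    if isinstance(actual, (dict, list)) or isinstance(expected, (dict, list)):
        return False
    # bool is a subclass of int in Python, so handle it first to avoid True == 1.
    if isinstance(actual, bool) or isinstance(expected, bool):
        return isinstance(actual, bool) and isinstance(expected, bool) and actual == expected
    # A string only ever matches another string.
    if isinstance(actual, str) or isinstance(expected, str):
        return isinstance(actual, str) and isinstance(expected, str) and actual == expected
    return actual == expected


schemas = {
    "Foo": {"type": "string"},
    "Bar": {"type": "string"},
    "Baz": {"type": "number"},
}
matches = [name for name, schema in schemas.items()
           if primitive_equals(schema.get("type"), "string")]
print(matches)  # prints ['Foo', 'Bar']
```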

Won't match non-existing paths

Only paths that exist will be returned, as this example illustrates...

// Document
components:
  schemas:
    Foo:
      type: object
    Bar:
      type: object
      properties: {}

// Query
$.components.schemas.($.type = "object").properties

// Output
#/components/schemas/Bar/properties

An argument could be made for returning a path based on matched expressions, but for now in the spirit of simplicity, we've opted to only return paths that exist in the document.
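
Spelled out by hand in Python for this one query (a sketch, with a function name of our own choosing), the rule amounts to checking that the trailing key is actually present before emitting a path:

```python
def existing_properties(document):
    """Rough equivalent of $.components.schemas.($.type = "object").properties."""
    pointers = []
    for name, schema in document["components"]["schemas"].items():
        # Foo passes the type check but has no 'properties' key, so it is skipped.
        if schema.get("type") == "object" and "properties" in schema:
            # Schema names in this example need no JSON Pointer escaping.
            pointers.append(f"#/components/schemas/{name}/properties")
    return pointers


document = {
    "components": {
        "schemas": {
            "Foo": {"type": "object"},
            "Bar": {"type": "object", "properties": {}},
        }
    }
}
print(existing_properties(document))  # prints ['#/components/schemas/Bar/properties']
```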

Variables

Variables will likely be used by tool makers. It is good practice to support them in a safe way, with string escaping, so as to avoid string injection attacks (not sure of the security impact, but good to be prepared). We can use the @ character to represent an escaped string.

Example:

// Document
paths:
  /foo:
    get: 
      tags: [dog]
    post:
      tags: [dog, cat]
    put:
      tags: [elephant]

// Query
$.paths.*.($.tags.[@petType])

// Variables
petType: dog

// Output
#/paths/~1foo/get
#/paths/~1foo/post
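
One possible way to get the effect of escaping, sketched in Python below, is to look the variable up at evaluation time instead of pasting its text into the query, so the value is never re-parsed as query syntax. This is an assumption about how a tool might implement @, and the function names are hypothetical.

```python
def resolve_operand(token, variables):
    """Return the literal value for an operand such as 'dog' or '@petType'."""
    if token.startswith("@"):
        return variables[token[1:]]  # raises KeyError if the variable is missing
    return token


def operations_with_tag(operations, token, variables):
    """Rough equivalent of ($.tags.[<token>]) applied to each operation."""
    wanted = resolve_operand(token, variables)
    return [method for method, operation in operations.items()
            if wanted in operation.get("tags", [])]


operations = {
    "get": {"tags": ["dog"]},
    "post": {"tags": ["dog", "cat"]},
    "put": {"tags": ["elephant"]},
}
print(operations_with_tag(operations, "@petType", {"petType": "dog"}))  # prints ['get', 'post']
# A value that looks like query syntax is still compared literally.
print(operations_with_tag(operations, "@petType", {"petType": "*"}))    # prints []
```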

Out of band ideas

Descent wildcard

In keeping with glob patterns, we could add ** to recursively descend and return every path that matches.

// Document
components:
  schemas:
    Foo: {}
    Bar: {}
    Baz: 
      type: object

// Query
$.components.**

// Output
#/components/schemas
#/components/schemas/Foo
#/components/schemas/Bar
#/components/schemas/Baz
#/components/schemas/Baz/type

Note the inclusion of #/components/schemas and #/components/schemas/Baz/type, both of which may be weird to search for.
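
A recursive descent of this kind is short to write. The Python generator below is a sketch under our own naming; it yields parents before children, which reproduces the order of the output above.

```python
def descend(value, keys=()):
    """Yield the key path of every location strictly below `value`."""
    if isinstance(value, dict):
        children = value.items()
    elif isinstance(value, list):
        children = enumerate(value)
    else:
        return  # primitives have nothing below them
    for key, child in children:
        path = keys + (str(key),)
        yield path
        yield from descend(child, path)


document = {
    "components": {
        "schemas": {"Foo": {}, "Bar": {}, "Baz": {"type": "object"}},
    }
}
# $.components.**  (keys in this example need no JSON Pointer escaping)
pointers = ["#/" + "/".join(path)
            for path in descend(document["components"], ("components",))]
for pointer in pointers:
    print(pointer)
```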

@guettli

guettli commented Nov 29, 2024

@ponelat I am looking for an alternative to JsonPointer, and found your page. What is the status? What do you use today?

I did some research, and I think I will use that: https://jmespath.org/ it is supported by several languages.
