@olih
Last active November 4, 2024 05:44
jq Cheat Sheet

Processing JSON using jq

jq is useful for slicing, filtering, mapping, and transforming structured JSON data.

Installing jq

On macOS

brew install jq

On AWS Linux

Not available as yum install on our current AMI. It should be on the latest AMI though: https://aws.amazon.com/amazon-linux-ami/2015.09-release-notes/

Installing from the source proved to be tricky.

Useful arguments

When running jq, the following arguments may come in handy:

Argument Description
--version Output the jq version and exit with zero.
--sort-keys Output the fields of each object with the keys in sorted order.
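
A minimal sketch of --sort-keys in action. The sample object is made up for illustration, and -c (compact output) is an extra flag not listed in the table above:

```shell
# Hypothetical sample object with keys deliberately out of order
doc='{"b": 1, "a": 2}'

# --sort-keys prints object keys in sorted order; -c keeps the output on one line
echo "$doc" | jq -c --sort-keys '.'
# prints {"a":2,"b":1}
```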

Basic concepts

The syntax for jq is pretty coherent:

Syntax Description
, Filters separated by a comma will produce multiple independent outputs
? Ignores the error if the type is unexpected
[] Array construction
{} Object construction
+ Concatenate or Add
- Difference of sets or Subtract
length Size of selected element
| Pipes chain filters, much like pipes chain commands in bash
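
The operators above can be tried on small inline inputs. This is only a sketch: the sample data is invented, and -n (no input) and -c (compact output) are extra flags used to keep the commands short:

```shell
# Comma: two independent outputs from one input
echo '{"a": 1, "b": 2}' | jq -c '.a, .b'
# prints 1, then 2

# ?: indexing the number 1 with .a is an error, which ? silently drops
echo '[1, {"a": 2}]' | jq -c '[.[] | .a?]'
# prints [2]

# + merges objects, - removes elements from an array
jq -n -c '{"a": 1} + {"b": 2}, [1, 2, 3] - [2]'
# prints {"a":1,"b":2}, then [1,3]

# Pipe: feed the output of one filter into the next
echo '{"a": [1, 2, 3]}' | jq -c '.a | length'
# prints 3
```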

Dealing with json objects

Description Command
Display all keys jq 'keys'
Add 1 to all values jq 'map_values(.+1)'
Delete a key jq 'del(.foo)'
Convert an object to an array jq 'to_entries | map([.key, .value])'
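
A quick sketch of the object commands above, run against a made-up two-key object (the key names are only for illustration):

```shell
# Hypothetical sample object
doc='{"bar": 2, "foo": 1}'

echo "$doc" | jq -c 'keys'                             # ["bar","foo"]
echo "$doc" | jq -c 'map_values(. + 1)'                # {"bar":3,"foo":2}
echo "$doc" | jq -c 'del(.foo)'                        # {"bar":2}
echo "$doc" | jq -c 'to_entries | map([.key, .value])' # [["bar",2],["foo",1]]
```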

Dealing with fields

Description Command
Concatenate two fields jq '.fieldNew = (.field1 + " " + .field2)'
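
A minimal sketch of field concatenation, assuming two hypothetical string fields named field1 and field2. jq string literals use double quotes, so the whole filter is wrapped in single quotes for the shell:

```shell
# Hypothetical sample object
doc='{"field1": "hello", "field2": "world"}'

# Add a new field to the existing object
echo "$doc" | jq -c '.fieldNew = (.field1 + " " + .field2)'
# prints {"field1":"hello","field2":"world","fieldNew":"hello world"}

# Or build a fresh object that holds only the concatenated value
echo "$doc" | jq -c '{fieldNew: (.field1 + " " + .field2)}'
# prints {"fieldNew":"hello world"}
```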

Dealing with json arrays

Slicing and Filtering

Description Command
All jq '.[]'
First jq '.[0]'
Range jq '.[2:4]'
First 3 jq '.[:3]'
Last 2 jq '.[-2:]'
Second to last jq '.[-2]'
Select array of int by value jq 'map(select(. >= 2))'
Select objects by value jq '.[] | select(.id == "second")'
Select by type jq '.[] | numbers' where the type filter is one of arrays, objects, iterables, booleans, numbers, normals, finites, strings, nulls, values, scalars
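
The slicing and filtering commands can be checked against small inline arrays. This is a sketch only, and all sample data below is invented:

```shell
# Hypothetical sample arrays
nums='[1, 2, 3, 4, 5]'
mixed='[1, "a", null, 2]'
objs='[{"id": "first"}, {"id": "second"}]'

echo "$nums" | jq -c '.[2:4]'                # [3,4]  (end index is exclusive)
echo "$nums" | jq -c '.[:3]'                 # [1,2,3]
echo "$nums" | jq -c '.[-2:]'                # [4,5]
echo "$nums" | jq -c '.[-2]'                 # 4
echo "$nums" | jq -c 'map(select(. >= 2))'   # [2,3,4,5]

# Keep only the numbers from a mixed array
echo "$mixed" | jq -c '[.[] | numbers]'      # [1,2]

# Pick the object whose id matches
echo "$objs" | jq -c '.[] | select(.id == "second")'   # {"id":"second"}
```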

Mapping and Transforming

Description Command
Add 1 to all items jq 'map(.+1)'
Delete 2 items jq 'del(.[1, 2])'
Concatenate arrays jq 'add'
Flatten an array jq 'flatten'
Create a range of numbers jq '[range(2;4)]'
Display the type of each item jq 'map(type)'
Sort an array of basic type jq 'sort'
Sort an array of objects jq 'sort_by(.foo)'
Group by a key - opposite to flatten jq 'group_by(.foo)'
Minimum value of an array jq 'min'. See also max, min_by(path_exp), max_by(path_exp)
Remove duplicates jq 'unique' or jq 'unique_by(.foo)' or jq 'unique_by(length)'
Reverse an array jq 'reverse'
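
A short sketch of the mapping and transforming commands, again on invented sample data. The -n flag (no input) is an extra used only for the range example:

```shell
# Hypothetical sample arrays
nums='[3, 1, 2, 1]'
objs='[{"foo": 2, "x": "a"}, {"foo": 1, "x": "b"}, {"foo": 2, "x": "c"}]'

echo "$nums" | jq -c 'map(. + 1)'     # [4,2,3,2]
echo "$nums" | jq -c 'del(.[1, 2])'   # [3,1]  (indices 1 and 2 removed)
echo "$nums" | jq -c 'sort'           # [1,1,2,3]
echo "$nums" | jq -c 'unique'         # [1,2,3]
echo "$nums" | jq -c 'min, max'       # 1, then 3
echo "$nums" | jq -c 'reverse'        # [1,2,1,3]
echo "$nums" | jq -c 'map(type)'      # ["number","number","number","number"]

echo '[[1, 2], [3]]' | jq -c 'add'        # [1,2,3]
echo '[1, [2, [3]]]' | jq -c 'flatten'    # [1,2,3]
jq -n -c '[range(2;4)]'                   # [2,3]  (upper bound is exclusive)

# sort_by orders objects by the given key; group_by returns one sub-array per key value
echo "$objs" | jq -c 'sort_by(.foo) | map(.foo)'     # [1,2,2]
echo "$objs" | jq -c 'group_by(.foo) | map(length)'  # [1,2]
```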
@chb0github

chb0github commented May 3, 2024 via email

@bouchezi

bouchezi commented May 3, 2024

Yes, I tried to use that one, but I can't figure out how to go from an array to a map and remove "tableReference"
Cause here I want to group on the key, not the value

@bouchezi

bouchezi commented May 3, 2024

Got it working using this: jq '[{tableId: map(.tableReference.tableId)}] | add'

@chb0github

Having looked at your original answer request (tableIds: note the plural) and the solution you have accepted for yourself, group_by would never work. It's general purpose. I mean, technically, if you're gonna skip the key and hard code it, you don't need a map - you know the array values represent what you need.

group_by is definitely a difficult function to work with as, IMO, it doesn't produce an intuitive output.

Here is some JSON showing how it's meant to work, along with the output. I added an id field so we could clearly distinguish each object.

[
  {
    "id" : 1,
    "tableReference": {
      "tableId": "Applicant"
    }
  },
  {
    "id" : 3,
    "tableReference": {
      "tableId": "ApplicantBureau"
    }
  },
  {
    "id" : 2,
    "tableReference": {
      "tableId": "Applicant"
    }
  },
  {
    "id" : 4,
    "tableReference": {
      "tableId": "Foo"
    }
  }
]

jq:

group_by(.tableReference.tableId) | 
    map({ 
        key: .[0].tableReference.tableId, value: .
    }) | 
from_entries

Output:

{
  "Applicant": [
    {
      "id": 1,
      "tableReference": {
        "tableId": "Applicant"
      }
    },
    {
      "id": 2,
      "tableReference": {
        "tableId": "Applicant"
      }
    }
  ],
  "ApplicantBureau": [
    {
      "id": 3,
      "tableReference": {
        "tableId": "ApplicantBureau"
      }
    }
  ],
  "Foo": [
    {
      "id": 4,
      "tableReference": {
        "tableId": "Foo"
      }
    }
  ]
}

You'll notice you now have one map keyed by tableId. You should actually post this to Stack Overflow: you're more likely to get help there, since people can earn credit.

@chb0github

@tmprender - can you post your working example? The one on Stack Overflow doesn't go full depth.

@chb0github

chb0github commented May 9, 2024

So, I am trying to do the above, @bouchezi, and it should be about this simple:

jq '[path(..)] | [map(join(".")), map( getpath(.))] ' example.json

When I do this:
jq '[path(..)] | .[1:3]' example.json
I get

[
  [
    "data"
  ],
  [
    "data",
    "object"
  ]
]

and when I fetch the second element, as an example, I get:

jq:

jq 'getpath(  [     
    "data",
    "object"
  ])' example.json

produces:

{
  "user": {
    "id": 1,
    "range": [
      -255,
      0,
      255
    ],
    "notation": "big-O",
    "details": {
      "lat": 0.000,
      "long": 0.000,
      "time": 42
    }
  },
  "groups": [
    {
      "id": 2,
      "name": "foo"
    },
    {
      "id": 3,
      "name": "bar"
    }
  ]
}

So, the output from [path(..)] looks just fine.

When I do:

jq '[path(..)] | map(join("."))' example.json

I get what I would expect (so far, so good) (sample):

[
  "data.object.user",
  "data.object.user.id",
  "data.object.user.range",
  "data.object.user.range.0",
  "data.object.user.range.1",
  "data.object.user.range.2",
  "data.object.user.notation"
]

But when I do this:

jq '[path(..)] | map(getpath(.))' example.json

I get the error:

jq: error (at example.json:27): Cannot index array with string "data"

This should work. Because if I can get this to work, I think the whole problem can be reduced to:

jq -re '[path(..)] | [map(join(".")),map(getpath(.))] | map(join("=")) | .[]'

Simplest example:

-> % jq '[path(..)] | .[3] ' example.json 
[
  "data",
  "object",
  "user"
]
(ehsm-py3.9) cbongior@cbongior-mac [09:54:38] [~/dev/ehsm] [main *]
-> % jq 'getpath([               
  "data",
  "object",
  "user"
])' example.json
{
  "id": 1,
  "range": [
    -255,
    0,
    255
  ],
  "notation": "big-O",
  "details": {
    "lat": 0.000,
    "long": 0.000,
    "time": 42
  }
}
(ehsm-py3.9) cbongior@cbongior-mac [09:55:03] [~/dev/ehsm] [main *]
-> % jq '[path(..)] | .[3] | getpath' example.json 
jq: error: getpath/0 is not defined at <top-level>, line 1:
[path(..)] | .[3] | getpath                    
jq: 1 compile error
(ehsm-py3.9) cbongior@cbongior-mac [09:55:25] [~/dev/ehsm] [main *]
-> % jq '[path(..)] | .[3] | getpath(.)' example.json
jq: error (at example.json:27): Cannot index array with string "data"

@wader

wader commented May 9, 2024

Something like this? https://jqplay.org/s/lDs_huUTqNi

$ jq -r 'path(.. | scalars) as $p | getpath($p) as $v | "\($p|join("."))=\($v)"' example.json
data.object.user.id=1
data.object.user.range.0=-255
data.object.user.range.1=0
data.object.user.range.2=255
data.object.user.notation=big-O
data.object.user.details.lat=0.000
data.object.user.details.long=0.000
data.object.user.details.time=42
data.object.groups.0.id=2
data.object.groups.0.name=foo
data.object.groups.1.id=3
data.object.groups.1.name=bar
data.metdata.list.0.0.0=1
data.metdata.list.0.0.1=42
data.metdata.list.0.1.0=3.14
data.metdata.list.0.1.1=98.6
data.metdata.list.1.0=3
data.metdata.list.1.1=6
data.metdata.list.1.2=9
data.metdata.list.1.3=low
data.metdata.list.2.0.x=1
data.metdata.list.2.0.y=-1
data.metdata.ugly_nest.depth.test=true
log=123abc

@wader

wader commented May 9, 2024

@chb0github what might be confusing you is jq's implicit input argument.

# here getpath/1 gets ["log"] both as its implicit input and as its first "explicit" argument (the path to get)
$ jq '["log"] | getpath(.)' example.json
jq: error (at example.json:27): Cannot index array with string "log"

# by using "... as $something" you "bind" the value ["log"] to $p; this also makes the input pass through, so that
# getpath gets the JSON from example.json as input and $p as the path to get
$ jq '["log"] as $p | getpath($p)' example.json
"123abc"

And in the comment above, path(.. | scalars) as $p causes each output of path(.. | scalars) to be bound to $p, and the rest of the pipeline is evaluated once for each of them.
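
The difference between piping a path into getpath and binding it first can be reproduced on a tiny document. This is a sketch that assumes a made-up two-key document rather than the example.json used in this thread:

```shell
# Hypothetical stand-in for example.json
doc='{"data": {"id": 1}, "log": "123abc"}'

# After the pipe, getpath's input is the array ["log"] rather than the document, so jq errors out
echo "$doc" | jq -c '["log"] | getpath(.)' 2>/dev/null || echo "getpath(.) failed"

# Binding with "as $p" leaves the document as the input, so the lookup succeeds
echo "$doc" | jq -c '["log"] as $p | getpath($p)'
# prints "123abc"

# Each scalar path is bound to $p in turn, and the string is built once per path
echo "$doc" | jq -r 'path(.. | scalars) as $p | "\($p | join("."))=\(getpath($p))"'
# prints data.id=1, then log=123abc
```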
