@turtlemonvh
Last active December 29, 2016 00:50
Parsing deeply pipelined aggregations

Parsing deeply nested ES queries into CSVs with jq

If you already have data in Elasticsearch, queries and aggregations can answer a lot of questions, but the deeply nested response structure can be awkward to work with.

This example shows how to turn a complex aggregation result into a CSV.

  • example_mapping_template.json shows the structure of the data stored in Elasticsearch
  • parsing_example.sh shows how to make a complex query against this data and parse the result into a CSV
  • example_output.csv shows sample CSV output
{
  "template_messages": {
    "template": "*",
    "order": 0,
    "settings": {
      "index.mapping.ignore_malformed": true
    },
    "mappings": {
      "_default_": {
        "properties": {
          "Timestamp": {
            "type": "date"
          },
          "user_id": {
            "type": "string",
            "index": "not_analyzed"
          },
          "product_id": {
            "type": "string",
            "index": "not_analyzed"
          }
        }
      }
    }
  }
}
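To register the template, you can PUT it to the Elasticsearch index template API before indexing data (a sketch; the `user:pass@localhost:9200` endpoint is a placeholder for your own cluster and credentials):

```sh
# Register the index template; it will apply to any newly created
# index whose name matches the "template" pattern ("*" here).
curl -X PUT "user:pass@localhost:9200/_template/template_messages" \
  --data "@example_mapping_template.json"
```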
"2016-10-07T00:00:00.000Z","55d1df6cabfc4a8a0f64f8e1","hat",2
"2016-10-07T00:00:00.000Z","55d1df6cabfc4a8a0f64f8e1","scarf",1
"2016-10-07T00:00:00.000Z","55d1df6cabfc4a8a0f64f8e1","suitcase",1
"2016-10-12T00:00:00.000Z","55d1e040abfc4a8a0f64f9b9","hat",16
"2016-10-13T00:00:00.000Z","55d1df6cabfc4a8a0f64f8e1","hat",2
"2016-10-14T00:00:00.000Z","55d1e040abfc4a8a0f64f9b9","hat",48
"2016-10-14T00:00:00.000Z","55d1e040abfc4a8a0f64f9b9","scarf",6
# Example triple-nested aggregation.
# Gets a count of products per user per time period.
cat > requestbody.json << EOF
{
  "size": 0,
  "query": {
    "bool": {
      "must": [
        {
          "range": {
            "Timestamp": {
              "gte": "2016-06-01",
              "lte": "2016-12-01"
            }
          }
        }
      ]
    }
  },
  "aggs": {
    "timestamp": {
      "date_histogram": {
        "field": "Timestamp",
        "interval": "12h"
      },
      "aggs": {
        "user_id": {
          "terms": {
            "field": "user_id"
          },
          "aggs": {
            "metric": {
              "terms": {
                "field": "product_id"
              }
            }
          }
        }
      }
    }
  }
}
EOF
# Run the query and save the response
curl -s -X POST "user:pass@localhost:9200/*/_search" --data "@requestbody.json" > res.json
# Show the output, in all its complexity
cat res.json | jq .
# Parse into a CSV
cat res.json | jq -r '
  .aggregations.timestamp.buckets[] | .key_as_string as $time_period |
  .user_id.buckets[] | .key as $user_id |
  .metric.buckets[] |
  [$time_period, $user_id, .key, .doc_count] | @csv
'
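For reference, the jq filter walks a response shaped roughly like this (abridged; only the fields the filter touches are shown):

```json
{
  "aggregations": {
    "timestamp": {
      "buckets": [
        {
          "key_as_string": "2016-10-07T00:00:00.000Z",
          "user_id": {
            "buckets": [
              {
                "key": "55d1df6cabfc4a8a0f64f8e1",
                "metric": {
                  "buckets": [
                    { "key": "hat", "doc_count": 2 }
                  ]
                }
              }
            ]
          }
        }
      ]
    }
  }
}
```

Each level of `.buckets[]` fans out, so the filter captures the outer bucket keys with `as` bindings before descending, then emits one CSV row per innermost bucket.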