Analyze Elasticsearch on command line using HTTPie and jq


Especially when developing new query logic, it's helpful to query Elasticsearch from the command line. If your Elasticsearch cluster uses SAML authentication or some other SSO, querying with curl directly is awkward, and sometimes not even possible. I wrote an auth plugin for HTTPie that should greatly simplify this process if you have rights to create API keys via the Kibana Dev Tools console (talk to your administrator, and see the link below).

This process is also super handy for shell scripting, because you can put fine-grained limits on what your API key can do, which makes API keys much safer and easier to manage than embedding native-realm usernames and passwords (an example of a scoped key follows below).

Setup

First, install HTTPie and my auth plugin.

pip install httpie httpie-apikey-auth

Create API key

Using Dev Tools in Kibana, create an API key. See https://www.elastic.co/guide/en/elasticsearch/reference/current/security-api-create-api-key.html for details and options.

Most basic syntax with your full rights:

POST /_security/api_key
{
  "name": "cli analysis key (dcode)"
}
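
If you want the fine-grained limits mentioned above, you can attach role descriptors when creating the key. A sketch that scopes the key to read-only access on beats indices (the role name and index pattern here are just examples, adjust for your cluster):

POST /_security/api_key
{
  "name": "cli analysis key (dcode)",
  "role_descriptors": {
    "beats_read": {
      "indices": [
        {
          "names": ["*beat*"],
          "privileges": ["read"]
        }
      ]
    }
  }
}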

In your shell, export the following variables. Side note: I like to split out my workspaces into distinct project directories. You can use a tool called direnv to store these environment variables and autoload them when you cd into that directory. This is particularly handy here, because if you lose your API key creds you have to delete the key in Kibana and regenerate it.

export SESSION_NAME=my_session
export MY_ID='<your key ID from above>'
export MY_APIKEY='<your apikey from above>'
export ES_HOST='https://es.cloud.sample.com'
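
If you go the direnv route, a minimal .envrc for the project directory might look like the block below (same placeholder values as above); run direnv allow once so direnv will load it.

# .envrc — autoloaded by direnv when you cd into the project directory
export SESSION_NAME=my_session
export MY_ID='<your key ID from above>'
export MY_APIKEY='<your apikey from above>'
export ES_HOST='https://es.cloud.sample.com'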

Now, let's create an HTTPie session with creds. After this, you don't need to specify auth information anymore; HTTPie will automatically track the auth info in its local session storage.

http --auth-type=apikey --auth="${MY_ID}:${MY_APIKEY}" --session ${SESSION_NAME} ${ES_HOST}
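
To sanity-check that the session was stored, you can call Elasticsearch's authenticate endpoint without passing any creds; it returns the user and realm the key maps to.

http --session ${SESSION_NAME} ${ES_HOST}/_security/_authenticate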

Now you can move on to your CLI querying.

Query examples

From now on, you can just do the following to use your existing session to hit the entire Elasticsearch API.

http --session ${SESSION_NAME} ${ES_HOST}

Elasticsearch URI search

URI Search documentation

Query where agent.hostname is set to rock01, limited to 10 results. Note the double equals: in HTTPie, size==10 is a URL query parameter, while size=10 would be sent as a JSON body field instead. Quoting the URL keeps the shell from glob-expanding *beat*.

http --session ${SESSION_NAME} "${ES_HOST}/*beat*/_search" q=='agent.hostname: rock01' size==10
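
The raw response is verbose; jq makes it easy to pull out just the fields you care about. For example (the field path here assumes typical beats documents):

http --session ${SESSION_NAME} "${ES_HOST}/*beat*/_search" q=='agent.hostname: rock01' size==10 | jq -r '.hits.hits[]._source.agent.hostname'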

Elasticsearch Query DSL

We can perform the same search as above using a request body and a Query DSL query, passing it to HTTPie on stdin via a pipe.

cat <<EOF | http --session ${SESSION_NAME} "${ES_HOST}/*beat*/_search"
{
    "size": 10,
    "query" : {
        "term" : { "agent.hostname" : "rock01" }
    }
}
EOF
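
If you just want to know how many documents match before pulling them down, the _count endpoint accepts the same query clause (size and sort aren't allowed there):

cat <<EOF | http --session ${SESSION_NAME} "${ES_HOST}/*beat*/_count"
{
    "query" : {
        "term" : { "agent.hostname" : "rock01" }
    }
}
EOF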

PRO-TIP: You can build your query in the Kibana Discover app and click Inspect to see the request Kibana sends. Kibana throws in a bunch of extra aggregations and other items that you probably don't need, but you can at least see how it maps your query from KQL to Elasticsearch Query DSL.

# Query from STDIN. This searches all beats indexes
cat <<EOF | http --session ${SESSION_NAME} "${ES_HOST}/*beat*/_search" | jq -c '.hits.hits[]' | tee output.json
{
    "version": true,
    "size": 500,
    "sort": [
        {
            "@timestamp": {
                "order": "desc",
                "unmapped_type": "boolean"
            }
        }
    ],
    "aggs": {
        "2": {
            "date_histogram": {
                "field": "@timestamp",
                "fixed_interval": "30s",
                "time_zone": "America/Chicago",
                "min_doc_count": 1
            }
        }
    },
    "query": {
        "bool": {
            "must": [],
            "filter": [
                {
                    "match_all": {}
                },
                {
                    "range": {
                        "@timestamp": {
                            "gte": "2020-03-31T16:06:11.900Z",
                            "lte": "2020-03-31T16:21:11.900Z",
                            "format": "strict_date_optional_time"
                        }
                    }
                }
            ],
            "should": [],
            "must_not": []
        }
    }
}
EOF
# Split results into files by agent.type
while read -r line; do
    echo "${line}" >> "$(echo "${line}" | jq -r '.agent.type').json"
done < <(jq -c -M '._source' output.json)
# Split results into files by index type (should be same as above, but statically defined)
cat output.json | tee \
    >(jq -c 'select(._index | startswith("auditbeat")) | ._source' > auditbeat.ndjson) \
    >(jq -c 'select(._index | startswith("filebeat")) | ._source' > filebeat.ndjson) \
    >(jq -c 'select(._index | startswith("metricbeat")) | ._source' > metricbeat.ndjson) \
    >(jq -c 'select(._index | startswith("journalbeat")) | ._source' > journalbeat.ndjson) \
    >(jq -c 'select(._index | startswith("winlogbeat")) | ._source' > winlogbeat.ndjson) \
    >(jq -c 'select(._index | startswith("endgame")) | ._source' > endgame.ndjson) \
    > /dev/null
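
As a quick sanity check, the per-file line counts (assuming the files landed in the current directory) should roughly add up to the number of hits in output.json:

wc -l *.ndjson output.json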