- Delete all documents from index
- Health / Stats
- Get all
- Search for string across all fields
- Count documents
- Get mapping
- Get one record
- Create one record
- Sort output, search for not empty values, in a field with a name that contains a space, and use jq to extract values.
- Disable automatic date detecting
- Search by several fields
- Get All (actually retrieve all pages of results)
- Get one field multiple records
- Increase number of allowed fields aka. columns
- List indexes
- Info about an index
- Count docs in an index
- Search with query stored in file
- Count number of fields on index
- Count for each unique values in one field find how many unique values there are in another field
- Get names of fields
- Process names of fields in loop
- Get count number of not-null values for every field in index
- Count number of distinct values
- Get list of unique values in a field and count how many occurrences for each distinct/unique value
- Update a field for all records that match a query
- Count the unique values across two fields
- Test an analyzer
- Sorted Euclidean distance
- Euclidean distance
- Explain Query
- Get one document for each unique value in a field
- Python test ElasticSearch connection
- Python query
- String query String
- Setup a multi-node cluster
- ElasticSearch-Dump
- Security
- SearchGuard
- ReactiveSearch Simple Custom Security Proxy
- Scripts
- Nodes
- Query Performance Improvement Ideas
curl -X POST "localhost:9200/index_name/_delete_by_query" -H 'Content-Type: application/json' -d'
{
  "query": {
    "match_all": {}
  }
}'
curl -XGET "http://localhost:9200/_cluster/stats?human&pretty"
curl -XGET "http://localhost:9200/_cat/shards?v"
curl -XGET "http://localhost:9200/_cat/indices?v"
curl -XGET "http://localhost:9200/_cat/allocation?v"
curl -H 'Content-Type: application/json' -XGET 'localhost:9200/index_name/_search' -d '
{
  "query" : {
    "match_all" : {}
  }
}'
curl -H 'Content-Type: application/json' -XPOST 'localhost:9200/index_name/_search' -d '
{
  "query" : {
    "query_string": { "query": "heart" }
  }
}' | jq . | head -n25
curl -H 'Content-Type: application/json' -XGET 'localhost:9200/index_name/_count'
curl -H 'Content-Type: application/json' -XGET 'localhost:9200/index_name/_mapping'
curl -H 'Content-Type: application/json' -XGET "localhost:9200/index_name/index_name/MnjvwGkBD86Op0uG1ix5" | jq
curl -X PUT "localhost:9200/index_name/index_name/1/_create" -H 'Content-Type: application/json' -d'
{
  "create": "2015/09/02"
}'
Sort output, search for not empty values, in a field with a name that contains a space, and use jq to extract values.
curl -H 'Content-Type: application/json' -XGET 'localhost:9200/index_name/_search' -d '{ "sort":[{"Scheduled Date.keyword" : {"order":"asc"}}], "query" : {"query_string" : {"query": "Scheduled\\ Date:/.*/"}}}' | jq --raw-output '.hits.hits[]._source."Scheduled Date"'
curl -s -H 'Content-Type: application/json' -X PUT 'localhost:9200/index_name/_mapping'
"mappings": {
+ "date_detection": false,
"properties": {
"orig": {
"type": "text",
curl -H 'Content-Type: application/json' -XGET 'localhost:9200/index_name/_search' -d '
{
  "query": {
    "bool": {
      "must": [
        { "term": { "AccessionNumber.keyword": "123456789" } },
        { "term": { "SeriesNumber.keyword": "1" } },
        { "term": { "SeriesDescription.raw": "FMRI-AX" } },
        { "term": { "InstanceNumber.keyword": "71" } }
      ]
    }
  }
}'
elasticdump --input=https://elasticindex_names.ccm.sickkids.ca --input-index=index_name --output=$ --searchBody='{"_source": ["Report"], "query" : {"match_all" : {} } }' | jq '{id:._id,Report:._source.Report}'
{
  "id": "CnZVSWsBHWg-PhjNBSxI",
  "Report": "Flexion/extension viewso…"
}
curl -s -H 'Content-Type: application/json' -XGET '' -d '
{
  "_source": "PatientID",
  "from": 1,
  "size": 5
}
' | jq ".hits.hits[]._source.PatientID"
Note: "from" is the zero-based offset of the first hit to return; "size" is the number of hits to return.
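The "from"/"size" pair can be stepped in a loop to walk through result pages. The following is a minimal sketch, assuming the total hit count is already known (for example from `_count`). The helper name `page_windows` is hypothetical, and it only builds request bodies; it does not contact Elasticsearch.

```python
import json


def page_windows(total, size):
    """Yield {"from": ..., "size": ...} dicts that together cover `total` hits."""
    for start in range(0, total, size):
        # The last page may be shorter than `size`.
        yield {"from": start, "size": min(size, total - start)}


if __name__ == "__main__":
    # 12 hits in pages of 5 gives offsets 0, 5 and 10.
    for window in page_windows(12, 5):
        body = {"_source": "PatientID", **window}
        print(json.dumps(body))
```

By default Elasticsearch rejects requests where from + size exceeds 10,000 (the `index.max_result_window` setting), so deeper traversals need the scroll API or a tool such as elasticdump.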
curl -H 'Content-Type: application/json' -XPUT 'localhost:9200/index_name/_settings' -d '
{
  "index.mapping.total_fields.limit": 100000
}'
curl http://localhost:9200/_aliases?pretty=true
curl -s -X GET http://localhost:9200/index_name | jq
curl -s -X GET http://localhost:9200/index_name/index_name/_count | jq
curl -v -H 'Content-Type: application/x-ndjson' -H 'Accept: application/json' -XPOST 'https://elasticindex_names.ccm.sickkids.ca/index_name/_msearch' --data-binary @data.json
# required file data.json (must have new line at end of file)
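The `_msearch` payload is newline-delimited JSON: each search is a header line followed by a body line, and the payload must end with a newline. The sketch below shows one way to generate such a payload. The function name `build_msearch_body` and the two example queries are assumptions for illustration; the output could be saved as data.json.

```python
import json


def build_msearch_body(index, queries):
    """Return an NDJSON string: one header line and one body line per query."""
    lines = []
    for query in queries:
        lines.append(json.dumps({"index": index}))  # header line
        lines.append(json.dumps(query))             # search body line
    # The _msearch endpoint requires the payload to end with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    body = build_msearch_body(
        "index_name",
        [
            {"query": {"match_all": {}}},
            {"query": {"query_string": {"query": "heart"}}},
        ],
    )
    print(body, end="")
```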
curl -s -XGET 'localhost:9200/index_name/_mapping?pretty' | grep type | grep text | wc -l
curl -H 'Content-Type: application/json' -XPOST 'localhost:9200/index_name/_search' -d '{
  "aggs": {
    "count_index_names_by_modality": {
      "terms": {
        "field": "Modality.raw",
        "size": 20,
        "order": {
          "_count": "desc"
        }
      },
      "aggs": {
        "exam_count_per_modality": {
          "cardinality": {
            "field": "AccessionNumber.keyword"
          }
        }
      }
    }
  }
}' | jq
curl -s -XGET localhost:9200/index_name/_mapping | jq .index_name.mappings.index_name.properties | jq 'keys'
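The same field-name extraction can be done in Python on a parsed `_mapping` response. This is a sketch under assumptions: the helper name `field_names` is hypothetical and the sample mapping is made up. It handles both the 6.x layout used above, where properties sit under the document type, and the typeless layout of later versions.

```python
def field_names(mapping, index, doc_type=None):
    """Return sorted top-level field names from a parsed _mapping response."""
    mappings = mapping[index]["mappings"]
    # Elasticsearch 6.x nests "properties" under the document type;
    # 7.x and later put "properties" directly under "mappings".
    if doc_type is not None:
        mappings = mappings[doc_type]
    return sorted(mappings.get("properties", {}))


if __name__ == "__main__":
    # Made-up sample in the 6.x shape (index and type both "index_name").
    sample = {
        "index_name": {
            "mappings": {
                "index_name": {
                    "properties": {
                        "PatientID": {"type": "text"},
                        "Modality": {"type": "text"},
                    }
                }
            }
        }
    }
    for name in field_names(sample, "index_name", doc_type="index_name"):
        print("key:", name)
```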
for i in $(curl -s -XGET localhost:9200/index_name/_mapping | jq .index_name.mappings.index_name.properties | jq 'keys' | jq .[])
do
  echo "key: $i"
done
for FIELD_NAME in $(curl -s -XGET localhost:9200/index_name/_mapping | jq .index_name.mappings.index_name.properties | jq 'keys' | jq .[]); do NUM_NOT_NULL=$(curl -s -H 'Content-Type: application/json' -XGET '' -d '
{
  "query" : {
    "constant_score" : {
      "filter" : {
        "exists" : {
          "field" : '"$FIELD_NAME"'
        }
      }
    }
  }
}' | jq .hits.total); echo "$FIELD_NAME: $NUM_NOT_NULL"; done | tee out.json
curl -H 'Content-Type: application/json' -XGET '' -d '
{
  "size" : 0,
  "aggs" : {
    "distinct_orig" : {
      "cardinality" : {
        "field" : "orig.keyword"
      }
    }
  }
}' | jq
Note: "size": 0 runs the aggregation without returning the matching documents (hits).
curl -H 'Content-Type: application/json' -XGET '' -d '
{
  "size": 0,
  "aggs" : {
    "count_orig" : {
      "terms" : { "field" : "ProtocolName.keyword", "size": 2147483647}
    }
  }
}' | jq
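A terms aggregation returns its results as a list of buckets, each with a `key` and a `doc_count`. The sketch below flattens that list into a value-to-count dictionary. The helper name `bucket_counts` and the sample response values are made up for illustration.

```python
def bucket_counts(response, agg_name):
    """Map each bucket key to its doc_count for a terms aggregation."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return {bucket["key"]: bucket["doc_count"] for bucket in buckets}


if __name__ == "__main__":
    # Made-up response fragment in the shape returned for "count_orig".
    sample = {
        "aggregations": {
            "count_orig": {
                "buckets": [
                    {"key": "HEAD", "doc_count": 12},
                    {"key": "CHEST", "doc_count": 7},
                ]
            }
        }
    }
    for value, count in bucket_counts(sample, "count_orig").items():
        print(value, count)
```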
es.update_by_query(index='index_name', doc_type='index_name', body={
    'query': {'term': {'AccessionNumber.keyword': 'FUJI95714'}},
    'script': {"inline": "ctx._source.A_new_attribute = 'NEWVALUE'"}})
Note: ".keyword" is needed to guarantee an exact match; otherwise the value is broken by the analyzer into separate terms.
data.aggs = {
"script": "doc['AccessionNumber.raw'].value + ' ' + doc['SeriesNumber.raw'].value"
curl -H 'Content-Type: application/json' -XGET '' -d '
{
  "analyzer": "standard",
  "text": "3-plane"
}' | jq .
resulting tokens: [3, plane]
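To get a feel for why analyzed text fields do not match exact values, the sketch below imitates the tokenization locally. It is only a rough approximation: the real standard analyzer follows Unicode text segmentation rules, whereas this hypothetical `rough_standard_tokens` helper just lowercases runs of word characters. It agrees with the real analyzer on the simple inputs shown.

```python
import re


def rough_standard_tokens(text):
    """Approximate the standard analyzer: lowercase runs of word characters."""
    return [token.lower() for token in re.findall(r"\w+", text)]


if __name__ == "__main__":
    print(rough_standard_tokens("3-plane"))
    # A value such as "FMRI-AX" is split as well, which is why exact
    # matches are run against the .keyword or .raw sub-field instead.
    print(rough_standard_tokens("FMRI-AX"))
```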
curl -X GET "localhost:9200/index_name/_search" -H 'Content-Type: application/json' -d'
"sort": [
"_script": {
"type": "number",
"script": {
"lang": "painless",
"source": "return Math.sqrt(Math.pow(Integer.parseInt(doc[\u0027Rows.keyword\u0027].value) - 499, 2) + Math.pow(Integer.parseInt(doc[\u0027Columns.keyword\u0027].value) - 499, 2))"
"order": "asc"
"size": 8,
"_source": ["Rows", "Columns",'"dicom]
' | jq .
curl -X PUT "localhost:9200/my_index/_doc/1" -H 'Content-Type: application/json' -d'
{
  "x1": 3.0,
  "y1": 3.0,
  "x2": 0.0,
  "y2": 0.0
}'
curl -X GET "localhost:9200/my_index/_search" -H 'Content-Type: application/json' -d'
{
  "script_fields": {
    "my_doubled_field": {
      "script": {
        "lang": "painless",
        "source": "return Math.sqrt(Math.pow(doc[\u0027x1\u0027].value - doc[\u0027x2\u0027].value, 2) + Math.pow(doc[\u0027y1\u0027].value - doc[\u0027y2\u0027].value, 2))"
      }
    }
  }
}'
Note: \u0027 means ' and is used as a quote inside of a quote
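The Painless expressions above can be checked locally with the same arithmetic. This is a sketch for verification only: the helper names are hypothetical, the sample documents are made up, and `sort_by_distance` mirrors the sorted variant by parsing the keyword (string) values of Rows and Columns before measuring the distance to (499, 499).

```python
import math


def euclidean(x1, y1, x2, y2):
    """Straight-line distance between (x1, y1) and (x2, y2)."""
    return math.sqrt((x1 - x2) ** 2 + (y1 - y2) ** 2)


def sort_by_distance(docs, target=499):
    """Order docs by the distance of (Rows, Columns) from (target, target)."""
    return sorted(
        docs,
        key=lambda d: euclidean(int(d["Rows"]), int(d["Columns"]), target, target),
    )


if __name__ == "__main__":
    # Same values as the document indexed into my_index above: sqrt(18).
    print(euclidean(3.0, 3.0, 0.0, 0.0))
    sample = [{"Rows": "1024", "Columns": "1024"}, {"Rows": "512", "Columns": "512"}]
    print(sort_by_distance(sample))
```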
curl -v -H 'Content-Type: application/json' -X GET 'http://localhost:9200/index_name/index_name/DczUL2wBssoKtfgQuNfg/_explain/' -d '
{
  "query" : {
    "query_string" : {"query":"dcm"}
  }
}' | jq
curl -H 'Content-Type: application/json' -XPOST 'localhost:9200/index_name/_search' -d '{
  "query" : {
    "query_string": { "query": "heart" }
  },
  "collapse" : {
    "field" : "AccessionNumber.raw"
  }
}' | jq '.hits.hits[]._source.AccessionNumber'
Note: The collapsing is done by selecting only the top sorted document per collapse key. Here that means the best-scoring document for each distinct AccessionNumber is returned.
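The effect of collapsing can be imitated on a list of hits that is already sorted best-first. This sketch assumes the hit layout shown in the jq filter above; the helper name `collapse_hits` and the sample hits are hypothetical.

```python
def collapse_hits(hits, field):
    """Keep the first (top-sorted) hit for each distinct value of `field`."""
    first_per_key = {}
    for hit in hits:  # hits are assumed to be sorted best-first already
        key = hit["_source"][field]
        first_per_key.setdefault(key, hit)
    # Dicts preserve insertion order, so the original ranking is kept.
    return list(first_per_key.values())


if __name__ == "__main__":
    sample = [
        {"_score": 3.2, "_source": {"AccessionNumber": "A1"}},
        {"_score": 2.9, "_source": {"AccessionNumber": "A2"}},
        {"_score": 1.5, "_source": {"AccessionNumber": "A1"}},
    ]
    for hit in collapse_hits(sample, "AccessionNumber"):
        print(hit["_source"]["AccessionNumber"], hit["_score"])
```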
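from elasticsearch import Elasticsearch
# ELASTIC_IP, ELASTIC_PORT and filepath_orig must be defined before this point
es = Elasticsearch([{'host': ELASTIC_IP, 'port': ELASTIC_PORT}])
# Lookup dicom by path
result = es.search(
    body={'query': {'term': {'filepath_orig.keyword': filepath_orig}}})
if result['hits']['total'] == 0:
    print('not found')  # placeholder: the original branch body is not shown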
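from elasticsearch import Elasticsearch
INDEX_NAME = 'index_name'
ELASTIC_IP = 'localhost'
ELASTIC_PORT = 9200  # assumed: the port used by the curl examples
es = Elasticsearch([{'host': ELASTIC_IP, 'port': ELASTIC_PORT}])
# Exact match on one field
query = {
    "query" : {
        "term" : { "filepath.keyword" : "/hpf/projects/file.txt" }
    }
}
# Or match every document (this assignment replaces the query above)
query = {"query": {"match_all": {}}}
res = es.search(index=INDEX_NAME, body=query)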
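from elasticsearch import Elasticsearch
INDEX_NAME = 'index_name'
ELASTIC_IP = 'localhost'  # assumed, as in the previous example
ELASTIC_PORT = 9200  # assumed: the port used by the curl examples
es = Elasticsearch([{'host': ELASTIC_IP, 'port': ELASTIC_PORT}])
query = {
    "query" : {
        "query_string": { "query": "heart" }
    }
}
res = es.search(index=INDEX_NAME, body=query)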
https://www.elastic.co/guide/en/elasticsearch/guide/master/distributed-cluster.html
https://dzone.com/articles/elasticsearch-tutorial-creating-an-elasticsearch-c
cluster.name: "docker-cluster"
discovery.zen.minimum_master_nodes: 2
discovery.zen.ping.unicast.hosts: ["", "", ""]
docker run -d --name elasticsearch1 -p 9200:9200 -p 9300:9300 -v `pwd`/elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml docker.elastic.co/elasticsearch/elasticsearch:6.7.1
docker run -d --name elasticsearch2 -p 9201:9200 -p 9301:9300 -v `pwd`/elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml docker.elastic.co/elasticsearch/elasticsearch:6.7.1
docker run -d --name elasticsearch3 -p 9202:9200 -p 9302:9300 -v `pwd`/elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml docker.elastic.co/elasticsearch/elasticsearch:6.7.1
Install elasticdump using the node package manager:
npm install elasticdump -g
Option 1: Download Metadata
Download all metadata associated with the index_names in your current search results. Run this command and wait for all metadata to be downloaded to the file output.json:
elasticdump \
--input=https://elasticindex_names.ccm.sickkids.ca \
--output=output.json \
Option 2: Download File Paths
Download the file path locations of all the index_names in your current search results. This also requires the jq tool. Run this command and wait for all the file paths to be downloaded to the file output.txt:
elasticdump \
--input=https://elasticindex_names.ccm.sickkids.ca \
--output=$ \
--searchBody='{"query":{"bool":{"must":[{"bool":{"must":[{"range":{"PatientAgeInt":{"gte":0,"lte":30,"boost":2}}}]}}]}}}' \
| jq ._source.dicom_filepath | tee output.txt
It IS possible to terminate SSL and set up (simple) authentication for the open source version of Elasticsearch and/or Kibana completely for free; you just have to reverse proxy it with something like Nginx or Apache. It is however correct that if you'd like a nice user UI and SSL termination directly on the standalone ES instance, you have to pay.
You can use the free plugin readonlyrest to enable HTTP Authentication, SSL and ACL. https://readonlyrest.com/free/
For free Elasticsearch authentication, look up SearchGuard.
It’s also possible to secure your Elasticsearch cluster’s access with a middleware proxy server that is connected to ReactiveSearch. This allows you to set up custom authorization rules, prevent misuse, only pass back non-sensitive data, etc. Here’s an example app where we show this using a Node.JS / Express middleware:
• Proxy Server https://github.com/appbaseio-apps/reactivesearch-proxy-server/blob/master/index.js (can hand implement custom ACLs here)
• Proxy Client https://github.com/appbaseio-apps/reactivesearch-proxy-client/blob/master/src/App.js
The scripting module enables you to use scripts to evaluate custom expressions. For example, you could use a script to return "script fields" as part of a search request or evaluate a custom score for a query.
Nodes are servers. Data can be sharded to split it up and increase performance. Replicas ensure availability in case of an outage and allow faster searching in parallel on different replicas. A replica duplicates shards on different nodes (servers).
• By default, Elasticsearch will put 5 shards on one server. However, 5 servers each with one shard will be faster, mainly because of 5x the disk I/O.
• The rule of thumb is that shards should consist of 20–40 GB of data.
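The shard-size rule of thumb translates into simple arithmetic. This is a minimal sketch, assuming a target of 30 GB per shard (the midpoint of the range above, chosen for illustration); the helper name `shard_count` is hypothetical.

```python
import math


def shard_count(index_size_gb, target_shard_gb=30):
    """Estimate how many primary shards keep each shard near the target size."""
    # Always keep at least one shard, even for tiny indexes.
    return max(1, math.ceil(index_size_gb / target_shard_gb))


if __name__ == "__main__":
    for size_gb in (10, 100, 400):
        print(size_gb, "GB ->", shard_count(size_gb), "shard(s)")
```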
• Store everything in RAM
"I was able to store the entire index to RAM by using the setting below while indexing the data but now the problem is the RAM usage by ElasticSearch is almost three times the size of the index. Lucene will often use up to three times the size of the index due to the merging of existing segments."
"store.type" : "memory"
gateway.type: fs
• Reduce amount of data
• Lower number of replicas from 1 (default) to 0. Usually, the setup that has fewer shards per node in total will perform better. The reason for that is that it gives a greater share of the available filesystem cache to each shard, and the filesystem cache is probably Elasticsearch’s number 1 performance factor. At the same time, beware that a setup that does not have replicas is subject to failure in case of a single node failure, so there is a trade-off between throughput and availability.
curl -H 'Content-Type: application/json' -XPUT http://$HOST_IP:$ELASTIC_PORT/$ELASTIC_INDEX/_settings -d '
{
  "index" : {
    "number_of_replicas" : 0
  }
}'
• More specific queries to reduce number of fields. A common technique to improve search speed over multiple fields is to copy their values into a single field at index time, and then use this field at search time. This can be automated with the copy-to directive of mappings without having to change the source of documents. Here is an example: https://www.elastic.co/guide/en/elasticsearch/reference/master/tune-for-search-speed.html
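A mapping that uses `copy_to` could be generated along these lines. This is a sketch under assumptions: the target field name `all_text` and the helper `copy_to_mapping` are hypothetical, the source fields are borrowed from earlier examples, and the body is in the typeless form (on 6.x the properties would sit under the document type).

```python
import json


def copy_to_mapping(source_fields, target="all_text"):
    """Build a mapping body where each source field is copied into `target`."""
    properties = {
        name: {"type": "text", "copy_to": target} for name in source_fields
    }
    # The combined field is what searches are run against.
    properties[target] = {"type": "text"}
    return {"properties": properties}


if __name__ == "__main__":
    body = copy_to_mapping(["SeriesDescription", "ProtocolName"])
    print(json.dumps(body, indent=2))
```

A search then targets the single combined field instead of listing every source field.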
