How to query in Elasticsearch

Terminologies

We will use the following information throughout this article:

index_name : customers
index_type : personal
Each customer document has name, age, gender, email, phone, address, city, and state as fields in the schema for now

INFO Queries

List all the nodes present in the cluster curl -XGET 'http://localhost:9200/_cat/nodes?v&pretty'
List all the indices present in the cluster curl -XGET 'http://localhost:9200/_cat/indices?v&pretty'
Check the health of your Elasticsearch cluster curl -XGET 'http://localhost:9200/_cat/health?v&pretty'

CRUD Queries

In this section, you'll learn how to create and delete indexes, and how to create, read, update, or delete the documents in them. Each query has its syntax and an example that you can try right away.

Create a new index
- Syntax curl -XPUT 'http://localhost:9200/<index_name>?pretty'
- Example curl -XPUT 'http://localhost:9200/customers?pretty'
Delete an index
- Syntax curl -XDELETE 'http://localhost:9200/<index_name>?pretty'
- Example curl -XDELETE 'http://localhost:9200/customers?pretty'
Create a new document
- Syntax curl -XPUT 'http://localhost:9200/<index_name>/<_type>/<doc_uniq_id>?pretty' -H 'Content-Type: application/json' -d '{"key1":"val1","key2":"val2","key3":"val3"}'
- Example curl -XPUT 'http://localhost:9200/customers/personal/1?pretty' -H 'Content-Type: application/json' -d '{"name":"Amulya","age":25,"gender":"male","email":"[email protected]","phone":"9559004779","address":"Kurla Mumbai Maharashtra","city":"Mumbai","state":"Maharashtra"}'
Retrieve a whole document
- Syntax curl -XGET 'http://localhost:9200/<index_name>/<_type>/<doc_uniq_id>?pretty'
- Example curl -XGET 'http://localhost:9200/customers/personal/1?pretty'
Retrieve a partial document (only selected fields)
- Syntax curl -XGET 'http://localhost:9200/<index_name>/<_type>/<_id>?pretty&_source=field1,field2,field3'
- Example curl -XGET 'http://localhost:9200/customers/personal/1?pretty&_source=name,age,gender'
Update a whole document
- Syntax curl -XPUT 'http://localhost:9200/<index_name>/<_type>/<doc_uniq_id>?pretty' -H 'Content-Type: application/json' -d '{"key1":"val2","key2":"val3","key3":"val4"}'
- Example curl -XPUT 'http://localhost:9200/customers/personal/1?pretty' -H 'Content-Type: application/json' -d '{"name":"Amulya","age":27,"gender":"male","email":"[email protected]","phone":"9559974779","address":"Andheri Mumbai Maharashtra","city":"Mumbai","state":"Maharashtra"}'
Update a document partially (only specific fields)
- Syntax curl -XPOST 'http://localhost:9200/<index_name>/<_type>/<doc_uniq_id>/_update?pretty' -H 'Content-Type: application/json' -d '{"doc":{"new_key":"new_val"}}'
- Example curl -XPOST 'http://localhost:9200/customers/personal/1/_update?pretty' -H 'Content-Type: application/json' -d '{"doc":{"age":27}}'
Delete a document
- Syntax curl -XDELETE 'http://localhost:9200/<index_name>/<_type>/<doc_uniq_id>?pretty'
- Example curl -XDELETE 'http://localhost:9200/customers/personal/1?pretty'
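
The document bodies in the calls above must be valid JSON, with double-quoted keys and strings. As a minimal sketch of the same create call from Python, the snippet below builds the PUT request with the standard library and does not send it. The helper name `index_request` and the default host are assumptions made for illustration only.

```python
import json
import urllib.request


def index_request(index, doc_type, doc_id, document, host="http://localhost:9200"):
    """Build (but do not send) the PUT request that creates or replaces a document."""
    url = f"{host}/{index}/{doc_type}/{doc_id}?pretty"
    return urllib.request.Request(
        url,
        data=json.dumps(document).encode("utf-8"),  # json.dumps always emits valid JSON
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


req = index_request("customers", "personal", 1, {"name": "Amulya", "age": 25})
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Sending it would be a matter of passing `req` to `urllib.request.urlopen`, assuming a node is listening on that host.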

SCRIPT

You can also perform mathematical operations in the update query using the script clause.

Syntax curl -XPOST 'http://localhost:9200/<index_name>/<_type>/<doc_uniq_id>/_update?pretty' -H 'Content-Type: application/json' -d '{"script": "ctx._source.<field_name> <mathematical_operator> <value>"}'
Example curl -XPOST 'http://localhost:9200/customers/personal/1/_update?pretty' -H 'Content-Type: application/json' -d '{"script":"ctx._source.age *= 2"}'

BULK

Bulk operations let you perform multiple operations in Elasticsearch in a single request.

_MGET | Fetch documents from multiple indexes and types in a single request
- Multiple Index Bulk Fetch | Fetch documents from several indexes in a single _mget operation by naming the index, type, and ID of each document
  - Syntax curl -XGET 'http://localhost:9200/_mget?pretty' -H 'Content-Type: application/json' -d '{"docs":[{"_index": <value>,"_type": <value>,"_id": <value>},{"_index": <value>,"_type": <value>,"_id": <value>}]}'
- Specific Index Multiple Type Bulk Fetch | Fetch documents of different types from one index by putting the index in the URL and naming the type and ID of each document
  - Syntax curl -XGET 'http://localhost:9200/<index_name>/_mget?pretty' -H 'Content-Type: application/json' -d '{"docs": [{"_type": <value>,"_id": <value>},{"_type": <value>,"_id": <value>}]}'
- Single Index Specific Type Bulk Fetch | Fetch documents of one type from one index by putting both in the URL and listing only the IDs
  - Syntax curl -XGET 'http://localhost:9200/<index_name>/<_type>/_mget?pretty' -H 'Content-Type: application/json' -d '{"docs": [{"_id": <value>},{"_id": <value>}]}'
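
The three _mget shapes differ only in which of `_index` and `_type` appear in each entry of `docs`. A small sketch of a body builder, assuming a hypothetical helper `build_mget_body` that takes (index, type, id) tuples and leaves out whatever the URL already fixes:

```python
import json


def build_mget_body(doc_refs):
    """Build an _mget body from (index, type, id) tuples.

    Pass None for index or type when the request URL already names them.
    """
    docs = []
    for index, doc_type, doc_id in doc_refs:
        entry = {}
        if index is not None:
            entry["_index"] = index
        if doc_type is not None:
            entry["_type"] = doc_type
        entry["_id"] = doc_id
        docs.append(entry)
    return {"docs": docs}


# Body for a request against /_mget (index and type named per document)
print(json.dumps(build_mget_body([("customers", "personal", "1"), ("customers", "personal", "2")])))
# Body for a request whose URL already names the index and the type
print(json.dumps(build_mget_body([(None, None, "1"), (None, None, "2")])))
```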

_BULK | Perform multiple operations in single request

Multi-Operation Query | Here you can execute heterogeneous operations in a single request. The body is newline-delimited JSON: each action line is followed by a source line for index, create, and update, while delete takes none. Do not wrap the lines in an outer object, and end the body with a newline.

Syntax

curl -XPOST 'http://localhost:9200/<_index>/<_type>/_bulk?pretty' -H 'Content-Type: application/json' -d '{"index": {"_id": <doc_id>}}
{<key1>:<value1>, <key2>:<value2>, <key3>:<value3>}
{"delete": {"_id": <doc_id>}}
{"create": {"_id": <doc_id>}}
{<key1>:<value1>, <key2>:<value2>, <key3>:<value3>}
{"update": {"_id": <doc_id>}}
{"doc": {<key1>:<value1>, <key2>:<value2>}}
'
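
A bulk body is easier to get right when it is generated rather than typed by hand. The sketch below serializes a list of operations into newline-delimited JSON. The helper `build_bulk_body` and its (action, id, payload) tuple format are assumptions made for illustration, not part of any client library.

```python
import json


def build_bulk_body(actions):
    """Serialize (action, doc_id, payload) triples into a _bulk request body."""
    lines = []
    for action, doc_id, payload in actions:
        # Action line: which operation to run and on which document ID
        lines.append(json.dumps({action: {"_id": doc_id}}))
        if action == "update":
            # A partial update wraps the changed fields in "doc"
            lines.append(json.dumps({"doc": payload}))
        elif action != "delete":
            # index and create are followed by the full document source
            lines.append(json.dumps(payload))
    # The body must end with a newline
    return "\n".join(lines) + "\n"


body = build_bulk_body([
    ("index", "1", {"name": "Amulya"}),
    ("delete", "2", None),
    ("update", "1", {"age": 27}),
])
print(body, end="")
```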

Read bulk data from a JSON file | We will cover this in a future article.

SEARCHING COMPONENTS

Before we query Elasticsearch to fetch records, we need to know a few things. Elasticsearch has many clauses, which are used in different combinations to get the desired results. The main ones are:

QUERY - It works on the concept of relevance scoring and returns the documents with the highest scores first. It takes more time because it assigns a score to each individual document based on the search algorithm. The higher the score, the more relevant the result.
Filters - A filter gives a yes/no answer on whether a document should be included in the results. Filters are faster than queries because they only check whether a document matches at all, not how well it matches. They suit well-structured data and checks such as range queries and exact matches.

The score calculation mentioned above is related to TF, IDF, and FLN. We will cover these in a different chapter. To give you an overview of these terms:
- TF - Term Frequency - How often does the term appear in the field? The more often, the more relevant.
  - Example: 1) Amulya is a great person 2) Amulya is a great and really great and super great person
  - Output: TF for statement (2) will be higher
- IDF - Inverse Document Frequency - How often does the term appear in the index? The more often, the less relevant.
- FLN - Field Length Norm - How long is the field that was searched? The longer the field, the less relevant.
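
To make the three factors concrete, here is a rough worked example on the two statements above. It uses the classic Lucene-style formulas (square root of the term count, 1 + ln(numDocs / (docFreq + 1)), and 1 / sqrt(number of terms)) as an assumption for illustration. Recent Elasticsearch versions score with BM25 by default, so treat the numbers as a sketch of the idea and not as the scores a real cluster would return.

```python
import math


def term_frequency(term, tokens):
    """TF: grows with the number of times the term occurs in the field."""
    return math.sqrt(tokens.count(term))


def inverse_document_frequency(term, all_docs):
    """IDF: shrinks as more documents in the index contain the term."""
    doc_freq = sum(1 for tokens in all_docs if term in tokens)
    return 1 + math.log(len(all_docs) / (doc_freq + 1))


def field_length_norm(tokens):
    """FLN: shrinks as the field gets longer."""
    return 1 / math.sqrt(len(tokens))


docs = [
    "amulya is a great person".split(),
    "amulya is a great and really great and super great person".split(),
]

for tokens in docs:
    print(
        round(term_frequency("great", tokens), 3),
        round(field_length_norm(tokens), 3),
    )
print(round(inverse_document_frequency("great", docs), 3))
```

Statement (2) gets the higher TF because great occurs three times, but also the lower field length norm because the field is longer.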

SIMPLE SEARCH QUERY

A very simple search query to begin with, to see whether some documents are returned.

Explanation : In below example, we are searching for wyoming across all the fields in the customer documents.
- Syntax curl -XGET "localhost:9200/<_index>/_search?q=<keyword>&pretty"
- Example curl -XGET "localhost:9200/customers/_search?q=wyoming&pretty"
Explanation : In below example, we are searching for wyoming in the state field present in the customer documents.
- Syntax curl -XGET "localhost:9200/<_index>/_search?q=<field>:<keyword>&pretty"
- Example curl -XGET "localhost:9200/customers/_search?q=state:wyoming&pretty"

SORTING

Well, you can sort your search results in increasing or decreasing order.

Explanation : In the below example, we're querying wyoming across all the fields and sorting the result by age of the customers in descending order.
- Syntax curl -XGET "localhost:9200/<_index>/_search?q=<keyword>&sort=<field>:<order>&pretty"
- Example curl -XGET "localhost:9200/customers/_search?q=wyoming&sort=age:desc&pretty"

SKIP/FROM and LIMIT/SIZE

These keywords help you page through results by skipping the ones already seen in earlier requests. We use from to set the zero-based position at which the results start, and size to limit how many results are returned.

Explanation : In the below example, we are searching for wyoming across all the fields, skipping the first five results (positions 0-4) and returning at most 20 customers.
- Syntax curl -XGET "localhost:9200/<_index>/_search?q=<keyword>&from=<number>&size=<number>&pretty"
- Example curl -XGET "localhost:9200/customers/_search?q=wyoming&from=5&size=20&pretty"
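
When paging through results, from is usually derived from a page number. A minimal sketch, assuming a hypothetical helper `page_params` and 1-based page numbers:

```python
from urllib.parse import urlencode


def page_params(page, page_size):
    """Translate a 1-based page number into the from/size search parameters."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    return {"from": (page - 1) * page_size, "size": page_size}


# Query string for the third page of 20 results
params = {"q": "wyoming", **page_params(3, 20)}
print("localhost:9200/customers/_search?" + urlencode(params))
```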

EXPLAIN

Use the explain API to see how Elasticsearch computes the score of a query for a specific document. It gives useful feedback on why a document matched or didn't match the query.

Explanation : In the below example, we ask for an explanation of how the customer document with ID 1 scores against a search for kentucky in the state field. The response shows whether the document matched and how its relevance score was calculated.
- Syntax curl -XGET "localhost:9200/<_index>/<_type>/<_id>/_explain?q=<field>:<keyword>&pretty"
- Example curl -XGET "localhost:9200/customers/personal/1/_explain?q=state:kentucky&pretty"

QUERY

The query context has already been introduced above in this article. We're putting the syntax and an example here to clarify its practical usage.

Explanation : In the below example, we're querying everything from the index, sorting the result by the age of the customers, and limiting the result count to 20.
- _source is used to include only the mentioned fields in the result documents.
- query is used to match the documents against the specified condition.
- match_all is the simplest clause and matches everything present in that index.
- sort clause sorts the documents in the specified order against a field.

Syntax

  curl -H 'Content-Type: application/json' -XGET "localhost:9200/<index_name>/_search?pretty" -d '
  {
    "query": {"match_all":{}},
    "sort":{<field_name>: {"order": <order>}},
    "size": <number>,
    "_source": ["field1","field2","field3"]
  }'

Example

  curl -H 'Content-Type: application/json' -XGET "localhost:9200/customers/_search?pretty" -d '
  {
    "query": {"match_all":{}},
    "sort":{"age": {"order": "desc"}},
    "size": 20,
    "_source": ["name","age","gender"]
  }'

TERM QUERY

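Term query is used for matching an exact keyword. We should avoid using it against text datatype fields, because their values are analyzed (for example lowercased and split into words) at index time while the term you search for is not.

Explanation : In the below example, we are searching for the keyword amulya. This finds the documents whose name field contains amulya as an individual term.
Example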

curl -H 'Content-Type: application/json' -XGET "localhost:9200/customers/_search?pretty" -d '
{
  "_source": ["name","age","gender"],
  "query": {"term":{"name":"amulya"}}
}'
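
The reason term queries behave surprisingly on text fields is easier to see in a small simulation. The sketch below uses a deliberately simplified stand-in for the standard analyzer (lowercase, then split on whitespace). The real analyzer also handles punctuation and Unicode word boundaries, so this is an assumption made for illustration.

```python
def analyze(text):
    """Simplified stand-in for the standard analyzer: lowercase and split on whitespace."""
    return text.lower().split()


def term_matches(term, indexed_text):
    """A term query compares the search term, unanalyzed, with the indexed terms."""
    return term in analyze(indexed_text)


print(term_matches("amulya", "Amulya Kashyap"))          # True: the indexed term is lowercase
print(term_matches("Amulya", "Amulya Kashyap"))          # False: the search term is not lowercased
print(term_matches("amulya kashyap", "Amulya Kashyap"))  # False: no single term holds both words
```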

REGEX QUERY

Regexp query is used for matching the terms of a field against a regular expression. The pattern has to match the whole term and is written without surrounding slashes.

Explanation : In the below example, we are searching for the documents whose name contains a term made up only of letters and digits, that is, without any special character. The _source clause also includes the fields matching n* and excludes the fields matching a*.
Example

curl -H 'Content-Type: application/json' -XGET "localhost:9200/customers/_search?pretty" -d '
{
  "_source": {
    "includes": ["n*"],
    "excludes": ["a*"]
  },
  "query": {
   "regexp" : {
      "name" : "[0-9A-Za-z]+"
    }
  }
}'

WILDCARD QUERY

Wildcard query is used for matching the terms of a field against a pattern in which * matches any sequence of characters and ? matches any single character.

Explanation : In the below example, we are searching for the documents whose name contains a term starting with amulya. The _source clause also includes the fields matching n* and excludes the fields matching a*.
Example

curl -H 'Content-Type: application/json' -XGET "localhost:9200/customers/_search?pretty" -d '
{
  "_source": {
    "includes": ["n*"],
    "excludes": ["a*"]
  },
  "query": {
    "wildcard" : {
      "name" : "amulya*"
    }
  }
}'
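
Both the wildcard pattern and the _source includes/excludes patterns use * as "any sequence of characters". The sketch below imitates that with Python's `fnmatch` module, which is only an approximation (it also understands `[...]` character sets). The helper `filter_source` is a hypothetical name, and the field list is the schema from the start of this article.

```python
from fnmatch import fnmatchcase

FIELDS = ["name", "age", "gender", "email", "phone", "address", "city", "state"]


def filter_source(fields, includes, excludes):
    """Keep the fields matching any include pattern, then drop those matching any exclude pattern."""
    kept = [f for f in fields if any(fnmatchcase(f, p) for p in includes)]
    return [f for f in kept if not any(fnmatchcase(f, p) for p in excludes)]


print(fnmatchcase("amulyakashyap", "amulya*"))  # True: the term starts with amulya
print(fnmatchcase("kashyap", "amulya*"))        # False
print(filter_source(FIELDS, ["n*"], ["a*"]))    # ['name']
```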

FUZZY QUERY

Fuzzy matching finds terms that are similar to the search term, within a maximum edit distance (the number of single-character changes needed to turn one term into the other).

Explanation : In the below example, we are searching for the documents which contain beautiful in the name field, even though the search term is misspelled. Fuzziness can be 0, 1, 2, or AUTO as per the requirements.
Example

curl -H 'Content-Type: application/json' -XGET "localhost:9200/customers/_search?pretty" -d '
{
  "_source": {
    "includes": ["n*"],
    "excludes": ["a*"]
  },
  "query": {
    "match" : {
      "name" : {
        "query": "beutiful",
        "fuzziness": "AUTO"
      }
    }
  }
}'
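
Fuzziness is a cap on edit distance, so it helps to check how far a misspelling really is from the intended word. The sketch below computes the plain Levenshtein distance. Elasticsearch by default also counts a swap of two adjacent characters as a single edit, which this simplified version does not. The `auto_fuzziness` helper reflects the default AUTO thresholds as I understand them (0 edits up to 2 characters, 1 edit for 3 to 5, 2 edits beyond that), so treat it as an assumption.

```python
def levenshtein(a, b):
    """Number of single-character insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def auto_fuzziness(term):
    """Edit distance allowed for a term under the assumed default AUTO thresholds."""
    if len(term) <= 2:
        return 0
    if len(term) <= 5:
        return 1
    return 2


for typo in ("beutiful", "beutifell"):
    distance = levenshtein(typo, "beautiful")
    print(typo, distance, distance <= auto_fuzziness(typo))
```

A misspelling one edit away is within reach of AUTO, while one that is three edits away is not, because the maximum fuzziness is 2.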

RANGE QUERY

Range query helps us to perform range searches, like finding the documents between two dates.

Example : list the customers who fall within a given age group.

Explanation: In the below example, we are trying to get all customers who are aged between 20 and 60 years, both inclusive.

  curl -H 'Content-Type: application/json' -XGET "localhost:9200/customers/_search?pretty" -d '
  {
    "query": {
      "bool":{
        "must":{"match_all":{}},
        "filter":{
          "range": {
            "age": {
              "gte": 20,
              "lte": 60
            }
          }
        }
      }
    }
  }'
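
The range operators differ only in whether the bound itself is included: gte and lte are inclusive, gt and lt are exclusive. A small in-memory sketch of that rule, where `in_range` is a hypothetical helper written for illustration:

```python
def in_range(value, gte=None, gt=None, lte=None, lt=None):
    """Check a value against range bounds; gte/lte include the bound, gt/lt exclude it."""
    if gte is not None and value < gte:
        return False
    if gt is not None and value <= gt:
        return False
    if lte is not None and value > lte:
        return False
    if lt is not None and value >= lt:
        return False
    return True


ages = [19, 20, 35, 60, 61]
print([age for age in ages if in_range(age, gte=20, lte=60)])  # [20, 35, 60]
print([age for age in ages if in_range(age, gt=20, lt=60)])    # [35]
```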

FULL TEXT SEARCH

Full text search is a more advanced way to search a database. It quickly finds all instances of a term (word) in a table without having to scan rows and without having to know which column the term is stored in. Full text search works by using text indexes. In Elasticsearch, the full text clauses include match, match_phrase, match_phrase_prefix, and multi_match. We'll cover each of these clauses with an explanation and an example. We have skipped a few clauses, which will be covered in the advanced Elasticsearch article.

match - standard full text query

Explanation : In below example, we are performing a full text search on the name text field. With the or operator, a document matches when it contains at least one of the words.

curl -H 'Content-Type: application/json' -XGET "localhost:9200/customers/_search?pretty" -d '
{
  "query": {
    "match":{
      "name":{
        "query": "amulya kashyap",
        "operator": "or"
      }
    }
  }
}'

match_phrase - for matching exact phrases

Explanation : In below example, it will search for the exact phrase Amulya Kashyap in the name field of the documents.

curl -H 'Content-Type: application/json' -XGET "localhost:9200/customers/_search?pretty" -d '
{
  "query": {
    "match_phrase":{
      "name": "Amulya Kashyap"
    }
  }
}'

match_phrase_prefix - poor man’s autocomplete

Explanation : In below example, it will search for the customers whose name starts with amu in the documents.

curl -H 'Content-Type: application/json' -XGET "localhost:9200/customers/_search?pretty" -d '
{
  "query": {
    "match_phrase_prefix":{
      "name": "amu"
    }
  }
}'
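
The difference between match, match_phrase, and match_phrase_prefix comes down to how the query words are compared with the words of the field. The sketch below imitates the three on plain strings. The whitespace tokenizer and the function names are simplifications assumed for illustration, and no scoring is modelled.

```python
def tokens(text):
    """Simplified analysis: lowercase and split on whitespace."""
    return text.lower().split()


def match(query, field, operator="or"):
    """match: any query word (or) or every query word (and) occurs in the field."""
    field_terms = set(tokens(field))
    hits = [word in field_terms for word in tokens(query)]
    return all(hits) if operator == "and" else any(hits)


def match_phrase(query, field):
    """match_phrase: the query words occur next to each other, in the same order."""
    q, f = tokens(query), tokens(field)
    return any(f[i:i + len(q)] == q for i in range(len(f) - len(q) + 1))


def match_phrase_prefix(query, field):
    """match_phrase_prefix: like match_phrase, but the last query word may be a prefix."""
    q, f = tokens(query), tokens(field)
    if not q:
        return False
    for i in range(len(f) - len(q) + 1):
        window = f[i:i + len(q)]
        if window[:-1] == q[:-1] and window[-1].startswith(q[-1]):
            return True
    return False


print(match("amulya kashyap", "Amulya Singh"))          # True: one word is enough with or
print(match_phrase("Amulya Kashyap", "Kashyap Amulya"))  # False: the order differs
print(match_phrase_prefix("amu", "Amulya Kashyap"))      # True: amu is a prefix of amulya
```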

multi_match - it allows you to search for the same string in multiple fields.
- Multi Match can have one of the following types
  - best_fields
  - most_fields
  - cross_fields
  - phrase
  - phrase_prefix
Explanation: In below example, we are searching for amulya against multiple fields, which can give more accurate results. In the background, a match clause is executed for every single field specified.

All the multi_match types will be covered in the advanced Elasticsearch article, so we can skip them for now.
```
curl -H 'Content-Type: application/json' -XGET 'localhost:9200/customers/_search?pretty' -d '
{
  "query": {
      "multi_match" : {
        "query":      "amulya",
        "fields":     [ "name", "state", "email", "city" ]
      }
  }
}'
```

BOOLEAN SEARCH

A query that matches documents using boolean combinations of other queries. It is built using one or more query clauses, each with a typed occurrence. This query takes a more-matches-is-better approach, so the scores from each matching must or should clause are added together to provide the final _score for each document. The occurrence types are:
- must - the clause must appear in matching documents and contributes to the score.
- must_not - the clause must not appear in matching documents.
- filter - the clause must appear in matching documents. However, unlike must, the score of the query is ignored.
- should - the clause may or may not appear in matching documents.

```
curl -H 'Content-Type: application/json' -XGET "localhost:9200/customers/_search?pretty" -d '
{
  "query": {
    "bool" : {
      "must" : {
        "term" : { "name" : "amulya" }
      },
      "filter": {
        "term" : { "city" : "mumbai" }
      },
      "must_not" : {
        "range" : {
          "age" : { "gte" : 10, "lte" : 40 }
        }
      },
      "should" : [
        { "match" : { "email" : "am*" } }
      ],
      "minimum_should_match" : 1,
      "boost" : 1.0
    }
  }
}'
```
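
How the four occurrence types combine can be imitated on in-memory records. In the sketch below, each clause is a plain Python predicate and only matching is modelled, not scoring. The function `bool_match`, the sample customers, and their example.com addresses are all hypothetical.

```python
def bool_match(doc, must=(), filters=(), must_not=(), should=(), minimum_should_match=0):
    """Return True when doc satisfies the clauses the way a bool query would match it."""
    if not all(clause(doc) for clause in must):
        return False
    if not all(clause(doc) for clause in filters):
        return False
    if any(clause(doc) for clause in must_not):
        return False
    matched_should = sum(1 for clause in should if clause(doc))
    return matched_should >= minimum_should_match


customers = [
    {"name": "amulya", "city": "mumbai", "age": 45, "email": "amulya@example.com"},
    {"name": "amulya", "city": "mumbai", "age": 25, "email": "amulya@example.com"},
    {"name": "amulya", "city": "pune", "age": 45, "email": "amulya@example.com"},
]

hits = [
    doc for doc in customers
    if bool_match(
        doc,
        must=[lambda d: d["name"] == "amulya"],
        filters=[lambda d: d["city"] == "mumbai"],
        must_not=[lambda d: 10 <= d["age"] <= 40],
        should=[lambda d: d["email"].startswith("am")],
        minimum_should_match=1,
    )
]
print(hits)
```

Only the first customer is returned: the second falls inside the must_not age range and the third fails the city filter.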

AdnaneX/elastic_search_query.md
