ELI5: Elasticsearch
- Built on top of Apache Lucene to make it easier to use.
- Indexing: pages -> words
- Searching: words -> pages
- Communicates through a REST API.
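The "pages -> words" and "words -> pages" idea above can be sketched as a toy inverted index. This is only an illustration in plain Python, not how Lucene is implemented; the names `build_index`, `search`, and the sample pages are made up for the example.

```python
from collections import defaultdict


def build_index(pages):
    """Indexing: map each word to the set of page ids that contain it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for word in text.lower().split():
            index[word].add(page_id)
    return index


def search(index, word):
    """Searching: look a word up and return the pages that contain it."""
    return sorted(index.get(word.lower(), set()))


# Hypothetical sample data.
pages = {
    1: "hello world",
    2: "hello elasticsearch",
    3: "search the world",
}
index = build_index(pages)
print(search(index, "hello"))  # [1, 2]
print(search(index, "world"))  # [1, 3]
```

The lookup touches only the entry for the searched word instead of scanning every page, which is the reason this structure makes search fast.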
What are some uses of ES?
- Wikipedia: Searches through millions of articles in different languages.
- StackOverflow: Full-text search with geolocation queries.
- GitHub: Searches through 130 billion lines of code.
Good tutorials to get started
- http://www.elasticsearchtutorial.com/elasticsearch-in-5-minutes.html
- http://okfnlabs.org/blog/2013/07/01/elasticsearch-query-tutorial.html
- http://joelabrahamsson.com/elasticsearch-101/
- http://bitquabit.com/post/having-fun-python-and-elasticsearch-part-1/
$ curl 'localhost:9200'
{
  "status" : 200,
  "name" : "Moon Knight",
  "cluster_name" : "elasticsearch",
  "version" : {
    "number" : "1.7.2",
    "build_hash" : "e43676b1385b8125d647f593f7202acbd816e8ec",
    "build_timestamp" : "2015-09-14T09:49:53Z",
    "build_snapshot" : false,
    "lucene_version" : "4.10.4"
  },
  "tagline" : "You Know, for Search"
}
Elasticsearch supports the basic CRUD operations using HTTP verbs, and supports
the Lucene query parser syntax, e.g. fieldname:value, including wildcards such
as fieldname:value*.
Creating new entries is known as "indexing", while searching is known as "querying".
Resource structure.
# <index> and <type-of-document> are required.
# <id> is optional and will be generated if one is not provided.
http://localhost:9200/<index>/<type-of-document>/[<id>]
# Examples.
http://localhost:9200/people/tweets/1
http://localhost:9200/articles/header/1
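As a small sketch of the resource structure above, the helper below assembles a document URL from its parts. The function name `document_url` and its `host` default are assumptions made for illustration; only the URL layout comes from the text.

```python
def document_url(index, doc_type, doc_id=None, host="http://localhost:9200"):
    """Build <host>/<index>/<type-of-document>/[<id>]."""
    parts = [host, index, doc_type]
    # The id is optional. When it is left out, Elasticsearch generates one,
    # and the request has to be a POST rather than a PUT.
    if doc_id is not None:
        parts.append(str(doc_id))
    return "/".join(parts)


print(document_url("people", "tweets", 1))   # http://localhost:9200/people/tweets/1
print(document_url("articles", "header"))    # http://localhost:9200/articles/header
```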
Indexing corresponds to the "Create" in CRUD. A PUT request to a document URL creates the document if it does not exist yet and replaces it if it does, so the same request shape covers both create and update.
# Create a 'tweets' document in the 'people' index with an id of '1'.
curl -XPUT 'localhost:9200/people/tweets/1' -d '{
"user": "mycat",
"message": "Hello world"
}'
# Returns:
{"_index":"people","_type":"tweets","_id":"1","_version":1,"created":true}
# Sending a PUT to the same id again replaces the document and increments its version.
curl -XPUT 'localhost:9200/people/tweets/1' -d '{
"user": "mycat",
"message": "Gutentag"
}'
# Returns:
{"_index":"people","_type":"tweets","_id":"1","_version":2,"created":false}
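The create-then-update behaviour in the two responses above can be mimicked with a toy in-memory store. This is a hypothetical stand-in (the `ToyIndex` class is invented here) that only reproduces the `_version` and `created` fields; it does not talk to Elasticsearch.

```python
class ToyIndex:
    """In-memory stand-in that mimics the response fields shown above."""

    def __init__(self, name):
        self.name = name
        self.docs = {}  # (doc_type, doc_id) -> (version, body)

    def put(self, doc_type, doc_id, body):
        key = (doc_type, doc_id)
        created = key not in self.docs
        # A new document starts at version 1; a replacement bumps the version.
        version = 1 if created else self.docs[key][0] + 1
        self.docs[key] = (version, body)
        return {"_index": self.name, "_type": doc_type, "_id": doc_id,
                "_version": version, "created": created}


people = ToyIndex("people")
first = people.put("tweets", "1", {"user": "mycat", "message": "Hello world"})
second = people.put("tweets", "1", {"user": "mycat", "message": "Gutentag"})
print(first["_version"], first["created"])    # 1 True
print(second["_version"], second["created"])  # 2 False
```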
Querying relies on an inverted index. Every field in a document is indexed and stored efficiently, so searching is very fast compared to alternatives such as MySQL full-text search. You can search either by specifying query string parameters or by passing JSON in the request body using the Query DSL.
# Search across all indices and all types.
http://localhost:9200/_search
# Search across all types in the 'articles' index.
http://localhost:9200/articles/_search
# Search explicitly in the 'header' type within the 'articles' index.
http://localhost:9200/articles/header/_search
Method 1. Search with query strings
# Searches for the most relevant articles with 'awesome' in the text.
curl 'localhost:9200/articles/_search?q=text:awesome'
Method 2. Search using the DSL syntax.
# Same as above, but with the Lucene query string passed in the request body.
curl 'localhost:9200/articles/_search' -d '{
"query": {
"query_string": {
"query": "text:awesome"
}
}
}'
# Simple term query: matches the exact term as stored in the index.
curl -XGET localhost:9200/articles/_search -d '{
"query" : {
"term" : { "text": "awesome" }
}
}'
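When the request bodies above are built from a program rather than typed by hand, it can help to assemble them as dictionaries and serialize them. A minimal sketch, assuming the helper names `query_string_body` and `term_body` (both invented here); the JSON shapes are the ones used in the curl examples.

```python
import json


def query_string_body(query):
    """Body for a query_string search using Lucene query syntax."""
    return {"query": {"query_string": {"query": query}}}


def term_body(field, value):
    """Body for a term search on a single field."""
    return {"query": {"term": {field: value}}}


print(json.dumps(query_string_body("text:awesome")))
# {"query": {"query_string": {"query": "text:awesome"}}}
print(json.dumps(term_body("text", "awesome")))
# {"query": {"term": {"text": "awesome"}}}
```

The resulting string is what would be passed to curl's -d option.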
Filters are often much more efficient than queries since they skip the
scoring process and many of them are cached automatically. You can set
the _cache element on a filter to explicitly control caching. Learn more
about the Filter DSL.
In general, use filters for:
- Binary yes/no searches.
- Queries on exact values.
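As a sketch of how a filter can be combined with a query, the snippet below builds a `filtered` query body. This assumes the 1.x request syntax that matches the 1.7.2 version shown earlier; later Elasticsearch releases express this differently. The helper name `filtered_body` is made up, and the `language` field is borrowed from the examples in this document.

```python
import json


def filtered_body(query, filter_clause):
    """Wrap a scoring query and a non-scoring filter (1.x 'filtered' syntax)."""
    return {"query": {"filtered": {"query": query, "filter": filter_clause}}}


body = filtered_body(
    # Scored part: relevance ranking on the text.
    query={"query_string": {"query": "text:awesome"}},
    # Yes/no part: an exact-value match that needs no score.
    filter_clause={"term": {"language": "es"}},
)
print(json.dumps(body, indent=2))
```

The split mirrors the guidance above: relevance goes in the query, exact yes/no conditions go in the filter.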
Useful endpoints.
# '_search': Searches through indices, document types, or documents.
curl 'localhost:9200/_search'
curl 'localhost:9200/articles/_search'
curl 'localhost:9200/articles/header/_search'
# '_mapping': Displays the schema mapping of an index or document type.
curl 'localhost:9200/articles/_mapping'
# '_stats': Displays statistics and usage report.
curl 'localhost:9200/_stats'
Useful query string parameters.
# 'q': Single search query.
curl 'localhost:9200/articles/_search?q=text:awesome'
# 'q': Multiple clauses ('+' is a URL-encoded space; clauses are combined with OR by default).
curl 'localhost:9200/articles/_search?q=text:mayor+language:es'
# 'fields': Limits the fields returned
curl 'localhost:9200/articles/header/1?fields=slug,language'
# 'size': Specifies the number of hits returned (default: 10)
curl 'localhost:9200/articles/_search?size=1'
# 'from': Offset of the first hit returned (default: 0)
curl 'localhost:9200/articles/_search?from=5'
# 'pretty': Returns formatted JSON
curl 'localhost:9200/articles/_mapping?pretty=true'
# 'sort': Sorts based on your fields.
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-sort.html
# 'aggregations': Aggregates data. Passed in the request body rather than the query string.
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations.html
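The 'q', 'size' and 'from' parameters above can be combined into one search URL. A rough sketch using only the Python standard library; `search_url` and its argument names are hypothetical, and `offset` is used because `from` is a reserved word in Python.

```python
from urllib.parse import urlencode


def search_url(index, q=None, size=None, offset=None, host="http://localhost:9200"):
    """Build an <index>/_search URL with optional q, size and from parameters."""
    params = {}
    if q is not None:
        params["q"] = q
    if size is not None:
        params["size"] = size
    if offset is not None:
        params["from"] = offset
    url = f"{host}/{index}/_search"
    # urlencode percent-encodes ':' as %3A and turns spaces into '+'.
    return f"{url}?{urlencode(params)}" if params else url


print(search_url("articles", q="text:awesome", size=1, offset=5))
# http://localhost:9200/articles/_search?q=text%3Aawesome&size=1&from=5
```

Paging through results then amounts to keeping `size` fixed and increasing `offset` by `size` on each request.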