- It is a highly scalable, open-source, full-text search engine.
- It allows you to store and search data quickly and in near real time.
- It is built on top of Apache Lucene.
- It is schemaless.
- It stores data in the form of JSON documents.
- It has REST APIs for storing and searching data.
- Cluster = Server(s)
- Node = Server
- Index = Database
- Type = Table
- Document = Record (or row)
- Data Node: storing the data and performing operations on data (indexing, searching, aggregation, etc.).
- Master Node: maintaining the health of the cluster and performing administrative tasks (creating/deleting indices, tracking which nodes are part of the cluster).
- Coordinating Node: receives requests from client applications and aggregates results from Data/Master Nodes.
By default a node is a master-eligible node and a data node.
curl -L -O https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-5.6.0.tar.gz
tar -xvf elasticsearch-5.6.0.tar.gz
cd elasticsearch-5.6.0/bin
./elasticsearch
curl -L -O https://artifacts.elastic.co/downloads/kibana/kibana-5.6.0-darwin-x86_64.tar.gz
tar -xvf kibana-5.6.0-darwin-x86_64.tar.gz
cd kibana-5.6.0-darwin-x86_64/bin
./kibana
cd ~/elasticsearch-5.6.0
bin/elasticsearch
=> (http://localhost:9200)
cd ~/kibana-5.6.0-darwin-x86_64/
bin/kibana
=> (http://localhost:5601)
elasticsearch.yml
jvm.options
Kibana -> Dev Tools -> Console (previously called Sense)
GET /
GET /_cat/health?v
GET /_cat/nodes?v
GET /_cat/indices?v
PUT library
{
"settings": {
"index.number_of_shards": 1,
"index.number_of_replicas": 0
}
}
PUT /library/books/1
{
"title": "The quick brown fox",
"price": 5,
"colors": ["red", "green", "blue"]
}
- Document metadata fields: `_index`, `_type`, `_id`, `_score`, `_source`
- `index` is the operation here; along with it we specify the document's `_id`.
POST library/books/_bulk
{ "index": { "_id": 2 } }
{ "title": "The quick brown fox jumps over the lazy dog", "price": 15, "colors": ["blue", "yellow"] }
{ "index": { "_id": 3 } }
{ "title": "The quick brown fox jumps over the lazy dog", "price": 8, "colors": ["red", "blue"] }
{ "index": { "_id": 4 } }
{ "title": "Brown fox brown dog", "price": 2, "colors": ["black", "yellow", "red", "blue"] }
{ "index": { "_id": 5 } }
{ "title": "Lazy dog", "price": 9, "colors": ["red", "blue", "green"] }
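The bulk body above is newline-delimited JSON: each document takes two lines, an action line followed by a source line. As a rough sketch of how such a body could be assembled outside of Kibana (the helper name `build_bulk_body` is hypothetical and not part of any Elasticsearch client):

```python
import json

def build_bulk_body(docs):
    """Build an NDJSON bulk body from (doc_id, source) pairs."""
    lines = []
    for doc_id, source in docs:
        # Action line: the operation (index) and the document _id.
        lines.append(json.dumps({"index": {"_id": doc_id}}))
        # Source line: the document itself, kept on a single line.
        lines.append(json.dumps(source))
    # Every line, including the last one, is terminated by a newline.
    return "\n".join(lines) + "\n"

docs = [
    (4, {"title": "Brown fox brown dog", "price": 2, "colors": ["black", "yellow", "red", "blue"]}),
    (5, {"title": "Lazy dog", "price": 9, "colors": ["red", "blue", "green"]}),
]
body = build_bulk_body(docs)
print(body, end="")
```

The resulting string is what would be sent as the request body to `library/books/_bulk`.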
GET /library/books/1
- By re-indexing them (requires all attributes to be specified)
POST /library/books/1
{
"title": "The quick fantastic fox",
"price": 5,
"colors": ["red", "green", "blue"]
}
- Or by using the update API (you can specify the attribute(s) to be updated)
POST /library/books/1/_update
{
"doc": {
"title": "The quick brown fox"
}
}
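The difference between the two approaches can be pictured with plain dictionaries. This is only an illustration of the semantics, assuming flat documents; the function names `reindex` and `partial_update` are made up for this sketch.

```python
def reindex(stored, new_source):
    """Re-indexing: the new source replaces the stored document entirely."""
    return dict(new_source)

def partial_update(stored, doc):
    """Update API with "doc": only the listed fields change."""
    merged = dict(stored)
    merged.update(doc)  # shallow merge; enough for flat documents like these
    return merged

stored = {"title": "The quick fantastic fox", "price": 5, "colors": ["red", "green", "blue"]}
change = {"title": "The quick brown fox"}

print(reindex(stored, change))         # price and colors are gone
print(partial_update(stored, change))  # price and colors are kept
```

This is why re-indexing needs every attribute to be supplied again, while the update API only needs the changed ones.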
DELETE /library/books/1
- This does not do any scoring so all docs have the same score.
- Get all documents in the books type.
GET library/books/_search
- Get documents having `fox` in their `title` field.
GET library/books/_search
{
"query": {
"match": {
"title": "fox"
}
}
}
- The relevance score of each document is represented by a positive floating-point number called the `_score`.
- The higher the `_score`, the more relevant the document.
- A query clause generates a `_score` for each document.
- The scoring model described here is TF/IDF (term frequency/inverse document frequency). Since Elasticsearch 5.0 the default similarity is BM25, which builds on the same ideas.
- How often does the term appear in the field?
- The more often, the more relevant.
- A field containing five mentions of the same term is more likely to be relevant than a field containing just one mention.
- How often does each term appear in the index?
- The more often, the less relevant.
- Terms that appear in many documents have a lower weight than more-uncommon terms.
- How long is the field?
- The longer it is, the less likely it is that words in the field will be relevant.
- A term appearing in a short title field carries more weight than the same term appearing in a long content field.
- In case of multiple clauses, the more clauses that match, the higher the `_score`.
- In case of multiple query clauses, the `_score` from each of these query clauses is combined to calculate the overall `_score` for the document.
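The three factors above (term frequency, inverse document frequency, field length) can be sketched in a few lines of Python. The formulas follow the classic Lucene TF/IDF definitions; the way they are multiplied together here is a simplification, and real scores (especially with BM25) will differ. The titles are the five sample books, and whitespace splitting stands in for real analysis.

```python
import math

titles = {
    1: "The quick brown fox",
    2: "The quick brown fox jumps over the lazy dog",
    3: "The quick brown fox jumps over the lazy dog",
    4: "Brown fox brown dog",
    5: "Lazy dog",
}

def tokens(text):
    # Stand-in for analysis: lowercase and split on whitespace.
    return text.lower().split()

fields = [tokens(t) for t in titles.values()]

def tf(term, field_tokens):
    # Term frequency: the more often the term appears, the higher the weight.
    return math.sqrt(field_tokens.count(term))

def idf(term, all_fields):
    # Inverse document frequency: terms found in many documents weigh less.
    doc_freq = sum(1 for f in all_fields if term in f)
    return 1 + math.log(len(all_fields) / (doc_freq + 1))

def field_norm(field_tokens):
    # Field-length norm: shorter fields weigh more.
    return 1 / math.sqrt(len(field_tokens))

def score(term, doc_id):
    # Simplified combination of the three factors for a single term.
    field = tokens(titles[doc_id])
    return tf(term, field) * idf(term, fields) * field_norm(field)

for doc_id in titles:
    print(doc_id, round(score("dog", doc_id), 3))
```

With this toy model, `dog` scores higher in the two-word title "Lazy dog" than in the nine-word titles, and `lazy` (three documents) gets a higher `idf` than `fox` (four documents).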
- Get documents having either `quick` or `dog` in their `title` field.
GET library/books/_search
{
"query": {
"match": {
"title": "quick dog"
}
}
}
- Get documents having the phrase `quick dog` in their `title` field.
GET library/books/_search
{
"query": {
"match_phrase": {
"title": "quick dog"
}
}
}
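To make the difference between `match` and `match_phrase` concrete, here is a small Python approximation over the five sample titles. It assumes whitespace tokenization and the default OR behaviour of `match`; it only decides whether a title matches and does no scoring.

```python
titles = {
    1: "The quick brown fox",
    2: "The quick brown fox jumps over the lazy dog",
    3: "The quick brown fox jumps over the lazy dog",
    4: "Brown fox brown dog",
    5: "Lazy dog",
}

def analyze(text):
    # Stand-in for analysis: lowercase and split on whitespace.
    return text.lower().split()

def match(title, query):
    # match: at least one query term appears in the title.
    title_tokens = analyze(title)
    return any(term in title_tokens for term in analyze(query))

def match_phrase(title, query):
    # match_phrase: all query terms appear, adjacent and in order.
    t, q = analyze(title), analyze(query)
    return any(t[i:i + len(q)] == q for i in range(len(t) - len(q) + 1))

print([i for i, t in titles.items() if match(t, "quick dog")])         # [1, 2, 3, 4, 5]
print([i for i, t in titles.items() if match_phrase(t, "quick dog")])  # []
print([i for i, t in titles.items() if match_phrase(t, "lazy dog")])   # [2, 3, 5]
```

Every title contains `quick` or `dog`, but none contains the two words next to each other, so the phrase query for `quick dog` finds nothing in this sample data.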
- Let's find all docs with "quick" and "lazy dog".
- The `bool` query allows us to combine multiple queries.
- The `must` clause is similar to `AND` in SQL: all conditions inside must match.
GET library/books/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"title": "quick"
}
},
{
"match_phrase": {
"title": "lazy dog"
}
}
]
}
}
}
- Get documents which have neither `quick` nor the phrase `lazy dog` in their `title` field.
GET library/books/_search
{
"query": {
"bool": {
"must_not": [
{
"match": {
"title": "quick"
}
},
{
"match_phrase": {
"title": "lazy dog"
}
}
]
}
}
}
- Combinations can be boosted for different effects.
- The `should` clause is similar to `OR` in SQL: at least one of the conditions inside must match.
GET library/books/_search
{
"query": {
"bool": {
"should": [
{
"match_phrase": {
"title": {
"query": "quick dog"
}
}
},
{
"match_phrase": {
"title": {
"query": "lazy dog",
"boost": 3
}
}
}
]
}
}
}
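The way `must`, `must_not`, and `should` combine can be sketched as a toy model in Python. The scoring here is an assumption made for illustration (one point per `must` clause plus the boost of each matching `should` clause); real Elasticsearch scores come from relevance scoring multiplied by the boost. All function names are hypothetical.

```python
titles = {
    1: "The quick brown fox",
    2: "The quick brown fox jumps over the lazy dog",
    3: "The quick brown fox jumps over the lazy dog",
    4: "Brown fox brown dog",
    5: "Lazy dog",
}

def has_term(term):
    return lambda title: term in title.lower().split()

def has_phrase(phrase):
    words = phrase.lower().split()
    def check(title):
        t = title.lower().split()
        return any(t[i:i + len(words)] == words for i in range(len(t) - len(words) + 1))
    return check

def bool_query(must=(), must_not=(), should=()):
    """Return {doc_id: toy_score}; `should` holds (predicate, boost) pairs."""
    hits = {}
    for doc_id, title in titles.items():
        if not all(p(title) for p in must):
            continue  # every must clause has to match
        if any(p(title) for p in must_not):
            continue  # no must_not clause may match
        matched = [boost for p, boost in should if p(title)]
        if should and not must and not matched:
            continue  # with only should clauses, at least one has to match
        hits[doc_id] = len(must) + sum(matched)
    return hits

print(bool_query(must=[has_term("quick"), has_phrase("lazy dog")]))      # {2: 2, 3: 2}
print(bool_query(must_not=[has_term("quick"), has_phrase("lazy dog")]))  # {4: 0}
print(bool_query(should=[(has_phrase("quick dog"), 1), (has_phrase("lazy dog"), 3)]))
```

In the last call the boosted `lazy dog` clause contributes 3 to documents 2, 3, and 5, which is the effect a `boost` has on ranking.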
- Highlighting tells you which parts of the `title` field matched the query.
- You can configure this to use different kinds of emphasis markers.
GET library/books/_search
{
"query": {
"bool": {
"should": [
{
"match_phrase": {
"title": {
"query": "quick dog",
"boost": 2
}
}
},
{
"match_phrase": {
"title": {
"query": "lazy dog"
}
}
}
]
}
},
"highlight": {
"fields": {
"title": {}
}
}
}
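As a rough picture of what a highlighted fragment looks like, the sketch below wraps matching terms in `<em>` tags, which is the default marker Elasticsearch uses. The `highlight` function is a hypothetical stand-in that works on whole words only and ignores analysis.

```python
import re

def highlight(text, terms, pre="<em>", post="</em>"):
    """Wrap each whole-word occurrence of a term in emphasis markers."""
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, terms)) + r")\b", re.IGNORECASE)
    return pattern.sub(lambda m: pre + m.group(0) + post, text)

print(highlight("The quick brown fox jumps over the lazy dog", ["lazy", "dog"]))
# The quick brown fox jumps over the <em>lazy</em> <em>dog</em>
```

Passing different `pre` and `post` values mirrors the idea of configuring other emphasis markers.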
- Filtering is often faster than querying, because it doesn't have to calculate a score.
- Get documents that have a `price` greater than 5.
GET library/books/_search
{
"query": {
"bool": {
"filter": {
"range": {
"price": {
"gt": 5
}
}
}
}
}
}
- Get documents that have `dog` in the `title` and a `price` between 5 and 10.
GET library/books/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"title": "dog"
}
}
],
"filter": {
"range": {
"price": {
"gte": 5,
"lte": 10
}
}
}
}
}
}
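A filter is a plain yes/no test, which is why no score is needed. The two requests above can be mimicked in Python over the sample books; the helper names are hypothetical and the title check uses whitespace tokenization as a stand-in for analysis.

```python
books = [
    {"id": 1, "title": "The quick brown fox", "price": 5},
    {"id": 2, "title": "The quick brown fox jumps over the lazy dog", "price": 15},
    {"id": 3, "title": "The quick brown fox jumps over the lazy dog", "price": 8},
    {"id": 4, "title": "Brown fox brown dog", "price": 2},
    {"id": 5, "title": "Lazy dog", "price": 9},
]

def price_filter(book, gt=None, gte=None, lte=None):
    # Range filter: each bound that is given must hold; the result is True or False.
    price = book["price"]
    if gt is not None and not price > gt:
        return False
    if gte is not None and not price >= gte:
        return False
    if lte is not None and not price <= lte:
        return False
    return True

def title_matches(book, term):
    return term in book["title"].lower().split()

over_five = [b["id"] for b in books if price_filter(b, gt=5)]
dog_5_to_10 = [b["id"] for b in books if title_matches(b, "dog") and price_filter(b, gte=5, lte=10)]
print(over_five)    # [2, 3, 5]
print(dog_5_to_10)  # [3, 5]
```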
- How does full-text search actually work?
- When documents are indexed, each document undergoes an `Analysis` step.
- Analysis is a combination of tokenization and token filters.
- `Analysis` = `Tokenization` + `Token filters`
- Tokenization = takes the field and breaks it into multiple parts called `tokens`.
- Token filters = apply filters to the tokens to massage them into a different format.
GET /library/books/_analyze
{
"tokenizer": "standard",
"text": "Brown fox brown dog"
}
GET /library/books/_analyze
{
"tokenizer": "standard",
"filter": ["lowercase"],
"text": "Brown fox brown dog"
}
GET /library/books/_analyze
{
"tokenizer": "standard",
"filter": ["lowercase", "unique"],
"text": "Brown brown brown fox brown fox dog"
}
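The tokenizer-plus-filters pipeline used in the three requests above can be imitated in Python. The tokenizer below is only a rough approximation (runs of word characters); the real `standard` tokenizer follows Unicode text segmentation rules and behaves differently on punctuation. The function names are hypothetical.

```python
import re

def standard_tokenize(text):
    # Rough approximation: runs of word characters become tokens.
    return re.findall(r"\w+", text)

def lowercase(tokens):
    return [t.lower() for t in tokens]

def unique(tokens):
    # Keep the first occurrence of each token, preserving order.
    seen, out = set(), []
    for t in tokens:
        if t not in seen:
            seen.add(t)
            out.append(t)
    return out

def analyze(text, tokenizer, filters=()):
    tokens = tokenizer(text)
    for token_filter in filters:  # filters run in the order given
        tokens = token_filter(tokens)
    return tokens

print(analyze("Brown fox brown dog", standard_tokenize))
# ['Brown', 'fox', 'brown', 'dog']
print(analyze("Brown fox brown dog", standard_tokenize, [lowercase]))
# ['brown', 'fox', 'brown', 'dog']
print(analyze("Brown brown brown fox brown fox dog", standard_tokenize, [lowercase, unique]))
# ['brown', 'fox', 'dog']
```

The order of the filters matters: running `unique` before `lowercase` would leave both `Brown` and `brown` in the stream, which then collapse to two `brown` tokens.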
- `Analyzer` = a `tokenizer` + 0 or more `token filters`
- This applies the standard tokenizer and the lowercase token filter.
GET /library/books/_analyze
{
"analyzer": "standard",
"text": "Brown fox brown dog"
}
Understanding analysis is very important, because it helps your queries to be more relevant, and the emitted tokens define whether a document matches a query or not.
- The `standard` tokenizer did not break `quick.brown_Fox`, and it removed things like `$` and `@`.
GET /library/books/_analyze
{
"tokenizer": "standard",
"filter": ["lowercase"],
"text": "THE quick.brown_Fox Jumped! $19.95 @ 3.0"
}
- Now `quick.brown` and `brown_Fox` are split,
- but the numbers and special characters are ignored,
- because the `letter` tokenizer only emits runs of letters.
GET /library/books/_analyze
{
"tokenizer": "letter",
"filter": ["lowercase"],
"text": "THE quick.brown_Fox Jumped! $19.95 @ 3.0"
}
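The behaviour of the `letter` tokenizer is simple enough to approximate with a regular expression: every run of letters becomes a token and everything else acts as a separator. This is a sketch of the idea, not the Lucene implementation.

```python
import re

def letter_tokenize(text):
    # [^\W\d_] matches word characters that are neither digits nor underscore, i.e. letters.
    return re.findall(r"[^\W\d_]+", text)

tokens = [t.lower() for t in letter_tokenize("THE quick.brown_Fox Jumped! $19.95 @ 3.0")]
print(tokens)
# ['the', 'quick', 'brown', 'fox', 'jumped']
```

The numbers `19.95` and `3.0` produce no tokens at all, so a search for a price inside such a field would not match.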
- With the `standard` tokenizer
- This breaks up the words in the email address and the URL.
GET /library/books/_analyze
{
"tokenizer": "standard",
"text": "[email protected] website https://www.elastic.co"
}
- With the `uax_url_email` tokenizer
- This does not break up the email address or the URL.
GET /library/books/_analyze
{
"tokenizer": "uax_url_email",
"text": "[email protected] website https://www.elastic.co"
}
- Can be used to explore your data and get statistics on stored data.
GET /library/_search
{
"size": 0,
"aggs": {
"popular-colors": {
"terms": {
"field": "colors.keyword"
}
}
}
}
- Aggregation works on the documents returned by the search query.
GET /library/_search
{
"query": {
"match": {
"title": "dog"
}
},
"aggs": {
"popular-colors": {
"terms": {
"field": "colors.keyword"
}
}
}
}
GET /library/_search
{
"size": 0,
"aggs": {
"price-statistics": {
"stats": {
"field": "price"
}
},
"popular-colors": {
"terms": {
"field": "colors.keyword"
},
"aggs": {
"avg-price-per-color": {
"avg": {
"field": "price"
}
}
}
}
}
}
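What these aggregations compute can be reproduced by hand in Python over the five sample books: a `terms` aggregation counts documents per value, a nested `avg` computes a metric inside each bucket, and `stats` summarizes a numeric field. The function names are hypothetical and the sketch assumes all five documents are present in the index.

```python
from collections import defaultdict

books = [
    {"price": 5, "colors": ["red", "green", "blue"]},
    {"price": 15, "colors": ["blue", "yellow"]},
    {"price": 8, "colors": ["red", "blue"]},
    {"price": 2, "colors": ["black", "yellow", "red", "blue"]},
    {"price": 9, "colors": ["red", "blue", "green"]},
]

def terms_agg(docs, field):
    # Count how many documents fall into each bucket (one bucket per value).
    counts = defaultdict(int)
    for doc in docs:
        for value in doc[field]:
            counts[value] += 1
    return dict(sorted(counts.items(), key=lambda kv: -kv[1]))

def avg_per_bucket(docs, bucket_field, metric_field):
    # Nested aggregation: average of a metric inside each bucket.
    values = defaultdict(list)
    for doc in docs:
        for bucket in doc[bucket_field]:
            values[bucket].append(doc[metric_field])
    return {bucket: sum(v) / len(v) for bucket, v in values.items()}

def stats(docs, field):
    values = [doc[field] for doc in docs]
    return {"count": len(values), "min": min(values), "max": max(values),
            "avg": sum(values) / len(values), "sum": sum(values)}

print(terms_agg(books, "colors"))
print(avg_per_bucket(books, "colors", "price"))
print(stats(books, "price"))
```

Here `blue` is the most popular color (5 documents), and the average price of books tagged `red` is 6.0.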
- ES is schemaless: when you index a document, ES tries to infer the type of each field in the document.
- `famous-librarians` is a new index.
- `librarian` is the type.
- `text` field types are analyzed for full-text search.
PUT /famous-librarians
{
"settings": {
"index": {
"number_of_shards": 2,
"number_of_replicas": 0,
"analysis": {
"analyzer": {
"my-desc-analyzer": {
"type": "custom",
"tokenizer": "uax_url_email",
"filter": ["lowercase"]
}
}
}
}
},
"mappings": {
"librarian": {
"properties": {
"name": {
"type": "text"
},
"favorite-colors": {
"type": "keyword"
},
"birth-date": {
"type": "date",
"format": "year_month_day"
},
"hometown": {
"type": "geo_point"
},
"description": {
"type": "text",
"analyzer": "my-desc-analyzer"
}
}
}
}
}
GET /famous-librarians/_mapping
PUT /famous-librarians/librarian/1
{
"name": "Sarah Byrd Askew",
"favorite-colors": ["yellow", "light-grey"],
"birth-date": "1877-02-15",
"hometown": {
"lat": "32.349722",
"lon": "-86.641111"
},
"description": "An American public librarian who pioneered the establishment of libraries in the United States. https://en.wikipedia.org/wiki/Sarah_Byrd_Askew"
}
PUT /famous-librarians/librarian/2
{
"name": "John J Beckley",
"favorite-colors": ["red", "white"],
"birth-date": "1757-08-07",
"hometown": {
"lat": "51.507222",
"lon": "-0.1275"
},
"description": "An American political campaign manager and the first Librarian of the United States Congress - https://en.wikipedia.org/wiki/John_J._Beckley"
}
POST /famous-librarians/librarian/_search
{
"query": {
"match": {
"name": "john"
}
}
}
POST /famous-librarians/librarian/_search
{
"query": {
"match": {
"description": "https://en.wikipedia.org/wiki/Sarah_Byrd_Askew"
}
}
}
POST /famous-librarians/librarian/_search
{
"query": {
"match": {
"description": "https://en.wikipedia.org/wiki/John_J._Beckley"
}
}
}
- Elasticsearch Documentation - https://www.elastic.co/guide/index.html