- It is a highly scalable, open-source, full-text search engine.
- It allows you to store and search data quickly and in near real time.
- It is built on top of Apache Lucene.
- It is schemaless.
- It stores data in the form of JSON documents.
- It has REST APIs for storing and searching data.
- Cluster = Server(s)
- Node = Server
- Index = Database
- Type = Table
- Document = Record (or row)
- Data Node: stores the data and performs operations on it (indexing, searching, aggregation, etc.).
- Master Node: maintains the health of the cluster and performs administrative tasks (creating/deleting indices, tracking which nodes are part of the cluster).
- Coordinating Node: receives requests from client applications and aggregates results from Data/Master Nodes.
- By default, a node is both a master-eligible node and a data node.
curl -L -O https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-5.6.0.tar.gz
tar -xvf elasticsearch-5.6.0.tar.gz
cd elasticsearch-5.6.0/bin
./elasticsearch
curl -L -O https://artifacts.elastic.co/downloads/kibana/kibana-5.6.0-darwin-x86_64.tar.gz
tar -xvf kibana-5.6.0-darwin-x86_64.tar.gz
cd kibana-5.6.0-darwin-x86_64/bin
./kibana
cd ~/elasticsearch-5.6.0
bin/elasticsearch
=> (http://localhost:9200)
cd ~/kibana-5.6.0-darwin-x86_64/
bin/kibana
=> (http://localhost:5601)
- elasticsearch.yml
- jvm.options
Kibana -> Dev Tools -> Console (previously called Sense)
GET /
GET /_cat/health?v
GET /_cat/nodes?v
GET /_cat/indices?v
PUT library
{
"settings": {
"index.number_of_shards": 1,
"index.number_of_replicas": 0
}
}
PUT /library/books/1
{
"title": "The quick brown fox",
"price": 5,
"colors": ["red", "green", "blue"]
}
- Metadata fields: `_index`, `_type`, `_id`, `_score`, `_source`
- `index` is the operation here; along with it, we specify the document `_id`.
POST library/books/_bulk
{ "index": { "_id": 2 } }
{ "title": "The quick brown fox jumps over the lazy dog", "price": 15, "colors": ["blue", "yellow"] }
{ "index": { "_id": 3 } }
{ "title": "The quick brown fox jumps over the lazy dog", "price": 8, "colors": ["red", "blue"] }
{ "index": { "_id": 4 } }
{ "title": "Brown fox brown dog", "price": 2, "colors": ["black", "yellow", "red", "blue"] }
{ "index": { "_id": 5 } }
{ "title": "Lazy dog", "price": 9, "colors": ["red", "blue", "green"] }
GET /library/books/1
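The `_bulk` request body shown earlier is newline-delimited JSON: an action line followed by a source line for each document, with a newline after the last line as well. As a rough sketch, the same body could be assembled in Python as follows. `build_bulk_body` is a hypothetical helper name, and the snippet only builds the string; it does not send anything to a cluster.

```python
import json

def build_bulk_body(docs):
    """Return an NDJSON string: one action line plus one source line per document."""
    lines = []
    for doc_id, source in docs:
        lines.append(json.dumps({"index": {"_id": doc_id}}))  # action line
        lines.append(json.dumps(source))                       # document line
    # Every line, including the last one, has to end with a newline.
    return "\n".join(lines) + "\n"

# Two of the sample books from the bulk request.
books = [
    (4, {"title": "Brown fox brown dog", "price": 2, "colors": ["black", "yellow", "red", "blue"]}),
    (5, {"title": "Lazy dog", "price": 9, "colors": ["red", "blue", "green"]}),
]
body = build_bulk_body(books)
print(body, end="")
```

Keeping each JSON object on a single line matters here: pretty-printed JSON would break the line-based format.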
- By re-indexing them (requires all attributes to be specified)
POST /library/books/1
{
"title": "The quick fantastic fox",
"price": 5,
"colors": ["red", "green", "blue"]
}
- Or by using the update API (you can specify only the attribute(s) to be updated)
POST /library/books/1/_update
{
"doc": {
"title": "The quick brown fox"
}
}
DELETE /library/books/1
- This does not do any scoring so all docs have the same score.
- Get all documents in the books type.
GET library/books/_search
- Get documents having `fox` in their `title` field.
GET library/books/_search
{
"query": {
"match": {
"title": "fox"
}
}
}
- The relevance score of each document is represented by a positive floating-point number called the `_score`.
- The higher the `_score`, the more relevant the document.
- A query clause generates a `_score` for each document.
- The classic scoring algorithm in Elasticsearch is known as TF/IDF (term frequency/inverse document frequency). Since version 5.0 the default similarity is BM25, which builds on the same factors described below.
- How often does the term appear in the field?
- The more often, the more relevant.
- A field containing five mentions of the same term is more likely to be relevant than a field containing just one mention.
- How often does each term appear in the index?
- The more often, the less relevant.
- Terms that appear in many documents have a lower weight than more-uncommon terms.
- How long is the field?
- The longer it is, the less likely it is that words in the field will be relevant.
- A term appearing in a short title field carries more weight than the same term appearing in a long content field.
- In case of multiple clauses, the more clauses that match, the higher the `_score`.
- In case of multiple query clauses, the `_score` from each of these query clauses is combined to calculate the overall `_score` for the document.
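The three factors above can be made concrete with a small toy calculation. The sketch below applies the formulas of Lucene's classic TF/IDF similarity (tf = sqrt(frequency), idf = 1 + ln(numDocs / (docFreq + 1)), norm = 1 / sqrt(numTerms)) to three of the sample titles. It is a simplification for illustration only: the function names are made up for this example, the analysis is a plain lowercase-and-split, and the numbers will not reproduce the exact `_score` values Elasticsearch returns.

```python
import math

# Three titles from the sample data, keyed by document id.
titles = {
    2: "The quick brown fox jumps over the lazy dog",
    4: "Brown fox brown dog",
    5: "Lazy dog",
}

def tokens(text):
    # Crude analysis: lowercase and split on whitespace.
    return text.lower().split()

def field_weight(term, doc_id):
    """Toy tf * idf * norm weight of `term` in the title of `doc_id`."""
    doc_tokens = tokens(titles[doc_id])
    freq = doc_tokens.count(term)
    if freq == 0:
        return 0.0
    doc_freq = sum(1 for t in titles.values() if term in tokens(t))
    tf = math.sqrt(freq)                               # more occurrences -> higher
    idf = 1 + math.log(len(titles) / (doc_freq + 1))   # rarer term -> higher
    norm = 1 / math.sqrt(len(doc_tokens))              # shorter field -> higher
    return tf * idf * norm

for doc_id in titles:
    print(doc_id, round(field_weight("dog", doc_id), 3), round(field_weight("brown", doc_id), 3))
```

In this toy setup, `dog` weighs most in the two-word title, and `brown` weighs more in the title that mentions it twice.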
- Get documents having either `quick` or `dog` in their `title` field.
GET library/books/_search
{
"query": {
"match": {
"title": "quick dog"
}
}
}
- Get documents having the phrase `quick dog` in their `title` field.
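The difference between `match` and `match_phrase` can be illustrated without a cluster. The following is a minimal sketch, assuming a very crude analyzer (lowercase, split on whitespace) in place of the real one; the helper names are hypothetical and no scoring is modeled.

```python
# Titles copied from the bulk request, keyed by document id.
titles = {
    2: "The quick brown fox jumps over the lazy dog",
    3: "The quick brown fox jumps over the lazy dog",
    4: "Brown fox brown dog",
    5: "Lazy dog",
}

def analyze(text):
    # Crude stand-in for an analyzer: lowercase, split on whitespace.
    return text.lower().split()

def match(title, query):
    """True if the title contains ANY of the query terms."""
    title_terms = set(analyze(title))
    return any(term in title_terms for term in analyze(query))

def match_phrase(title, query):
    """True if the query terms appear next to each other, in the same order."""
    t, q = analyze(title), analyze(query)
    return any(t[i:i + len(q)] == q for i in range(len(t) - len(q) + 1))

match_hits = [doc_id for doc_id, t in titles.items() if match(t, "quick dog")]
phrase_hits = [doc_id for doc_id, t in titles.items() if match_phrase(t, "quick dog")]
print(match_hits)
print(phrase_hits)
```

With these titles, the term-based search finds every document, while the phrase search finds none, because no title contains `quick` immediately followed by `dog`.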
GET library/books/_search
{
"query": {
"match_phrase": {
"title": "quick dog"
}
}
}
- Let's find all docs with "quick" and "lazy dog".
- The `bool` query allows us to combine multiple queries.
- The `must` clause is similar to `AND` in SQL: all conditions inside must match.
GET library/books/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"title": "quick"
}
},
{
"match_phrase": {
"title": "lazy dog"
}
}
]
}
}
}
- Get documents that have neither `quick` nor the phrase `lazy dog` in their `title` field.
GET library/books/_search
{
"query": {
"bool": {
"must_not": [
{
"match": {
"title": "quick"
}
},
{
"match_phrase": {
"title": "lazy dog"
}
}
]
}
}
}
- Combinations can be boosted for different effects.
- The `should` clause is similar to `OR` in SQL: at least one of the conditions inside should match.
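The `must`, `must_not` and `should` clauses described so far can be mimicked with plain boolean logic. This is only a sketch of the matching behavior, assuming a lowercase-and-split analyzer and hypothetical helper names; scoring and boosting are ignored entirely.

```python
# Titles copied from the bulk request, keyed by document id.
titles = {
    2: "The quick brown fox jumps over the lazy dog",
    3: "The quick brown fox jumps over the lazy dog",
    4: "Brown fox brown dog",
    5: "Lazy dog",
}

def terms(text):
    return text.lower().split()

def has_quick(title):
    return "quick" in terms(title)

def has_lazy_dog(title):
    t, q = terms(title), ["lazy", "dog"]
    return any(t[i:i + 2] == q for i in range(len(t) - 1))

def bool_query(title, must=(), should=(), must_not=()):
    """Each clause is a function that takes a title and returns True/False."""
    if not all(clause(title) for clause in must):      # AND: every clause matches
        return False
    if any(clause(title) for clause in must_not):      # none of these may match
        return False
    if should and not must:                            # OR: at least one matches
        return any(clause(title) for clause in should)
    return True

def search(**clauses):
    return [doc_id for doc_id, title in titles.items() if bool_query(title, **clauses)]

must_hits = search(must=[has_quick, has_lazy_dog])
must_not_hits = search(must_not=[has_quick, has_lazy_dog])
should_hits = search(should=[has_quick, has_lazy_dog])
print(must_hits, must_not_hits, should_hits)
```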
GET library/books/_search
{
"query": {
"bool": {
"should": [
{
"match_phrase": {
"title": {
"query": "quick dog"
}
}
},
{
"match_phrase": {
"title": {
"query": "lazy dog",
"boost": 3
}
}
}
]
}
}
}
- Highlighting tells you which parts of the `title` field matched.
- You can configure it to use different kinds of emphasis markers.
GET library/books/_search
{
"query": {
"bool": {
"should": [
{
"match_phrase": {
"title": {
"query": "quick dog",
"boost": 2
}
}
},
{
"match_phrase": {
"title": {
"query": "lazy dog"
}
}
}
]
}
},
"highlight": {
"fields": {
"title": {}
}
}
}
- Filtering is often faster than querying because it does not have to calculate a score.
- Get documents that have a `price` greater than 5.
GET library/books/_search
{
"query": {
"bool": {
"filter": {
"range": {
"price": {
"gt": 5
}
}
}
}
}
}
- Get documents that have `dog` in the `title` and a `price` between 5 and 10 (inclusive).
GET library/books/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"title": "dog"
}
}
],
"filter": {
"range": {
"price": {
"gte": 5,
"lte": 10
}
}
}
}
}
}
- How does full-text search actually work?
- When documents are indexed, each document undergoes an `Analysis` step.
- Analysis is a combination of tokenization and token filters.
- `Analysis` = `Tokenization` + `Token filters`
- Tokenization = It takes the field and breaks it into multiple parts called `tokens`.
- Token Filters = It applies filters to the tokens to massage them into a different format.
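The idea of a tokenizer followed by a chain of token filters can be sketched in a few lines of Python. Everything here is an approximation under stated assumptions: `standard_like_tokenizer` is a hypothetical regex-based splitter, not the real `standard` tokenizer, and the two filters only imitate what `lowercase` and `unique` are described as doing.

```python
import re

def standard_like_tokenizer(text):
    # Rough approximation: runs of letters, digits and underscores become tokens.
    return re.findall(r"\w+", text)

def lowercase_filter(tokens):
    return [t.lower() for t in tokens]

def unique_filter(tokens):
    # Keep only the first occurrence of each token, preserving order.
    seen, result = set(), []
    for t in tokens:
        if t not in seen:
            seen.add(t)
            result.append(t)
    return result

def analyze(text, tokenizer, filters=()):
    """Analysis = tokenization, then each token filter applied in order."""
    tokens = tokenizer(text)
    for token_filter in filters:
        tokens = token_filter(tokens)
    return tokens

text = "Brown brown brown fox brown fox dog"
print(analyze(text, standard_like_tokenizer))
print(analyze(text, standard_like_tokenizer, [lowercase_filter]))
print(analyze(text, standard_like_tokenizer, [lowercase_filter, unique_filter]))
```

The order of the filters matters: running the unique filter before lowercasing would keep both `Brown` and `brown`.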
GET /library/books/_analyze
{
"tokenizer": "standard",
"text": "Brown fox brown dog"
}
GET /library/books/_analyze
{
"tokenizer": "standard",
"filter": ["lowercase"],
"text": "Brown fox brown dog"
}
GET /library/books/_analyze
{
"tokenizer": "standard",
"filter": ["lowercase", "unique"],
"text": "Brown brown brown fox brown fox dog"
}
- `Analyzer` = A `tokenizer` + 0 or more `token filters`
- This applies the standard tokenizer and the standard lowercase token filter.
GET /library/books/_analyze
{
"analyzer": "standard",
"text": "Brown fox brown dog"
}
Understanding analysis is very important: the emitted tokens define whether a document matches a query or not, and knowing them helps you make your queries more relevant.
- The `standard` tokenizer did not break `quick.brown_Fox`, and it removed things like `$` and `@`.
GET /library/books/_analyze
{
"tokenizer": "standard",
"filter": ["lowercase"],
"text": "THE quick.brown_Fox Jumped! $19.95 @ 3.0"
}
- Now `quick.brown_Fox` is split at `.` and `_`, but the numbers and special characters are dropped, because the `letter` tokenizer only keeps letters.
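The letter-only behavior described above is easy to approximate locally. The sketch below uses a regular expression as a stand-in for the `letter` tokenizer (an assumption for illustration, not the actual implementation) and lowercases the result.

```python
import re

text = "THE quick.brown_Fox Jumped! $19.95 @ 3.0"

def letter_tokenizer(text):
    # [^\W\d_] matches a word character that is neither a digit nor "_", i.e. a letter.
    return re.findall(r"[^\W\d_]+", text)

tokens = [t.lower() for t in letter_tokenizer(text)]
print(tokens)
```

The prices and the version-like number disappear completely, which is worth keeping in mind before choosing this tokenizer for fields that contain numbers.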
GET /library/books/_analyze
{
"tokenizer": "letter",
"filter": ["lowercase"],
"text": "THE quick.brown_Fox Jumped! $19.95 @ 3.0"
}
- With the `standard` tokenizer
- This breaks up all the words in the email and the URL
GET /library/books/_analyze
{
"tokenizer": "standard",
"text": "[email protected] website https://www.elastic.co"
}
- With the `uax_url_email` tokenizer
- This does not break up the email or the URL
GET /library/books/_analyze
{
"tokenizer": "uax_url_email",
"text": "[email protected] website https://www.elastic.co"
}
- Aggregations can be used to explore your data and get statistics on stored data.
GET /library/_search
{
"size": 0,
"aggs": {
"popular-colors": {
"terms": {
"field": "colors.keyword"
}
}
}
}
- Aggregations work on the documents returned by the search query.
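What a `terms` aggregation computes, and how a query narrows its input, can be sketched with a `Counter`. This is a plain-Python illustration under stated assumptions: `terms_agg` is a hypothetical helper, the data is the four bulk-indexed books, and the query term `fox` is used here instead of `dog` purely because every sample title contains `dog`, which would make the scoped and unscoped results identical.

```python
from collections import Counter

books = [
    {"title": "The quick brown fox jumps over the lazy dog", "colors": ["blue", "yellow"]},
    {"title": "The quick brown fox jumps over the lazy dog", "colors": ["red", "blue"]},
    {"title": "Brown fox brown dog", "colors": ["black", "yellow", "red", "blue"]},
    {"title": "Lazy dog", "colors": ["red", "blue", "green"]},
]

def terms_agg(docs, field):
    """Count how many documents contain each value of `field` (one bucket per value)."""
    counts = Counter()
    for doc in docs:
        counts.update(set(doc[field]))  # a document counts once per distinct value
    return dict(counts)

# Aggregation over all documents.
all_buckets = terms_agg(books, "colors")

# Aggregation over only the documents that match a query.
fox_hits = [b for b in books if "fox" in b["title"].lower().split()]
fox_buckets = terms_agg(fox_hits, "colors")

print(all_buckets)
print(fox_buckets)
```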
GET /library/_search
{
"query": {
"match": {
"title": "dog"
}
},
"aggs": {
"popular-colors": {
"terms": {
"field": "colors.keyword"
}
}
}
}
GET /library/_search
{
"size": 0,
"aggs": {
"price-statistics": {
"stats": {
"field": "price"
}
},
"popular-colors": {
"terms": {
"field": "colors.keyword"
},
"aggs": {
"avg-price-per-color": {
"avg": {
"field": "price"
}
}
}
}
}
}
- ES is schemaless: when you index a document, ES will try to infer the type of each field in the document.
- `famous-librarians` is a new index.
- `librarian` is the type.
- `text` field types are analyzed for full-text search.
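To make the type inference mentioned above more concrete, here is a heavily simplified sketch of how field types might be guessed from JSON values. `infer_field_type` is a hypothetical function, the date check only recognizes `yyyy-MM-dd`, and real dynamic mapping has more rules (for example, Elasticsearch 5.x also adds a `keyword` sub-field to dynamically mapped strings, which is why the aggregations above use `colors.keyword`).

```python
import re

ISO_DATE = re.compile(r"^\d{4}-\d{2}-\d{2}$")  # simplified date detection

def infer_field_type(value):
    if isinstance(value, list):
        # Arrays take the type of their elements; look at the first one.
        return infer_field_type(value[0]) if value else None
    if isinstance(value, bool):   # check bool before int: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, dict):
        return "object"
    if isinstance(value, str):
        return "date" if ISO_DATE.match(value) else "text"
    return None

doc = {"title": "The quick brown fox", "price": 5, "colors": ["red", "green", "blue"]}
inferred = {field: infer_field_type(value) for field, value in doc.items()}
print(inferred)
```

Guessing like this is convenient but not always what you want, which is the motivation for declaring an explicit mapping.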
PUT /famous-librarians
{
"settings": {
"index": {
"number_of_shards": 2,
"number_of_replicas": 0,
"analysis": {
"analyzer": {
"my-desc-analyzer": {
"type": "custom",
"tokenizer": "uax_url_email",
"filter": ["lowercase"]
}
}
}
}
},
"mappings": {
"librarian": {
"properties": {
"name": {
"type": "text"
},
"favorite-colors": {
"type": "keyword"
},
"birth-date": {
"type": "date",
"format": "year_month_day"
},
"hometown": {
"type": "geo_point"
},
"description": {
"type": "text",
"analyzer": "my-desc-analyzer"
}
}
}
}
}
GET /famous-librarians/_mapping
PUT /famous-librarians/librarian/1
{
"name": "Sarah Byrd Askew",
"favorite-colors": ["yellow", "light-grey"],
"birth-date": "1877-02-15",
"hometown": {
"lat": "32.349722",
"lon": "-86.641111"
},
"description": "An American public librarian who pioneered the establishment of libraries in the United States. https://en.wikipedia.org/wiki/Sarah_Byrd_Askew"
}
PUT /famous-librarians/librarian/2
{
"name": "John J Beckley",
"favorite-colors": ["red", "white"],
"birth-date": "1757-08-07",
"hometown": {
"lat": "51.507222",
"lon": "-0.1275"
},
"description": "An American political campaign manager and the first Librarian of the United States Congress - https://en.wikipedia.org/wiki/John_J._Beckley"
}
POST /famous-librarians/librarian/_search
{
"query": {
"match": {
"name": "john"
}
}
}
POST /famous-librarians/librarian/_search
{
"query": {
"match": {
"description": "https://en.wikipedia.org/wiki/Sarah_Byrd_Askew"
}
}
}
POST /famous-librarians/librarian/_search
{
"query": {
"match": {
"description": "https://en.wikipedia.org/wiki/John_J._Beckley"
}
}
}
- Elasticsearch Documentation - https://www.elastic.co/guide/index.html