Use the Python client elasticsearch.
from elasticsearch import Elasticsearch
es_client = Elasticsearch()  # local node (defaults to localhost:9200)
es_client = Elasticsearch(["<cluster_url>"])  # remote cluster (use instead of the line above)
Build a body dictionary for the query.
body = {
    "from": 10,             # skip the first 10 docs (offset, default = 0)
    "size": 100,            # return 100 docs (default = 10)
    "fields": ["f_name1"],  # return only the wanted fields
    "query": {              # the query goes here
    },
    "sort": {               # sort order
        "time_field": {
            "order": "desc"
        }
    }
}
NOTE: To return only some fields, use fields for fields that are explicitly marked as stored in the mapping, and _source otherwise.
A typical search on a type in an index is run as:
r = es_client.search(index='my_index',
                     doc_type='my_type',
                     body=body)
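The search call can be wrapped in a loop that advances "from" until a page comes back short. This is a minimal sketch, not part of the client: fetch_all is a hypothetical helper, and FakeClient is a stand-in for es_client so the example runs without a cluster. For large result sets the scroll API is usually a better fit than deep from/size paging.

```python
def fetch_all(client, index, doc_type, query, page_size=100):
    """Collect all hits by advancing 'from' until a page comes back short."""
    docs, offset = [], 0
    while True:
        body = {"from": offset, "size": page_size, "query": query}
        r = client.search(index=index, doc_type=doc_type, body=body)
        page = r["hits"]["hits"]
        docs.extend(page)
        if len(page) < page_size:  # last (possibly empty) page reached
            return docs
        offset += page_size


class FakeClient:
    """Hypothetical stand-in for es_client; serves slices of an in-memory list."""

    def __init__(self, n_docs):
        self.docs = [{"_id": str(i)} for i in range(n_docs)]

    def search(self, index, doc_type, body):
        start = body["from"]
        page = self.docs[start:start + body["size"]]
        return {"hits": {"total": len(self.docs), "hits": page}}


docs = fetch_all(FakeClient(250), "my_index", "my_type", {"match_all": {}})
```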
The result r is again a dictionary, whose keys depend on the type of query run.
Hits are sorted by relevance by default. In a terms aggregation, buckets are sorted by number of documents.
- number of documents is in r['hits']['total']
- actual documents are in r['hits']['hits']
- if fields is used, a field value is in r['hits']['hits'][0]['fields']['f_name1'][0]
- for an aggregation, buckets are in r['aggregations']['agg_name']['buckets']
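The access paths listed above can be collected in one small function. This is a sketch under assumptions: summarize_hits is a hypothetical helper, and sample is a hand-written stand-in for a real response, containing only the keys mentioned in the list.

```python
def summarize_hits(r, field_name):
    """Return (total, values) for one requested field of a search result."""
    values = []
    for hit in r["hits"]["hits"]:
        fields = hit.get("fields", {})
        if field_name in fields:
            # with "fields", each value comes back wrapped in a list
            values.append(fields[field_name][0])
    return r["hits"]["total"], values


# Hand-written stand-in for a response, for illustration only.
sample = {
    "hits": {
        "total": 2,
        "hits": [
            {"fields": {"f_name1": ["a"]}},
            {"fields": {"f_name1": ["b"]}},
        ],
    }
}
total, values = summarize_hits(sample, "f_name1")
```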
body = {
    "query": {
        "term": {
            "my_field_name": "chosen_field_value"
        }
    },
}
If the field 'my_field_name' is itself a dictionary (an object), one of its subfields can be queried as 'my_field_name.subfield'.
body = {
    "query": {
        "range": {
            "my_time_field": {
                "gte": start_date,
                "lt": final_date
            }
        }
    }
}
start_date and final_date are datetime/date objects.
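As an illustration of the half-open interval (gte inclusive, lt exclusive), the sketch below builds a range body covering exactly one calendar day. day_range_body is a hypothetical helper; it relies on the client accepting date objects in the body, as stated above.

```python
from datetime import date, timedelta


def day_range_body(time_field, day):
    """Build a body matching docs whose time_field falls on the given day."""
    return {
        "query": {
            "range": {
                time_field: {
                    "gte": day,                     # inclusive start
                    "lt": day + timedelta(days=1),  # exclusive end
                }
            }
        }
    }


body = day_range_body("my_time_field", date(2015, 2, 28))
```

Using lt with the following day avoids having to spell out the last instant of the day.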
a AND b
body = {
    "query": {
        "bool": {
            "must": [
                {
                    "term": {
                        "field1": "value1"
                    }
                },
                {
                    "term": {
                        "field2": "value2"
                    }
                }
            ]
        }
    }
}
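When the number of AND-ed conditions varies, the must list can be generated from pairs of field and value. A minimal sketch, with all_terms_body as a hypothetical helper name:

```python
def all_terms_body(conditions):
    """Build a bool/must body from an iterable of (field, value) pairs."""
    must = [{"term": {field: value}} for field, value in conditions]
    return {"query": {"bool": {"must": must}}}


body = all_terms_body([("field1", "value1"), ("field2", "value2")])
```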
(NOT a) AND b
body = {
    "query": {
        "bool": {
            "must_not": [
                # clauses for a (the condition to exclude) go here
            ],
            "must": {
                "term": {
                    "field_name1": field_value
                }
            }
        }
    },
}
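A filled-in version of this pattern might look like the sketch below. The excluded field field_name2 and both values are hypothetical, chosen only to show where the excluded condition goes; term and not_a_and_b are illustrative helper names.

```python
def term(field, value):
    """Wrap a field/value pair in a term clause."""
    return {"term": {field: value}}


def not_a_and_b(excluded, required):
    """Body for (NOT excluded) AND required; both are (field, value) pairs."""
    return {
        "query": {
            "bool": {
                "must_not": [term(*excluded)],  # a: must not match
                "must": [term(*required)],      # b: must match
            }
        }
    }


# Hypothetical field names and values, for illustration only.
body = not_a_and_b(
    excluded=("field_name2", "unwanted_value"),
    required=("field_name1", "wanted_value"),
)
```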
Set size in the aggregation large enough that the returned sum_other_doc_count is 0, meaning no buckets were left out of the result.
On one field:
body = {
    "size": 0,
    "aggs": {
        "my_agg_name": {
            "terms": {
                "size": 100,
                "field": "field_name1"
            }
        }
    }
}
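The sum_other_doc_count check can be made explicit when reading the buckets. A sketch, assuming a hypothetical helper buckets_if_complete and a hand-written sample response shaped like a terms aggregation result:

```python
def buckets_if_complete(r, agg_name):
    """Return {key: doc_count}, or raise if some buckets were cut off by size."""
    agg = r["aggregations"][agg_name]
    other = agg.get("sum_other_doc_count", 0)
    if other != 0:
        raise ValueError(
            "%d docs fall outside the returned buckets; increase size" % other
        )
    return {b["key"]: b["doc_count"] for b in agg["buckets"]}


# Hand-written stand-in for a response, for illustration only.
sample = {
    "aggregations": {
        "my_agg_name": {
            "sum_other_doc_count": 0,
            "buckets": [
                {"key": "x", "doc_count": 7},
                {"key": "y", "doc_count": 3},
            ],
        }
    }
}
counts = buckets_if_complete(sample, "my_agg_name")
```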
On more fields (double GROUP BY):
body = {
    "size": 0,
    "aggs": {
        "agg_field1": {
            "terms": {
                "size": 100,
                "field": "field1"
            },
            "aggs": {
                "subagg_field2": {
                    "terms": {
                        "size": 100,
                        "field": "field2"
                    }
                }
            }
        }
    }
}
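The nested buckets of such a two-level aggregation can be flattened into rows, much like the output of a SQL GROUP BY on two columns. A sketch, assuming a hypothetical helper flatten_two_level and a hand-written sample response:

```python
def flatten_two_level(r, outer="agg_field1", inner="subagg_field2"):
    """Return (outer_key, inner_key, doc_count) rows from nested terms buckets."""
    rows = []
    for outer_bucket in r["aggregations"][outer]["buckets"]:
        # each outer bucket carries the sub-aggregation under its own name
        for inner_bucket in outer_bucket[inner]["buckets"]:
            rows.append(
                (outer_bucket["key"], inner_bucket["key"], inner_bucket["doc_count"])
            )
    return rows


# Hand-written stand-in for a response, for illustration only.
sample = {
    "aggregations": {
        "agg_field1": {
            "buckets": [
                {
                    "key": "a",
                    "doc_count": 5,
                    "subagg_field2": {
                        "buckets": [
                            {"key": "x", "doc_count": 3},
                            {"key": "y", "doc_count": 2},
                        ]
                    },
                },
                {
                    "key": "b",
                    "doc_count": 1,
                    "subagg_field2": {"buckets": [{"key": "x", "doc_count": 1}]},
                },
            ]
        }
    }
}
rows = flatten_two_level(sample)
```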
Changing the seed string gives a different random scoring, and therefore a different sample of documents. If no seed is specified, the current time is used as the seed.
body = {
    "fields": ["field1", "field2"],
    "query": {
        "function_score": {
            "query": {
                "term": {  # assumed: a term clause, as in the term examples earlier
                    "my_field": "my_value"
                }
            },
            "random_score": {
                "seed": "the seed"
            }
        }
    }
}
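To draw a fixed-size random sample, the same structure can be combined with size. A sketch, with random_sample_body as a hypothetical helper; omitting the seed leaves random_score empty, so the default seed described above applies.

```python
def random_sample_body(query, n, seed=None):
    """Build a body returning n randomly scored docs that match query."""
    random_score = {} if seed is None else {"seed": seed}
    return {
        "size": n,  # number of sampled docs to return
        "query": {
            "function_score": {
                "query": query,
                "random_score": random_score,
            }
        },
    }


query = {"term": {"my_field": "my_value"}}
body = random_sample_body(query, n=20, seed="the seed")
```

Reusing the same seed should reproduce the same sample as long as the index does not change.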
body = {
    "query": {
        "simple_query_string": {
            "fields": ['field1^3', 'field2'],
            "flags": "ALL",
            "default_operator": "AND",
            "analyzer": "snowball",
            "query": "my custom query"
        }
    }
}
The ^3 suffix means that matches on field1 are boosted by a factor of 3.
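The field^boost strings can be generated from pairs of field name and boost. A sketch, with boosted_fields as a hypothetical helper; a boost of 1 is written as the bare field name.

```python
def boosted_fields(boosts):
    """Turn (field, boost) pairs into 'field^boost' strings."""
    result = []
    for field, boost in boosts:
        if boost == 1:
            result.append(field)  # no suffix needed for the neutral boost
        else:
            result.append("{}^{}".format(field, boost))
    return result


fields = boosted_fields([("field1", 3), ("field2", 1)])
body = {
    "query": {
        "simple_query_string": {
            "fields": fields,
            "default_operator": "AND",
            "query": "my custom query",
        }
    }
}
```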
TODO use docs