(by @_ashish_tiwari)
Version: 6.2
Heap size: 30 GB
Cores: 24
Memory: 128 GB
Client: PHP - 6.0
Fielddata is disabled on text fields by default. Set fielddata=true on [myfieldName] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead.
- I noticed that I was sorting on a 'text' field. I changed the field type from 'text' to 'keyword' and reindexed all the data, and sorting worked again (see the sketch after this list). Sort only on fields of numeric, date, or keyword type.
- You also cannot perform aggregations on a text field.
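A minimal sketch of that fix, assuming a hypothetical index my_index and field myfieldName (your real index and field names will differ): create a new index with the keyword mapping, then reindex the old data into it.
# 1. Create a new index with the field mapped as keyword
curl -X PUT "localhost:9200/my_index_v2" -H 'Content-Type: application/json' -d'
{
  "mappings": {
    "_doc": {
      "properties": {
        "myfieldName": { "type": "keyword" }
      }
    }
  }
}'
# 2. Copy the existing documents into the new index
curl -X POST "localhost:9200/_reindex" -H 'Content-Type: application/json' -d'
{
  "source": { "index": "my_index" },
  "dest": { "index": "my_index_v2" }
}'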
rejected execution of org.elasticsearch.transport.TransportService$7@45468c89 on EsThreadPoolExecutor[name = localhost:9200/bulk, queue capacity = 200, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@452b58db[Running, pool size = 24, active threads = 24, queued tasks = 200, completed tasks = 32109349]]
- My data was being lost because of the above exception. I monitored the number of rejections by hitting:
curl -X GET http://localhost:9200/_cat/thread_pool
- The Elasticsearch bulk thread pool queue got full and started rejecting all incoming data. I increased thread_pool.bulk.queue_size and thread_pool.index.queue_size to 500, which stopped the rejections. 500 is not a standard value; you need to find out what the right value is for your application.
- Also set thread_pool.index.size and thread_pool.bulk.size to 24, which is the number of CPU cores I have. This makes sure all the CPUs are in use (see the settings sketch after this list).
- If rejections start again even after increasing the bulk queue size several times, check your bulk request frequency and send bulk requests with some interval between them.
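For reference, a sketch of how those settings can be placed in elasticsearch.yml; the values 24 and 500 are the ones used above, so tune them for your own hardware and workload:
# elasticsearch.yml - thread pool tuning (Elasticsearch 6.x setting names)
thread_pool.bulk.size: 24          # match the number of CPU cores
thread_pool.bulk.queue_size: 500
thread_pool.index.size: 24
thread_pool.index.queue_size: 500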
not_x_content_exception
- Check whether your content is already JSON encoded or not. The request body must be in proper JSON format.
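As a quick illustration, with made-up index and field names: the body of a document request must be one complete JSON object, and for the bulk API every line of the newline-delimited payload must itself be a well-formed JSON object.
# Single document: the body is one complete JSON object
curl -X POST "localhost:9200/my_index/_doc" -H 'Content-Type: application/json' -d'
{ "message": "hello" }
'
# Bulk: each line is a standalone JSON object, with a trailing newline at the end
curl -X POST "localhost:9200/_bulk" -H 'Content-Type: application/x-ndjson' -d'
{ "index": { "_index": "my_index", "_type": "_doc" } }
{ "message": "hello" }
'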
version_conflict_engine_exception, version conflict, current version [versionid] is different than the one provided [versionid]
- Use the retry_on_conflict: 5 parameter; it reattempts the reindex/update of your doc up to 5 times. 5 is not a standard value; you need to evaluate it for your application (see the example after this list).
- If you care about data loss, you need to reindex/update your data again from your primary DB, or store the data locally and retry after some time.
- If data loss is okay with you, then you can skip the retry option. Since Elasticsearch maintains a version of the doc while updating, only that last update is lost, not the whole document.
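A sketch of an update request using retry_on_conflict, with a hypothetical index my_index, type _doc, document id 1, and field counter:
curl -X POST "localhost:9200/my_index/_doc/1/_update?retry_on_conflict=5" -H 'Content-Type: application/json' -d'
{
  "doc": { "counter": 2 }
}'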
{"type":"cluster_block_exception","reason":"blocked by: [FORBIDDEN\/12\/index read-only \/ allow delete (api)];"}
- Your disk is getting full, or it has reached the threshold you specified in the elasticsearch.yml file. There are default values for the disk usage watermarks; you can check them here.
- Once the threshold is reached, all indices are left with only read/delete permission. You can revert this permission with the API below:
curl -X PUT "localhost:9200/_all/_settings" -H 'Content-Type: application/json' -d'
{
"index": {
"blocks": {
"read_only_allow_delete": "false"
}
}
}'
- You can edit the threshold values in the elasticsearch.yml file:
cluster.routing.allocation.disk.threshold_enabled: true
cluster.routing.allocation.disk.watermark.low: 10gb
cluster.routing.allocation.disk.watermark.high: 10gb
cluster.routing.allocation.disk.watermark.flood_stage: 10gb
My application has a heavy update workload, for which I was using bulk requests with heavy update queries that also contain a lot of script conditions. The flow is: a doc is first inserted into ES and then updated multiple times. Insertion was very fast, but updates were the opposite, far too slow. Below are some solutions which helped me increase update performance.
- Use 'upsert' if possible, where the doc is updated if it exists or inserted as a new doc (see the sketch after this list). I converted all my updates to upserts, which was perfect for my application.
- Don't update the same doc multiple times too frequently. An update is nothing but a delete and a reindex of that document.
- Avoid heavy use of scripts while updating docs. If scripting is necessary, you can store your script in the Elasticsearch cluster as a stored script and invoke it by just passing parameters. Check here for more info.
- Set index.refresh_interval to 30s. With this change a document becomes available for search after 30s. Refreshing is an expensive operation and its default value is 1s, so it's better to refresh at a longer, explicit interval. For more info on refresh_interval check here.
- If your performance is limited by resources, you can add another data node.
- Most important: I had 1 replica per index. At peak hours, updates and inserts were too slow. I simply removed the replica by setting the count to 0 (you can set the number of replicas as shown here). My update process sped up by 15x, and there is no more delay.
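To make the upsert and the refresh_interval / replica points concrete, here is a rough sketch with made-up index, type, id, and field names; the 30s and 0 values are the ones mentioned above:
# Upsert: update the doc if it exists, otherwise insert the "upsert" body as a new doc
curl -X POST "localhost:9200/my_index/_doc/1/_update?retry_on_conflict=5" -H 'Content-Type: application/json' -d'
{
  "doc": { "views": 10 },
  "upsert": { "views": 10 }
}'
# Slow down refreshes and drop replicas during heavy indexing/updating
curl -X PUT "localhost:9200/my_index/_settings" -H 'Content-Type: application/json' -d'
{
  "index": {
    "refresh_interval": "30s",
    "number_of_replicas": 0
  }
}'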
Documents not getting upserted properly with the PHP Elasticsearch SDK.
I specified an empty object with an empty array, like content => array(). But this is not valid in the Elasticsearch DSL, because an empty PHP array is serialized as a JSON array. You can use content => new \stdClass() to specify an empty object.
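For reference, this is the difference in the JSON that actually gets sent (PHP's json_encode turns an empty array into [] and a stdClass instance into {}); the field name content is just the one from the example above:
{"content": []}    <- what content => array() produces (a JSON array)
{"content": {}}    <- what content => new \stdClass() produces (a JSON object)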
Need to export all data to CSV.
I wrote a library that uses the Elasticsearch PHP SDK and fetches data with the Scroll API, exporting it to CSV: https://github.com/ashishtiwari1993/elasticsearch-csv-export
What should be the value of max_gram and min_gram in Elasticsearch?
I shared my learnings in this blog post: What should be the value of max_gram and min_gram in Elasticsearch?