##TUNING##
System: raise the open file descriptor limit to 32K or 64K
vim /etc/security/limits.conf
elasticsearch - nofile 65535
elasticsearch - memlock unlimited
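A minimal sketch for checking, from a shell opened as the elasticsearch user (limits.conf is applied at login), whether the current open-file limit reaches the 65535 target. It uses the standard `resource` module (Linux/Unix only); the function names and the 65535 default are illustrative, and the node's own reported `max_file_descriptors` remains the authoritative check.

```python
import resource


def nofile_limits():
    """Return the (soft, hard) limits on open file descriptors for this process."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft, hard


def meets_nofile_target(target=65535):
    """True if the soft limit is unlimited or at least `target`."""
    soft, _ = nofile_limits()
    # RLIM_INFINITY may be reported as a negative number, so check it explicitly.
    return soft == resource.RLIM_INFINITY or soft >= target


if __name__ == "__main__":
    soft, hard = nofile_limits()
    print(f"nofile soft={soft} hard={hard} ok={meets_nofile_target()}")
```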
Use the following command to check:
curl localhost:9200/_nodes/process?pretty
"process" : {
"refresh_interval_in_millis" : 1000,
"id" : 2697,
"max_file_descriptors" : 65535,
"mlockall" : true
}
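The same check can be scripted across all nodes. This is a sketch assuming the response layout {"nodes": {<node_id>: {"name": ..., "process": {...}}}} returned by `_nodes/process`, with ES listening on localhost:9200 as in the command above; `check_process_settings` is a hypothetical helper name.

```python
import json


def check_process_settings(nodes_process_json, min_fds=65535):
    """Return a list of problems found in a `_nodes/process` response body."""
    body = json.loads(nodes_process_json)
    problems = []
    for node_id, node in body.get("nodes", {}).items():
        name = node.get("name", node_id)
        proc = node.get("process", {})
        fds = proc.get("max_file_descriptors", -1)
        if fds < min_fds:
            problems.append(f"{name}: max_file_descriptors={fds} < {min_fds}")
        if not proc.get("mlockall", False):
            problems.append(f"{name}: mlockall is not enabled")
    return problems


if __name__ == "__main__":
    from urllib.request import urlopen

    with urlopen("http://localhost:9200/_nodes/process") as resp:
        print(check_process_settings(resp.read().decode("utf-8")) or "all nodes OK")
```

An empty list means every node reports at least 65535 file descriptors and mlockall enabled.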
Raise vm.max_map_count to 262144. The command below applies it only until the next reboot; to set it permanently, also add vm.max_map_count=262144 to /etc/sysctl.conf.
sysctl -w vm.max_map_count=262144
#If you installed Elasticsearch using a package (.deb, .rpm) this setting
#will be changed automatically. To verify, run sysctl vm.max_map_count.
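Disable swap, or at least set vm.swappiness to 0:
sysctl -w vm.swappiness=0 (add vm.swappiness=0 to /etc/sysctl.conf to keep it after a reboot)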
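To verify both kernel settings without remembering the sysctl syntax, a small sketch can read them from /proc/sys (where "vm.max_map_count" maps to /proc/sys/vm/max_map_count). The `proc_root` parameter exists only so the check can be pointed at a test directory; the helper names are hypothetical.

```python
from pathlib import Path

# Targets from the notes: max_map_count at least this, swappiness at most this.
TARGETS = {"vm.max_map_count": 262144, "vm.swappiness": 0}


def read_sysctl(name, proc_root="/proc/sys"):
    """Read an integer sysctl by translating 'vm.x' to '<proc_root>/vm/x'."""
    path = Path(proc_root, *name.split("."))
    return int(path.read_text().strip())


def check_vm_settings(proc_root="/proc/sys"):
    """Return {name: (current_value, ok)} for each target setting."""
    results = {}
    for name, target in TARGETS.items():
        value = read_sysctl(name, proc_root)
        ok = value >= target if name == "vm.max_map_count" else value <= target
        results[name] = (value, ok)
    return results


if __name__ == "__main__":
    for name, (value, ok) in check_vm_settings().items():
        print(f"{name} = {value} ({'ok' if ok else 'needs tuning'})")
```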
For the SSD instance storage on EC2 r3 instances, it may be better to mount with the discard
option, since these SSDs support TRIM:
vim /etc/fstab
/dev/xvdb /mnt ext4 defaults,noatime,nodiratime,discard 0 0
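A quick way to confirm an fstab entry carries the intended mount options is to split its fourth field. This sketch assumes plain whitespace-separated fields (no escaped spaces such as \040 in paths), and the helper names are hypothetical.

```python
def parse_fstab_line(line):
    """Split an fstab entry into its fields; None for comments, blanks, or short lines."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    fields = line.split()
    if len(fields) < 4:
        return None  # malformed entry; not handled in this sketch
    device, mountpoint, fstype, options = fields[:4]
    return {"device": device, "mountpoint": mountpoint, "fstype": fstype,
            "options": set(options.split(","))}


def missing_ssd_options(line, wanted=("noatime", "nodiratime", "discard")):
    """Return the wanted mount options that the entry does not set."""
    entry = parse_fstab_line(line)
    if entry is None:
        return []
    return [opt for opt in wanted if opt not in entry["options"]]


if __name__ == "__main__":
    with open("/etc/fstab") as fstab:
        for line in fstab:
            missing = missing_ssd_options(line)
            if missing:
                print(line.strip(), "-> missing", ",".join(missing))
```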
Use the noop I/O scheduler for SSDs:
echo noop | sudo tee /sys/block/xvdc/queue/scheduler
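The scheduler file lists all available schedulers and marks the active one in brackets (for example "noop deadline [cfq]"), so a sketch can parse it to confirm the change took effect. The device name xvdc is taken from the command above; note that newer multi-queue kernels list "none" instead of "noop". The function names are hypothetical.

```python
from pathlib import Path


def active_scheduler(text):
    """Return the bracketed (active) scheduler from a line like 'noop deadline [cfq]'."""
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None


def device_scheduler(device, sys_block="/sys/block"):
    """Read and parse /sys/block/<device>/queue/scheduler."""
    path = Path(sys_block, device, "queue", "scheduler")
    return active_scheduler(path.read_text())


if __name__ == "__main__":
    print("xvdc scheduler:", device_scheduler("xvdc"))
```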
vim /etc/default/elasticsearch
Give the JVM heap half of the machine's memory, but no more than 32g:
ES_HEAP_SIZE=15g
MAX_OPEN_FILES=65535
MAX_LOCKED_MEMORY=unlimited
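The heap rule above is simple arithmetic; this sketch derives the ES_HEAP_SIZE line from /proc/meminfo. The 31 GB cap is an assumed safety margin below 32 GB (above roughly 32 GB the JVM can no longer use compressed object pointers), and the helper names are hypothetical.

```python
def mem_total_gb(meminfo_text):
    """Parse MemTotal (reported in kB) from /proc/meminfo text, in whole GiB."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            kib = int(line.split()[1])
            return kib // (1024 * 1024)
    raise ValueError("MemTotal not found")


def heap_size_gb(total_ram_gb, cap_gb=31):
    """Half of physical RAM, capped below 32 GB (cap_gb is an assumed margin)."""
    return min(total_ram_gb // 2, cap_gb)


def es_heap_line(total_ram_gb):
    """Line for /etc/default/elasticsearch."""
    return f"ES_HEAP_SIZE={heap_size_gb(total_ram_gb)}g"


if __name__ == "__main__":
    with open("/proc/meminfo") as f:
        print(es_heap_line(mem_total_gb(f.read())))
```

For a 30 GB machine this gives ES_HEAP_SIZE=15g, matching the value above; a 244 GB machine is capped at 31g rather than getting 122g.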
vim /etc/elasticsearch/elasticsearch.yml
Lock the heap in RAM so it is never swapped:
bootstrap.mlockall: true
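Indexing performance (trailing comments show the values before tuning; indices.memory.index_buffer_size is a node-level setting for elasticsearch.yml, while the index.* settings are per-index):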
"indices.memory.index_buffer_size": "30%", #10%
"index.translog.flush_threshold_ops": 50000, #1000
"index.refresh_interval": "5s", #1s
#"index.store.type": "mmapfs"
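Because the buffer size and the index.* settings live in different places, a sketch can keep them apart: the node setting as a line for elasticsearch.yml, and the per-index values as a JSON body for PUT /<index>/_settings (for example PUT /db/_settings). The constant and function names are hypothetical; the values are the ones listed above.

```python
import json

# Node-level setting: goes into elasticsearch.yml, not into an index's settings.
NODE_YML_LINE = "indices.memory.index_buffer_size: 30%"

# Per-index values from the notes, and the values they replaced.
TUNED = {"refresh_interval": "5s", "translog.flush_threshold_ops": 50000}
BEFORE = {"refresh_interval": "1s", "translog.flush_threshold_ops": 1000}


def index_settings_body(settings):
    """JSON body for PUT /<index>/_settings, with the keys nested under "index"."""
    return json.dumps({"index": settings})


if __name__ == "__main__":
    print(NODE_YML_LINE)
    print(index_settings_body(TUNED))
```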
Raise the store (merge) throttle from 20mb/s to 100mb/s:
PUT /_cluster/settings
{
"persistent" : {
"indices.store.throttle.max_bytes_per_sec" : "100mb"
}
}
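The same cluster-settings call can be built with the standard library instead of a REST console. This sketch only constructs the request, assuming a node at localhost:9200; `throttle_request` is a hypothetical name. A "persistent" setting survives a full cluster restart, unlike a "transient" one.

```python
import json
from urllib.request import Request, urlopen


def throttle_request(max_bytes_per_sec="100mb", host="http://localhost:9200"):
    """Build (but do not send) the PUT /_cluster/settings request shown above."""
    body = {"persistent": {"indices.store.throttle.max_bytes_per_sec": max_bytes_per_sec}}
    return Request(f"{host}/_cluster/settings",
                   data=json.dumps(body).encode("utf-8"),
                   headers={"Content-Type": "application/json"},
                   method="PUT")


if __name__ == "__main__":
    with urlopen(throttle_request()) as resp:
        print(resp.read().decode("utf-8"))
```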
-
Elasticsearch stores the original document in the _source field; if you don't need it, you can disable it.
-
Elasticsearch also analyzes the values of all fields together into the _all field; if you don't need it, this can be disabled too.
For example, this document:
{ '_id': 1, 'title': 'this is first blog', 'author': 'kakashi', 'content': 'test 123' }
becomes, once stored in ES:
{ '_id': 1, '_all': 'this, is, first, blog, kakashi, test, 123', 'title': 'this, is, first, blog', 'author': 'kakashi', 'content': 'test, 123', '_source': { 'title': 'this is first blog', 'author': 'kakashi', 'content': 'test 123' } }
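To make the example concrete, this sketch roughly simulates how the _all tokens arise from the document's fields and builds an ES 1.x-style type mapping that disables both _source and _all. The regex tokenizer is a crude stand-in for the standard analyzer, not its real behavior, and all function names are hypothetical.

```python
import json
import re


def rough_tokens(text):
    """Crude stand-in for the standard analyzer: lowercase runs of word characters."""
    return re.findall(r"\w+", text.lower())


def simulate_all_field(doc):
    """Collect the tokens of every non-metadata field, roughly what _all indexes."""
    tokens = []
    for field, value in doc.items():
        if field.startswith("_"):  # skip metadata such as _id
            continue
        tokens.extend(rough_tokens(str(value)))
    return tokens


def mapping_without_source_and_all(doc_type):
    """Type mapping that turns off both _source and _all."""
    return {doc_type: {"_source": {"enabled": False}, "_all": {"enabled": False}}}


if __name__ == "__main__":
    doc = {"_id": 1, "title": "this is first blog", "author": "kakashi", "content": "test 123"}
    print(simulate_all_field(doc))
    print(json.dumps(mapping_without_source_and_all("blog"), indent=2))
```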
-
If _source is disabled, you can use store to decide whether each field is stored:
{ "tweet" : { "properties" : { "message" : { "type" : "string", "store" : true, "index" : "analyzed" } } } }
-
The biggest difference between _source and store: with _source, the update API can be used to update values (it rebuilds the document from _source).
-
When a field is analyzed but you don't need scoring (relevance), you can disable norms, which saves a lot of memory.
-
index_options controls whether term frequencies and positions are stored.
-
For fields that never need to be searched, use "index": "no"; for fields that should not be tokenized, use "index": "not_analyzed".
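The field-level options above can be combined in one helper. This is a sketch of ES 1.x-style string field mappings under the assumption that norms and index_options are only tuned for analyzed fields; `field_mapping` and its flags are hypothetical. "freqs" keeps document ids and term frequencies but drops positions, so phrase queries will no longer work on that field.

```python
import json


def field_mapping(indexed=True, analyzed=True, scoring=True, positions=True, store=False):
    """Build an ES 1.x-style string field mapping from a few yes/no choices."""
    if not indexed:
        mapping = {"type": "string", "index": "no"}
    elif not analyzed:
        mapping = {"type": "string", "index": "not_analyzed"}
    else:
        mapping = {"type": "string", "index": "analyzed"}
        if not scoring:
            mapping["norms"] = {"enabled": False}  # no length norms: less memory
        if not positions:
            mapping["index_options"] = "freqs"  # doc ids + term frequencies only
    if store:
        mapping["store"] = True
    return mapping


if __name__ == "__main__":
    print(json.dumps({"properties": {
        "author": field_mapping(analyzed=False),
        "content": field_mapping(scoring=False),
        "raw_html": field_mapping(indexed=False, store=True),
    }}, indent=2))
```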
-
Use templates:
PUT _template/blog-template
{
  "template": "db*",
  "mappings": {
    "blog": {
      "properties": {
        "author": { "type": "string", "index": "not_analyzed" },
        "content": { "type": "string" }
      }
    }
  }
}
("template" is the index-name pattern, here indices whose names start with db; "blog" is the type (table) name.)
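A template only affects indices created after it exists and whose names match its pattern. This sketch holds the template body as a Python dict and approximates the pattern check with `fnmatch.fnmatchcase`; that is an approximation, since fnmatch also understands ? and [...] while the template pattern here only uses *. `BLOG_TEMPLATE` and `template_applies` are hypothetical names.

```python
import fnmatch
import json

BLOG_TEMPLATE = {
    "template": "db*",  # index-name pattern
    "mappings": {
        "blog": {  # type name
            "properties": {
                "author": {"type": "string", "index": "not_analyzed"},
                "content": {"type": "string"},
            }
        }
    },
}


def template_applies(template, index_name):
    """Rough check of whether a new index name matches the template pattern."""
    return fnmatch.fnmatchcase(index_name, template["template"])


if __name__ == "__main__":
    print(json.dumps(BLOG_TEMPLATE, indent=2))  # body for PUT _template/blog-template
```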
-
Get the mapping:
GET db/_mapping/
-
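Update db's mapping directly (new fields can be added, but existing field mappings generally cannot be changed without reindexing):
PUT db/_mapping/blog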
- Use bulk indexing, keeping each bulk request at about 1MB~5MB
- Less important data can be indexed with bulk UDP (only when losing some data is acceptable)
- When reindexing, set refresh_interval to -1 and refresh manually after the bulk indexing
- Index warmers can speed up searches (http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/indices-warmers.html)
- More shards & machines -> more indexing capacity
- More replicas & machines -> more read capacity
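The bulk and reindexing tips can be sketched together: split documents into newline-delimited _bulk bodies of at most about 5MB (an action line followed by the source line per document), and wrap the load in refresh_interval changes. The index/type names, the (id, source) input shape, and the helper names are assumptions for illustration; "1s" is ES's default refresh interval, so restore whatever value the index used before.

```python
import json

# Bodies for PUT /<index>/_settings before and after a bulk reindex.
BULK_LOAD_SETTINGS = {"index": {"refresh_interval": "-1"}}
AFTER_LOAD_SETTINGS = {"index": {"refresh_interval": "1s"}}


def bulk_batches(docs, index, doc_type, max_bytes=5 * 1024 * 1024):
    """Yield _bulk request bodies of at most max_bytes each.

    docs: iterable of (doc_id, source_dict). A single document larger than
    max_bytes is still sent, alone, in its own batch.
    """
    batch, size = [], 0
    for doc_id, source in docs:
        action = json.dumps({"index": {"_index": index, "_type": doc_type, "_id": str(doc_id)}})
        pair = action + "\n" + json.dumps(source) + "\n"
        pair_size = len(pair.encode("utf-8"))
        if batch and size + pair_size > max_bytes:
            yield "".join(batch)
            batch, size = [], 0
        batch.append(pair)
        size += pair_size
    if batch:
        yield "".join(batch)


if __name__ == "__main__":
    docs = ((i, {"title": f"blog {i}"}) for i in range(10000))
    for body in bulk_batches(docs, "db", "blog"):
        print(len(body.encode("utf-8")), "bytes")  # POST each body to /_bulk
```

After the last batch, apply AFTER_LOAD_SETTINGS and call POST /<index>/_refresh once.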
References:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/setup-configuration.html
https://blog.codecentric.de/en/2014/05/elasticsearch-indexing-performance-cheatsheet/
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-core-types.html