Keep in mind, our use case is largely timeseries analytics, but here are the broad themes of the issues we encountered:
- Realtime indexing + querying is tough. It required us to throw beefy dedicated hardware at that problem while serving historical queries on nodes with a different config (the typical hot/warm/cold node architecture).
- As always, skewed data sets require special consideration in index and document schema modelling.
- JVM heap, aggregation query, and doc mapping optimization are all needed, or you'll easily hit OOMs on nodes, which can lead to...
- Bad failure scenarios where an entire cluster is brought to a halt and no queries can be served. Literally one bad, greedy query can put your node and cluster in a very bad state.
- Depending on your document mapping, disk storage requirements can easily bite you, but they're improved by https://www.elastic.co/blog/store-compression-in-lucene-and-elasticsearch
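To make the last two points concrete, here's a rough sketch of the kind of knobs involved (exact setting names and defaults vary by ES version, and the 40% limit is just illustrative, not a recommendation):

```shell
# Create a timeseries index using the heavier DEFLATE-based codec
# from the linked blog post, trading some CPU for smaller segments.
curl -XPUT 'localhost:9200/metrics-2015.06' -d '{
  "settings": {
    "index.codec": "best_compression"
  }
}'

# Tighten the request circuit breaker so a single greedy aggregation
# trips a breaker instead of OOMing the node and taking the cluster down.
curl -XPUT 'localhost:9200/_cluster/settings' -d '{
  "transient": {
    "indices.breaker.request.limit": "40%"
  }
}'
```

Circuit breakers don't fix a bad query, they just turn "node falls over" into "query fails fast", which is a much better failure mode for a shared cluster.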
+1 to the ES team though, they do listen to and fix issues quickly. Moving to doc values as the default is a good example.