Elasticsearch errors

Doing a query with a has_parent filter when a parent-child relation references a mapping that doesn't exist returns a NullPointerException (instead of a more informative error)
Adding a port number to a unicast host in elasticsearch.yml causes that node to receive invalid (i.e. unparseable) HTTP requests
Missing a newline in a bulk insert request caused subsequent queries on that index to return invalid json
Doing a delete by query on an index that removed a significant number of documents caused refresh requests on that index to return NullPointerExceptions
Shards moving between nodes for no apparent reason
Shards becoming unassigned for no apparent reason
Shards becoming unassigned even when all of the shards in the cluster had been routed manually and shard allocation had been disabled
Shards losing all of their documents if a write is performed while they are unavailable

Opened this ticket for the bulk problem (applies to regular indexing too it seems): elastic/elasticsearch#7299

Will look into the delete-by-query and refresh situation, see if I can reproduce it.

Will look into the "initializing"-delete-all-docs situation too.

No idea about the port situation... I've never heard that before (and we routinely change/configure ports for ourselves and various customers). Were you using the transport port, or the HTTP port?

This might be intended behavior but it makes the db difficult to work with operationally. You want to be able to predict when expensive operations are going to happen so you can make sure that they don't interfere with the work your db is supposed to be doing.

If you don't want shards moving around at all, you can set:

curl -XPUT "http://localhost:9200/_cluster/settings" -d'
{
  "persistent": {
    "cluster.routing.allocation.enable" : "none"
  }
}'

This prevents any rebalancing or allocation at all. Alternatively, you could set it to new_primaries, which is likely the better option: it will allocate primaries for new indices but nothing else.
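For reference, the new_primaries variant might look roughly like this. It is only a sketch: ES_URL, NEW_PRIMARIES_BODY, and set_new_primaries are hypothetical names used for illustration, and the host is assumed to be the same localhost:9200 as in the request above.

```shell
# Hypothetical base URL; assumed to match the host used in the request above.
ES_URL="${ES_URL:-http://localhost:9200}"

# Same setting as above, but with "new_primaries" instead of "none".
NEW_PRIMARIES_BODY='{
  "persistent": {
    "cluster.routing.allocation.enable": "new_primaries"
  }
}'

# Hypothetical helper: sends the body to the cluster settings endpoint.
# The Content-Type header is required by newer Elasticsearch releases
# and is harmless on older ones.
set_new_primaries() {
  curl -XPUT "$ES_URL/_cluster/settings" \
    -H 'Content-Type: application/json' \
    -d "$NEW_PRIMARIES_BODY"
}
```

Calling set_new_primaries applies the setting. Putting "all" (the default) in the same field turns normal allocation back on.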

Ultimately, ES is designed to perform these maintenance operations in the background. Rather than preventing it, it's better to just throttle the operations until they don't affect your cluster anymore.

You can throttle the recovery process with indices.recovery.max_bytes_per_sec: set it to something reasonable that doesn't overwhelm your network or disk I/O.
You could also set cluster.routing.allocation.cluster_concurrent_rebalance: 1, which allows only one rebalance to occur in the cluster at a given time, to limit how much background activity is happening.
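As a rough sketch, both throttling settings could be applied through the same _cluster/settings endpoint, assuming they can be updated dynamically in your version. The 50mb value is only a placeholder, not a recommendation, and ES_URL, THROTTLE_BODY, and apply_throttle are hypothetical names used for illustration.

```shell
# Hypothetical base URL; assumed to be the same local node as above.
ES_URL="${ES_URL:-http://localhost:9200}"

# "50mb" is an assumed placeholder; pick a rate your network and disks
# can sustain alongside normal query and indexing load.
THROTTLE_BODY='{
  "persistent": {
    "indices.recovery.max_bytes_per_sec": "50mb",
    "cluster.routing.allocation.cluster_concurrent_rebalance": 1
  }
}'

# Hypothetical helper: sends both settings in one request.
# The Content-Type header is required by newer Elasticsearch releases
# and is harmless on older ones.
apply_throttle() {
  curl -XPUT "$ES_URL/_cluster/settings" \
    -H 'Content-Type: application/json' \
    -d "$THROTTLE_BODY"
}
```

Calling apply_throttle sends the request. If background operations still interfere with normal traffic, lowering the rate and re-running it is the obvious next step.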

bobpoekert/gist:f4613bde4fabae5b50bb

polyfractal commented Aug 15, 2014