We had a problem where invalid data got stored in Elasticsearch. An array of objects contained some objects that were missing a mandatory field. After fixing the mistake, we wanted to update all offending entries. For this, we needed to get the IDs of the affected items.
The "obvious" query would be _exists_:general_information AND !(_exists_:general_information.value). But as soon as any array element has a value, the second condition considers the field to exist. So if the array contains any valid entries, the query does not match the document, even though the document also contains invalid ones.
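To see why the query misses mixed documents, it can help to model how the field is indexed. The following is a simplified Python sketch, not Elasticsearch code. It assumes general_information is mapped as a plain object array (not a nested type), in which case the values of all array elements are flattened into one multi-valued field. The "label" key and the function names are hypothetical and only used for illustration.

```python
def flatten_field(doc, array_field, sub_field):
    """Collect the non-null sub_field values of all objects in array_field.

    Simplified model of how a plain (non-nested) object array is indexed:
    the link between a value and the object it came from is lost.
    """
    return [
        item[sub_field]
        for item in doc.get(array_field) or []
        if isinstance(item, dict) and item.get(sub_field) is not None
    ]


def naive_query_matches(doc):
    """Model of: _exists_:general_information AND !(_exists_:general_information.value)"""
    has_array = bool(doc.get("general_information"))
    has_any_value = bool(flatten_field(doc, "general_information", "value"))
    return has_array and not has_any_value


# Hypothetical sample documents; "label" is an invented extra field.
all_invalid = {"general_information": [{"label": "a"}, {"label": "b"}]}
mixed = {"general_information": [{"label": "a", "value": 1}, {"label": "b"}]}

print(naive_query_matches(all_invalid))  # found by the naive query
print(naive_query_matches(mixed))        # missed, although one element is invalid
```

In this model the mixed document is skipped because a single valid element is enough to make the flattened value field exist.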
The solution we found was to use an ES script that loops over the array elements in the source document and returns 1 as soon as it finds one that has no data. To our pleasant surprise, running this on an index with over 1M entries took only a couple of seconds. That is definitely too slow for a routine query, but perfectly acceptable for a one-off query to fix a problem.
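The per-element check performed by the script can be sketched in Python as follows. This is not the script we ran in Elasticsearch, only a model of its logic applied to source documents that have already been fetched. It assumes that "has no data" means the value key is missing or null. The function names and sample documents are hypothetical.

```python
def has_invalid_entry(source, array_field="general_information", required="value"):
    """Return 1 if any element of the array lacks the required field, else 0."""
    for item in source.get(array_field) or []:
        # Assumption: a missing key and an explicit null both count as "no data".
        if not isinstance(item, dict) or item.get(required) is None:
            return 1
    return 0


def affected_ids(docs):
    """Return the IDs of all documents with at least one invalid array element."""
    return [doc_id for doc_id, source in docs.items() if has_invalid_entry(source) == 1]


# Hypothetical documents keyed by ID.
docs = {
    "1": {"general_information": [{"value": 1}, {"value": 2}]},  # all valid
    "2": {"general_information": [{"value": 1}, {}]},            # mixed
    "3": {"general_information": [{}, {"value": None}]},         # all invalid
    "4": {},                                                     # field absent
}

print(affected_ids(docs))
```

Unlike the field-level exists check, this looks at each element on its own, so a document with both valid and invalid entries is still reported.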