Delete from CouchDB/Elasticsearch based on Elasticsearch/Elasticdump results
# Adapted from https://gist.github.com/anarchivist/ba97611d13331ce78307
# Generate a JSON dump file of the records we want to remove from both CouchDB and Elasticsearch
elasticdump --input="http://$ELASTICSEARCH_HOST:9200/dpla_alias" --output=missouri_ingestionSequence_4 --searchBody='
{"query":
  {"filtered":
    {"query":
      {"bool":
        {"must": [
          {"query_string":
            {"query": "4", "default_operator": "AND", "lenient": true, "fields": ["ingestionSequence"]}},
          {"query_string":
            {"query": "Missouri", "default_operator": "AND", "lenient": true, "fields": ["provider.name"]}}
        ]}
      }
    }
  }
}'
# Process the dump file to create a CouchDB bulk update document. Note that we can't use the HTTP DELETE verb
# because of how the CouchDB river for Elasticsearch behaves; see the following for more info.
# https://github.com/elastic/elasticsearch-river-couchdb#indexing-databases-with-multiple-types
jq '{"docs": [.[]._source | . + {"_deleted": true} | del(.score)]}' missouri_ingestionSequence_4 > missouri_to_delete
# Post the bulk update document to CouchDB. The river will automatically delete the records from Elasticsearch
# when processing the updates.
curl -H "Content-Type: application/json" -X POST --data-binary @missouri_to_delete "http://$COUCHDB_USER:$COUCHDB_PASS@$COUCHDB_HOST:5984/dpla/_bulk_docs"
# Finally, clear the caches.
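
The jq step and the `_bulk_docs` post above can be tried on sample data before touching the real database. The sketch below is only an illustration, not part of the original script: the function names `build_bulk_delete` and `bulk_errors` and the sample records are hypothetical. It assumes the dump is a single JSON array of hits (which is what the `.[]._source` filter implies) and that each `_source` carries the CouchDB `_id` and `_rev`. It also assumes the `_bulk_docs` response is a JSON array in which failed entries carry an `error` key, as CouchDB documents.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for a dry run; requires jq on the PATH.

# Turn a dump (a JSON array of hits, each with a "_source" object) into a
# _bulk_docs payload that marks every document as deleted.
build_bulk_delete() {
  jq -c '{"docs": [.[]._source | . + {"_deleted": true} | del(.score)]}' "$1"
}

# Print only the entries of a saved _bulk_docs response that report an error.
bulk_errors() {
  jq -c '[.[] | select(has("error"))]' "$1"
}

# Made-up sample dump with two hits.
sample_dump=$(mktemp)
cat > "$sample_dump" <<'EOF'
[
  {"_id": "a", "_source": {"_id": "a", "_rev": "1-abc", "score": 3, "ingestionSequence": 4}},
  {"_id": "b", "_source": {"_id": "b", "_rev": "1-def", "score": 2, "ingestionSequence": 4}}
]
EOF

# Made-up sample response: one success, one revision conflict.
sample_response=$(mktemp)
cat > "$sample_response" <<'EOF'
[
  {"ok": true, "id": "a", "rev": "2-123"},
  {"id": "b", "error": "conflict", "reason": "Document update conflict."}
]
EOF

# Inspect the payload that would be posted, then the failures in the response.
build_bulk_delete "$sample_dump"
bulk_errors "$sample_response"

rm -f "$sample_dump" "$sample_response"
```

Against a real run, the response could be captured by adding `-o response.json` to the `curl` call and passing that file to `bulk_errors`. Any output other than `[]` means some documents were not deleted, for example because the `_rev` in the dump was stale.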