Last active
July 27, 2016 19:36
Delete from CouchDB/Elasticsearch based on Elasticsearch/Elasticdump results
# Modified based on https://gist.github.com/anarchivist/ba97611d13331ce78307
# Generate a JSON dump file of the records we want to remove from both CouchDB and Elasticsearch.
elasticdump --input=http://$ELASTICSEARCH_HOST:9200/dpla_alias --output=missouri_ingestionSequence_4 --searchBody='
{"query":
  {"filtered":
    {"query":
      {"bool":
        {"must":[
          {"query_string":
            {"query":"4","default_operator":"AND","lenient":true,"fields":["ingestionSequence"]}},
          {"query_string":{"query":"Missouri","default_operator":"AND","lenient":true,"fields":["provider.name"]}
          }
        ]}
      }
    }
  }
}'
# Process the dump file to create a CouchDB bulk update document. Note that we can't use the HTTP DELETE verb
# because of the behavior of the CouchDB river for Elasticsearch; see the following for more info.
# https://github.com/elastic/elasticsearch-river-couchdb#indexing-databases-with-multiple-types
jq '{"docs": [.[]._source | . + {"_deleted": true} | del(.score)]}' missouri_ingestionSequence_4 > missouri_to_delete
# Post the bulk update document to CouchDB. The river will automatically delete the records from Elasticsearch
# when processing the updates.
curl -H "Content-Type: application/json" -X POST --data-binary @missouri_to_delete "http://$COUCHDB_USER:$COUCHDB_PASS@$COUCHDB_HOST:5984/dpla/_bulk_docs"
# Finally, clear the caches.
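
Because the bulk update deletes records, it may be worth checking the generated document before posting it, and checking CouchDB's per-document results afterwards. The following is a minimal sketch, not part of the original workflow. The file names `sample_dump.json`, `sample_to_delete.json`, and `sample_response.json` and their contents are made up for illustration. It assumes, as the `jq` filter in the script does, that the dump is a single JSON array of hits whose `_source` carries the CouchDB `_id` and `_rev`. If your elasticdump version writes one JSON object per line instead, the filter would need adjusting (for example by slurping the lines with `jq -s`).

```shell
#!/bin/sh
# Hypothetical stand-in for the elasticdump output: a JSON array of hits.
cat > sample_dump.json <<'EOF'
[
  {"_id": "a", "_score": 1.0,
   "_source": {"_id": "a", "_rev": "1-abc", "ingestionSequence": 4, "score": 1.0}},
  {"_id": "b", "_score": 0.5,
   "_source": {"_id": "b", "_rev": "2-def", "ingestionSequence": 4, "score": 0.5}}
]
EOF

# Same filter as the script: keep each _source, mark it deleted, drop "score".
jq '{"docs": [.[]._source | . + {"_deleted": true} | del(.score)]}' \
  sample_dump.json > sample_to_delete.json

# Pre-flight checks: how many docs, and are any missing the fields
# CouchDB needs in order to record the deletion?
doc_count=$(jq '.docs | length' sample_to_delete.json)
not_deleted=$(jq '[.docs[] | select(._deleted != true)] | length' sample_to_delete.json)
missing_rev=$(jq '[.docs[] | select(has("_rev") | not)] | length' sample_to_delete.json)
echo "docs=$doc_count not_deleted=$not_deleted missing_rev=$missing_rev"

# Hypothetical _bulk_docs response: CouchDB reports success or failure
# per document, so a conflict on one doc does not fail the whole request.
cat > sample_response.json <<'EOF'
[
  {"ok": true, "id": "a", "rev": "2-xyz"},
  {"id": "b", "error": "conflict", "reason": "Document update conflict."}
]
EOF

# Count the documents that CouchDB rejected.
failed=$(jq '[.[] | select(.error)] | length' sample_response.json)
echo "failed=$failed"
```

In a real run, the same checks could be pointed at `missouri_to_delete` and at the saved output of the `curl` call (for example by adding `-o` with a file name of your choosing).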