@ndpar
Last active November 16, 2016 12:44
Export documents from Solr core to XML format
#!/usr/bin/env groovy
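/**
 * Usage: ./SolrExporter.groovy query url [url]
 *
 * With one URL, the matching documents are printed as XML:
 *
 *   ./SolrExporter.groovy "id:12345" "http://your.solr.host:8983/solr/core/"
 *
 * With two URLs, the matching documents are read from the first core and added to the second:
 *
 *   ./SolrExporter.groovy "id:12345" "http://old.solr.host:8983/solr/core/" "http://new.solr.host:8983/solr/core/"
 *
 * You can also use this exporter to reindex Solr (e.g. after an incompatible schema change),
 * provided all fields are stored, since only stored fields are returned by a query:
 *
 *   ./SolrExporter.groovy "*:*" "http://localhost:8983/solr/core/" "http://localhost:8983/solr/core/"
 */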
import org.apache.solr.client.solrj.SolrQuery
import org.apache.solr.client.solrj.SolrServer
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer
import org.apache.solr.client.solrj.util.ClientUtils
import org.apache.solr.common.SolrDocument
import org.apache.solr.common.SolrInputDocument
@Grapes([
    @Grab(group = 'org.apache.solr', module = 'solr-solrj', version = '1.4.1'),
    @Grab(group = 'org.slf4j', module = 'slf4j-simple', version = '1.6.4')
])
class SolrDocumentExporter {

    private SolrServer sourceServer
    private SolrServer targetServer
    private SolrQuery query

    SolrDocumentExporter(q, source) {
        this(q, source, null)
    }

    SolrDocumentExporter(q, source, target) {
        query = solrQuery(q)
        sourceServer = new CommonsHttpSolrServer(source)
        if (target) targetServer = new CommonsHttpSolrServer(target)
    }

    void exportDocuments() {
        List<SolrDocument> resultDocuments = executeQuery(sourceServer, query)
        List<SolrInputDocument> inputDocuments = inputDocuments(resultDocuments)
        if (targetServer) updateDocuments(targetServer, inputDocuments)
        else printDocuments(inputDocuments)
    }

    private SolrQuery solrQuery(String q) {
        new SolrQuery(query: q, start: 0, rows: Integer.MAX_VALUE)
    }

    private List<SolrDocument> executeQuery(server, query) {
        server.query(query).response.get("response")
    }

    private List<SolrInputDocument> inputDocuments(documents) {
        documents.collect { ClientUtils.toSolrInputDocument(it) }
    }

    private void updateDocuments(server, documents) {
        server.add(documents)
        server.commit()
    }

    private void printDocuments(documents) {
        documents.each { println(ClientUtils.toXML(it)) }
    }
}

target = (args.length > 2) ? args[2] : null
new SolrDocumentExporter(args[0], args[1], target).exportDocuments()
@amalinovskiy

Hi!

I was trying to use this script and noticed that, since it loads the whole core into memory, it does not work for large cores. So I made a couple of changes so that it uses Solr's deep paging feature: https://gist.github.com/amalinovskiy/743ca7ef2548a28e68b085200f87bde8

Thanks a lot for sharing this!
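
For readers who want the gist of the deep paging idea without opening the link, here is a minimal sketch of a cursor loop. It is not taken from the linked gist. The helper `eachPage`, the closures `fetchPage` and `handlePage`, and the in-memory `fakeFetch` are all hypothetical names introduced for illustration. The commented-out wiring assumes a SolrJ version with cursor support (4.7 or later), which is newer than the 1.4.1 client the script above grabs.

```groovy
// Generic cursor loop (hypothetical helper, not part of the original script).
// fetchPage(cursor) must return a map [docs: List, next: String].
// The loop stops when the returned cursor equals the one that was sent,
// which is how Solr's cursorMark paging signals the end of the results.
def eachPage(Closure fetchPage, Closure handlePage, String start = '*') {
    String cursor = start
    int total = 0
    while (true) {
        def page = fetchPage(cursor)
        if (page.docs) {
            handlePage(page.docs)      // only one page is held in memory at a time
            total += page.docs.size()
        }
        if (page.next == cursor) break
        cursor = page.next
    }
    total
}

// Assumed wiring against SolrJ 4.7+ (untested sketch; requires start=0, a modest
// rows value, and a sort that includes the uniqueKey field, e.g. "id asc"):
//
//   def fetch = { String cursor ->
//       query.set('cursorMark', cursor)
//       def rsp = sourceServer.query(query)
//       [docs: rsp.results, next: rsp.nextCursorMark]
//   }

// In-memory stand-in for a Solr core so the loop can be tried without a server:
// seven "documents", served three at a time, with the list offset as the cursor.
def data = (1..7).toList()
def fakeFetch = { String cursor ->
    int from = (cursor == '*') ? 0 : Integer.parseInt(cursor)
    int to = Math.min(from + 3, data.size())
    [docs: data.subList(from, to), next: (from == to) ? cursor : to.toString()]
}

def pages = []
def total = eachPage(fakeFetch, { pages << new ArrayList(it) })
println "fetched ${total} documents in ${pages.size()} pages"
```

In the exporter, `handlePage` would be the place to call `updateDocuments` or `printDocuments` for each page, so memory use is bounded by the page size rather than the size of the core.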
