Last active
March 24, 2017 14:42
# First, log into the EHRI staging server.
# Actually, open a bunch of shells.
# In one of them, tail the following file, which will give us some information
# about what goes wrong when something inevitably goes wrong:
tail -f /opt/webapps/neo4j-version/data/log/console.log
# Next, in another shell, copy the file(s) to be ingested to the server
# and place them in /opt/webapps/data/import-data/de/de-002409
# (de-002409 is ITS's EHRI ID.)
# Errors: certain date patterns are fuzzy-parsed by the importer. Invalid
# dates such as 31st April will currently throw a runtime exception. So
# fix all these first ;)
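# A minimal sketch for spotting impossible calendar dates before ingesting.
# Assumptions (not taken from the importer itself): GNU date is available, and
# the dates of interest appear in the XML in ISO YYYY-MM-DD form. Other
# patterns that the importer fuzzy-parses are not covered. The helper names
# is_valid_date and list_invalid_dates are hypothetical.
is_valid_date() {
  # GNU date exits non-zero for impossible dates such as 2017-04-31.
  date -u -d "$1" +%F >/dev/null 2>&1
}
list_invalid_dates() {
  # Print each distinct ISO-style date in file $1 that GNU date rejects.
  grep -oE '[0-9]{4}-[0-9]{2}-[0-9]{2}' "$1" | sort -u | while read -r d; do
    is_valid_date "$d" || echo "$d"
  done
}
# Usage: list_invalid_dates KHSK.xml_GER.xml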
# Import properties handle certain mappings between tags (with particular
# attributes) and EHRI fields. The ITS data has a particular mapping
# indicating that a <unitid> with type="refcode" is the main ID, and any
# other <unitid> values are alternates. This file is:
# /opt/webapps/data/import-data/de/de-002409/its-pertinence.properties
# The actual import is done via the /ehri/import/ead endpoint on the
# Neo4j extension. It is documented here:
# http://ehri.github.io/docs/api/ehri-rest/ehri-extension/wsdocs/resource_ImportResource.html
# Let's export the properties file path as an environment variable:
export PROPERTIES=/opt/webapps/data/import-data/de/de-002409/its-pertinence.properties
# Also, let's write a log file:
echo "Importing ITS data with properties: $PROPERTIES" > LOG.txt
export LOG="$(pwd)/LOG.txt"
# So to import a single XML file, the command would be:
curl -XPOST \
  -H "X-User:mike" \
  -H "Content-type: text/xml" \
  --data-binary @KHSK.xml_GER.xml \
  "http://localhost:7474/ehri/import/ead?scope=de-002409&log=$LOG&properties=$PROPERTIES"
# If this happens to run out of Java heap space, you can restart Neo4j like so:
sudo service neo4j-service restart
# (You can give Neo4j more memory by uncommenting and setting
# wrapper.java.maxmemory=4000 in $NEO4J_HOME/conf/neo4j-wrapper.conf)
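# A sketch of making that edit non-interactively, assuming the setting is
# present in the file as a (possibly commented-out) line of the form
# "#wrapper.java.maxmemory=...". set_neo4j_maxmemory is a hypothetical helper;
# it prints the edited config to stdout and leaves the file untouched, so the
# result can be reviewed before it replaces the original.
set_neo4j_maxmemory() {
  # $1 = path to neo4j-wrapper.conf, $2 = maximum heap size in MB
  sed -E "s/^#?[[:space:]]*wrapper\.java\.maxmemory=.*/wrapper.java.maxmemory=$2/" "$1"
}
# Usage: set_neo4j_maxmemory "$NEO4J_HOME/conf/neo4j-wrapper.conf" 4000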
# If all goes well you should get something like this:
# {"created":48430,"unchanged":0,"message":"Import ITS 0.4 data using its-pertinence.properties.\n","updated":0,"errors":{}}
# In theory, that ingest should be idempotent, so you can run the same command again
# and not change anything:
# {"created":0,"unchanged":48430,"message":"Import ITS 0.4 data using its-pertinence.properties.\n","updated":0,"errors":{}}
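# A sketch of importing every XML file in the current directory and reporting
# the counts from each response. The helper names (import_url, json_count,
# import_all) are hypothetical, and the counts are pulled out with grep rather
# than a real JSON parser, which assumes the flat response shape shown above.
import_url() {
  # $1 = scope; relies on the LOG and PROPERTIES variables exported earlier.
  printf '%s\n' "http://localhost:7474/ehri/import/ead?scope=$1&log=$LOG&properties=$PROPERTIES"
}
json_count() {
  # $1 = key (created, updated or unchanged), $2 = JSON response text
  printf '%s\n' "$2" | grep -oE "\"$1\":[0-9]+" | cut -d: -f2
}
import_all() {
  for f in *.xml; do
    resp=$(curl -s -XPOST -H "X-User:mike" -H "Content-type: text/xml" \
      --data-binary @"$f" "$(import_url de-002409)")
    echo "$f: created=$(json_count created "$resp") unchanged=$(json_count unchanged "$resp")"
  done
}
# Usage: run import_all twice; if the ingest is idempotent, the second run
# should report created=0 for every file.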
# The final step is to re-index the ITS repository, making the items searchable. This
# can be done from the Portal Admin UI, or via the following command:
java -jar /opt/webapps/docview/bin/indexer.jar \
  --clear-key-value holderId=de-002409 \
  --index -H "X-User=admin" \
  --stats \
  --solr http://localhost:8983/solr/portal \
  --rest http://localhost:7474/ehri \
  "Repository|de-002409"