Created February 15, 2017 16:48
gibrown/b039039666e387ed6b0dcefb45203420
Populate an Elasticsearch index from bash and json files
#!/bin/bash
# To prep a file for this script:
# - take a list of docs orig.json with one JSON doc per line
# - run: split -l 1000 orig.json orig-split
export ESINDEX="$1" # ES index name
export ESTYPE="$2" # ES document type name
JSONFILE="$3" # JSON file path. One doc per line.
export HOST="http://localhost:9200" # exported so the sh -c call below can see it
DOCS=$(wc -l "$JSONFILE" | awk '{print $1}')
echo "Indexing $DOCS $ESTYPE documents to $ESINDEX in 5 sec"
sleep 5
echo "Prepping bulk data"
mkdir -p tmp-bulk
rm -f tmp-bulk/bulk* # clean up files from any previous run
# Each doc becomes two lines (action line + source line), so the split size
# must stay even: 3000 lines = 1500 docs per bulk request.
awk '{print "{\"index\":{}}"; print;}' "$JSONFILE" | split -a 4 -l 3000 - tmp-bulk/bulk-
echo "Indexing..."
# We assume losing data is acceptable, so write consistency is set to one to speed this up.
# Elasticsearch 5.0 and later replaced consistency with wait_for_active_shards.
# The Content-Type header is required by Elasticsearch 6.0 and later.
# Responses are discarded, so per-document failures are not reported.
ls tmp-bulk/bulk* | xargs -I 'FILE' sh -c 'curl --silent -XPOST "$HOST/$ESINDEX/$ESTYPE/_bulk?consistency=one" -H "Content-Type: application/x-ndjson" --data-binary @FILE -o /dev/null; echo "."'
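The bulk-prep step may be easier to follow on a tiny input. The sketch below reuses the script's awk command on three made-up documents and then splits the result with an even line count, so that no action line is separated from its document. The file names, the sample documents, and the `to_bulk` helper are illustrative assumptions, not part of the original script.

```shell
#!/bin/bash
# Illustrative only: docs.json and its contents are made-up sample data.
workdir=$(mktemp -d)
printf '%s\n' '{"title":"a"}' '{"title":"b"}' '{"title":"c"}' > "$workdir/docs.json"

# Same awk step as the script: emit an empty "index" action line,
# then the document itself, for every input line.
to_bulk() {
  awk '{print "{\"index\":{}}"; print;}' "$1"
}

to_bulk "$workdir/docs.json" > "$workdir/bulk.ndjson"
cat "$workdir/bulk.ndjson"

# Split with an even line count (here 4 lines = 2 docs per chunk) so each
# chunk starts with an action line and ends with a document line.
split -a 4 -l 4 "$workdir/bulk.ndjson" "$workdir/bulk-"
wc -l "$workdir"/bulk-*
```

Because every document contributes exactly two lines, any even value for `-l` keeps the pairs intact. An odd value would leave some chunk ending in an action line with no document after it.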
#!/bin/bash
INDEX="$1" # ES index name
JSONFILE="$2" # Path to a JSON file containing the settings for the index
HOST="http://localhost:9200"
echo "Creating index $INDEX"
# The Content-Type header is required by Elasticsearch 6.0 and later.
curl -XPUT "$HOST/$INDEX" -H 'Content-Type: application/json' --data-binary @"$JSONFILE"
echo
echo "Done"
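The index-creation script expects a JSON settings file, which the gist does not include. As a rough sketch, a minimal file could be written like this. The shard and replica counts are example values only, and the temporary file path is an assumption made for illustration.

```shell
#!/bin/bash
# Hypothetical settings file: the shard and replica counts are example
# values, not taken from the original gist.
settings_file=$(mktemp)
cat > "$settings_file" <<'EOF'
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  }
}
EOF
echo "Wrote index settings to $settings_file"
```

The resulting path would then be passed as the second argument to the index-creation script, after the index name.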