These are field notes gathered while installing a website search facility for the ElasticSearch website.
You may reuse them to put a similar system in place.
The following assumes:
# Run me with:
#
# $ nginx -p /path/to/this/file/ -c nginx.conf
#
# All requests are then routed to the authenticated user's index, so
#
# GET http://user:password@localhost:8080/_search?q=*
#
# is rewritten to:
#
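To make the routing concrete, here is a hedged illustration of the mapping (not part of the original config; it assumes Elasticsearch itself listens on localhost:9200 behind the proxy):

# a request against the proxy...
curl -u user:password "http://localhost:8080/_search?q=*"
# ...is answered by the backend as if it had been sent to the user's own index,
# i.e. something like GET http://localhost:9200/user/_search?q=* (assumed target)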
#!/bin/bash
# Herein we back up our indexes! This script should run at like 6pm or something, after logstash
# rotates to a new ES index and there's no new data coming in to the old one. We grab the metadata,
# compress the data files, create a restore script, and push it all up to S3.
TODAY=$(date +"%Y.%m.%d")
INDEXNAME="logstash-$TODAY" # this had better match the index name in ES
INDEXDIR="/usr/local/elasticsearch/data/logstash/nodes/0/indices/"
BACKUPCMD="/usr/local/backupTools/s3cmd --config=/usr/local/backupTools/s3cfg put"
BACKUPDIR="/mnt/es-backups/"
YEARMONTH=$(date +"%Y-%m")
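The gist continues past this excerpt; a hedged sketch of how the remaining steps (grab the mapping, compress, push to S3) might look, with the flush step and the bucket name being assumptions:

mkdir -p "$BACKUPDIR/$YEARMONTH"
# grab the index metadata (its mapping) so the index can be recreated on restore
curl -s "http://localhost:9200/$INDEXNAME/_mapping" > "$BACKUPDIR/$YEARMONTH/$INDEXNAME-mapping.json"
# make sure everything is flushed to disk, then compress the index's data files
curl -s -XPOST "http://localhost:9200/$INDEXNAME/_flush" > /dev/null
tar czf "$BACKUPDIR/$YEARMONTH/$INDEXNAME.tar.gz" -C "$INDEXDIR" "$INDEXNAME"
# push it all up to S3 (bucket name is made up)
$BACKUPCMD "$BACKUPDIR/$YEARMONTH/$INDEXNAME.tar.gz" s3://my-es-backups/$YEARMONTH/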
Why is there no DataImportHandler equivalent in ElasticSearch? Uhm, well ... mainly because:
1. You should really consider writing your own scripts
   (be it JVM based, Perl, Ruby, PHP, Node.js/JavaScript)
   to feed ElasticSearch via bulk indexing (see the sketch after this list):
   http://www.elasticsearch.org/guide/reference/java-api/bulk.html
2. There are two projects doing it already:
   * http://code.google.com/p/sql-to-nosql-importer/
   * https://github.com/Aconex/scrutineer (keeps a DB in sync with ES or Solr!)
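For reference, a minimal hedged sketch of what such a bulk feed looks like over HTTP (index and type names here are made up): each action/metadata line is followed by a document line, and the body must end with a newline.

cat > bulk.json <<'EOF'
{ "index" : { "_index" : "myindex", "_type" : "mytype", "_id" : "1" } }
{ "title" : "hello bulk indexing" }
{ "index" : { "_index" : "myindex", "_type" : "mytype", "_id" : "2" } }
{ "title" : "a second document" }
EOF
curl -s -XPOST "http://localhost:9200/_bulk" --data-binary @bulk.json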
Yesterday I upgraded our running elasticsearch cluster on a site which serves a few million search requests a day, with zero downtime. I've been asked to describe the process, hence this blogpost.
To make it more complicated, the cluster was running elasticsearch version 0.17.8 (released 6 Oct 2011) and I upgraded it to the latest 0.19.10. There have been 21 releases between those two versions, with a lot of functional changes, so I needed to be ready to roll back if necessary.
We run elasticsearch on two biggish boxes with 16 cores plus 32GB of RAM. All indices have 1 replica, so all data is stored on both boxes (about 45GB of data). The primary data for our main indices is also stored in our database. We have a few other indices whose data lives only in elasticsearch, but those are updated only once daily. Finally, we store our sessions in elasticsearch, but active sessions are cached in memcached.
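The checks involved in a rolling restart like this are nothing exotic; a hedged sketch (both endpoints exist in 0.17 and 0.19; the host is assumed):

# confirm a node is up and see which elasticsearch version it reports
curl -s "http://localhost:9200/"
# wait for the cluster to go green again before touching the next node
curl -s "http://localhost:9200/_cluster/health?wait_for_status=green&timeout=5m&pretty=true"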
cd ~
sudo apt-get update
sudo apt-get install openjdk-7-jre-headless -y
# Download the compiled elasticsearch rather than the source.
wget http://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-0.20.2.tar.gz -O elasticsearch.tar.gz
tar -xf elasticsearch.tar.gz
rm elasticsearch.tar.gz
sudo mv elasticsearch-* elasticsearch
sudo mv elasticsearch /usr/local/share
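Not part of the original notes: a hedged way to start the node in the foreground and confirm it answers, assuming the paths from the move above:

/usr/local/share/elasticsearch/bin/elasticsearch -f   # -f keeps 0.x in the foreground
curl http://localhost:9200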
#!/bin/bash
set -e
if [ "x$1" == "x-h" ] ; then
  echo "Usage: $0 version destdir plugins"
  exit
fi
CURRENT="0.90.0.RC1"
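The script continues beyond this excerpt; as a rough idea, the rest of such a helper might look like this (argument handling and paths below are assumptions, not the original script):

VERSION=${1:-$CURRENT}
DESTDIR=${2:-./elasticsearch-$VERSION}
PLUGINS=${3:-}
wget -q "http://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-$VERSION.tar.gz"
mkdir -p "$DESTDIR"
tar -xzf "elasticsearch-$VERSION.tar.gz" -C "$DESTDIR" --strip-components=1
# install any requested plugins into the fresh copy
for plugin in $PLUGINS; do
  "$DESTDIR/bin/plugin" -install "$plugin"
done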
VERSION=0.20.6
sudo apt-get update
sudo apt-get install openjdk-6-jdk
wget https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-$VERSION.deb
sudo dpkg -i elasticsearch-$VERSION.deb
# be sure you add "action.disable_delete_all_indices" : true to the config!!
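Not in the original notes: one hedged way to add that setting when using the Debian package, whose config file lives at /etc/elasticsearch/elasticsearch.yml:

echo 'action.disable_delete_all_indices: true' | sudo tee -a /etc/elasticsearch/elasticsearch.yml
sudo service elasticsearch restart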
curl -XDELETE "http://localhost:9200/test?pretty"
curl -XPOST "http://localhost:9200/test?pretty" -d '{
  "settings": {
    "index": {
      "number_of_shards": 1,
      "number_of_replicas": 0,
      "analysis": {
        "analyzer": {
          "suggest": {
            "type": "custom",
// Set codec, dir and segmentName according to the segment you are trying to restore
Codec codec = new Lucene42Codec();
Directory dir = FSDirectory.open(new File("/tmp/test"));
String segmentName = "_0";
IOContext ioContext = new IOContext();
SegmentInfo segmentInfos = codec.segmentInfoFormat().getSegmentInfoReader().read(dir, segmentName, ioContext);
Directory segmentDir;
if (segmentInfos.getUseCompoundFile()) {
    segmentDir = new CompoundFileDirectory(dir, IndexFileNames.segmentFileName(segmentName, "", IndexFileNames.COMPOUND_FILE_EXTENSION), ioContext, false);