These are field notes gathered while installing a website search facility for the ElasticSearch website.
You may reuse them to put a similar system in place.
The following assumes:
# Run me with:
#
# $ nginx -p /path/to/this/file/ -c nginx.conf
#
# All requests are then routed to the authenticated user's index, so
#
# GET http://user:password@localhost:8080/_search?q=*
#
# is rewritten to:
#
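To make the routing concrete, here is a hedged illustration of the mapping (not part of the original config; it assumes Elasticsearch itself listens on localhost:9200 behind the proxy):

# a request against the proxy...
curl -u user:password "http://localhost:8080/_search?q=*"
# ...is answered by the backend as if it had been sent to the user's own index,
# i.e. something like GET http://localhost:9200/user/_search?q=* (assumed target)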
#!/bin/bash
# Herein we back up our indexes! This script should run at like 6pm or something, after logstash
# rotates to a new ES index and there's no new data coming in to the old one. We grab the metadata,
# compress the data files, create a restore script, and push it all up to S3.
TODAY=$(date +"%Y.%m.%d")
INDEXNAME="logstash-$TODAY" # this had better match the index name in ES
INDEXDIR="/usr/local/elasticsearch/data/logstash/nodes/0/indices/"
BACKUPCMD="/usr/local/backupTools/s3cmd --config=/usr/local/backupTools/s3cfg put"
BACKUPDIR="/mnt/es-backups/"
YEARMONTH=$(date +"%Y-%m")
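The gist continues past this excerpt; a hedged sketch of how the remaining steps (grab the mapping, compress, push to S3) might look, with the flush step and the bucket name being assumptions:

mkdir -p "$BACKUPDIR/$YEARMONTH"
# grab the index metadata (its mapping) so the index can be recreated on restore
curl -s "http://localhost:9200/$INDEXNAME/_mapping" > "$BACKUPDIR/$YEARMONTH/$INDEXNAME-mapping.json"
# make sure everything is flushed to disk, then compress the index's data files
curl -s -XPOST "http://localhost:9200/$INDEXNAME/_flush" > /dev/null
tar czf "$BACKUPDIR/$YEARMONTH/$INDEXNAME.tar.gz" -C "$INDEXDIR" "$INDEXNAME"
# push it all up to S3 (bucket name is made up)
$BACKUPCMD "$BACKUPDIR/$YEARMONTH/$INDEXNAME.tar.gz" s3://my-es-backups/$YEARMONTH/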
Why is there no DataImportHandler equivalent in ElasticSearch? Uhm, well ... mainly because:
1. You should really consider writing your own scripts
   (be it JVM based, Perl, Ruby, PHP, Node.js/JavaScript)
   to feed ElasticSearch via bulk indexing (see the sketch after this list):
   http://www.elasticsearch.org/guide/reference/java-api/bulk.html
2. There are two projects doing it already:
   * http://code.google.com/p/sql-to-nosql-importer/
   * https://github.com/Aconex/scrutineer (keeps a DB in sync with ES or Solr!)
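For reference, a minimal hedged sketch of what such a bulk feed looks like over HTTP (index and type names here are made up): each action/metadata line is followed by a document line, and the body must end with a newline.

cat > bulk.json <<'EOF'
{ "index" : { "_index" : "myindex", "_type" : "mytype", "_id" : "1" } }
{ "title" : "hello bulk indexing" }
{ "index" : { "_index" : "myindex", "_type" : "mytype", "_id" : "2" } }
{ "title" : "a second document" }
EOF
curl -s -XPOST "http://localhost:9200/_bulk" --data-binary @bulk.json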
Yesterday I upgraded our running elasticsearch cluster on a site which serves a few million search requests a day, with zero downtime. I've been asked to describe the process, hence this blogpost.
To make it more complicated, the cluster was running elasticsearch version 0.17.8 (released 6 Oct 2011) and I upgraded it to the latest 0.19.10. There have been 21 releases between those two versions, with a lot of functional changes, so I needed to be ready to roll back if necessary.
We run elasticsearch on two biggish boxes with 16 cores plus 32GB of RAM. All indices have 1 replica, so all data is stored on both boxes (about 45GB of data). The primary data for our main indices is also stored in our database. We have a few other indices whose data lives only in elasticsearch, but those are updated only once daily. Finally, we store our sessions in elasticsearch, but active sessions are cached in memcached.
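The checks involved in a rolling restart like this are nothing exotic; a hedged sketch (both endpoints exist in 0.17 and 0.19; the host is assumed):

# confirm a node is up and see which elasticsearch version it reports
curl -s "http://localhost:9200/"
# wait for the cluster to go green again before touching the next node
curl -s "http://localhost:9200/_cluster/health?wait_for_status=green&timeout=5m&pretty=true"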
cd ~
sudo apt-get update
sudo apt-get install openjdk-7-jre-headless -y
# Download the compiled elasticsearch rather than the source.
wget http://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-0.20.2.tar.gz -O elasticsearch.tar.gz
tar -xf elasticsearch.tar.gz
rm elasticsearch.tar.gz
sudo mv elasticsearch-* elasticsearch
sudo mv elasticsearch /usr/local/share
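Not part of the original notes: a hedged way to start the node in the foreground and confirm it answers, assuming the paths from the move above:

/usr/local/share/elasticsearch/bin/elasticsearch -f   # -f keeps 0.x in the foreground
curl http://localhost:9200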
#!/bin/bash
set -e
if [ "x$1" == "x-h" ] ; then
  echo "Usage: $0 version destdir plugins"
  exit
fi
CURRENT="0.90.0.RC1"
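The script continues beyond this excerpt; as a rough idea, the rest of such a helper might look like this (argument handling and paths below are assumptions, not the original script):

VERSION=${1:-$CURRENT}
DESTDIR=${2:-./elasticsearch-$VERSION}
PLUGINS=${3:-}
wget -q "http://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-$VERSION.tar.gz"
mkdir -p "$DESTDIR"
tar -xzf "elasticsearch-$VERSION.tar.gz" -C "$DESTDIR" --strip-components=1
# install any requested plugins into the fresh copy
for plugin in $PLUGINS; do
  "$DESTDIR/bin/plugin" -install "$plugin"
done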
VERSION=0.20.6
sudo apt-get update
sudo apt-get install openjdk-6-jdk
wget https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-$VERSION.deb
sudo dpkg -i elasticsearch-$VERSION.deb
# be sure you add "action.disable_delete_all_indices" : true to the config!!
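Not in the original notes: one hedged way to add that setting when using the Debian package, whose config file lives at /etc/elasticsearch/elasticsearch.yml:

echo 'action.disable_delete_all_indices: true' | sudo tee -a /etc/elasticsearch/elasticsearch.yml
sudo service elasticsearch restart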
curl -XDELETE "http://localhost:9200/test?pretty"
curl -XPOST "http://localhost:9200/test?pretty" -d '{
  "settings": {
    "index": {
      "number_of_shards": 1,
      "number_of_replicas": 0,
      "analysis": {
        "analyzer": {
          "suggest": {
            "type": "custom",
// Set codec, dir and segmentName according to the segment you are trying to restore
Codec codec = new Lucene42Codec();
Directory dir = FSDirectory.open(new File("/tmp/test"));
String segmentName = "_0";
IOContext ioContext = new IOContext();
SegmentInfo segmentInfos = codec.segmentInfoFormat().getSegmentInfoReader().read(dir, segmentName, ioContext);
Directory segmentDir;
if (segmentInfos.getUseCompoundFile()) {
    segmentDir = new CompoundFileDirectory(dir, IndexFileNames.segmentFileName(segmentName, "", IndexFileNames.COMPOUND_FILE_EXTENSION), ioContext, false);