These are field notes gathered while installing a website search facility for the ElasticSearch website.
You may re-use them to put a similar system in place.
The following assumes:
- You are on an Ubuntu Linux system, or a compatible/similar one
- You have sudo permissions for the system
sudo apt-get update
sudo apt-get upgrade
sudo apt-get install build-essential curl vim nmap
sudo apt-get install ruby ruby-dev libopenssl-ruby
We cannot install Git from packages: at the moment, Ubuntu ships 1.7.0.4, a year-old version. Unbelievable.
sudo apt-get install libz-dev tk
cd ~
wget http://kernel.org/pub/software/scm/git/git-1.7.4.4.tar.bz2
tar -xjf git-1.7.4.4.tar.bz2
cd git-1.7.4.4
./configure --prefix=/usr/local
make
sudo make install clean
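To confirm that the freshly built Git, and not the packaged 1.7.0.4, is the one being picked up, a version comparison along these lines may help. This is only a sketch: `version_ge` is a helper name made up for these notes, and it assumes a `sort` that supports `-V` (GNU coreutils).

```shell
# version_ge A B: succeeds when version A is greater than or equal to version B.
# Assumes GNU sort with -V (version sort); the helper name is hypothetical.
version_ge() {
  # After a version sort, the first line is the smaller of the two versions.
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Example with the two versions mentioned above.
if version_ge 1.7.4.4 1.7.0.4; then
  echo "1.7.4.4 is new enough"
fi

# Against the installed binary (uncomment once Git is built):
# version_ge "$(git --version | awk '{print $3}')" 1.7.4.4 && echo "Git is up to date"
```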
Install Sun Java.
Anecdotal evidence suggests that any “open Java” will break stuff. Any evidence to the contrary is sought and welcome.
sudo vim /etc/apt/sources.list
deb http://archive.canonical.com/ubuntu lucid partner
deb-src http://archive.canonical.com/ubuntu lucid partner
sudo apt-get install sun-java6-jdk
java -version
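Since the notes prefer Sun Java over any “open Java”, it can be worth checking which JVM actually answers. The sketch below classifies the banner printed by `java -version`; the helper name `jvm_vendor` is hypothetical, and the matched strings are assumptions about the usual banners of each JVM.

```shell
# jvm_vendor: reads `java -version` output on stdin and prints "sun", "openjdk" or "unknown".
# The matched strings are assumptions about the usual banner text of each JVM.
jvm_vendor() {
  out=$(cat)
  case "$out" in
    *OpenJDK*)    echo "openjdk" ;;
    *"Java(TM)"*) echo "sun" ;;
    *)            echo "unknown" ;;
  esac
}

# Demo with a canned banner line; no JVM is needed to run this.
printf 'OpenJDK Runtime Environment\n' | jvm_vendor

# Against the real JVM (note that `java -version` writes to stderr):
# java -version 2>&1 | jvm_vendor
```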
cd /usr/local/lib
sudo curl -k -L -o elasticsearch-0.15.2.tar.gz http://github.com/downloads/elasticsearch/elasticsearch/elasticsearch-0.15.2.tar.gz
sudo tar -zxvf elasticsearch-0.15.2.tar.gz
sudo rm elasticsearch-0.15.2.tar.gz
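The version number appears in the download URL, the tarball name, and the install directory used throughout these notes, so keeping it in one variable helps the pieces stay in sync. A minimal sketch follows; `ES_VERSION` and the helper names are made up for illustration, and the script prints the commands rather than running them.

```shell
# One variable drives every versioned name (ES_VERSION is a hypothetical name).
ES_VERSION="${ES_VERSION:-0.15.2}"

es_tarball() { echo "elasticsearch-$1.tar.gz"; }
es_url()     { echo "http://github.com/downloads/elasticsearch/elasticsearch/$(es_tarball "$1")"; }
es_home()    { echo "/usr/local/lib/elasticsearch-$1"; }

# Print the commands for review instead of running them.
echo "cd /usr/local/lib"
echo "sudo curl -k -L -o $(es_tarball "$ES_VERSION") $(es_url "$ES_VERSION")"
echo "sudo tar -zxvf $(es_tarball "$ES_VERSION")"
echo "sudo rm $(es_tarball "$ES_VERSION")"
echo "# ElasticSearch will live in $(es_home "$ES_VERSION")"
```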
Add a user for ElasticSearch and other associated services:
sudo adduser --home /home/elasticsearch --disabled-password --system --group elasticsearch
Important! Increase the open files limit for the elasticsearch user:
sudo vim /etc/security/limits.conf
elasticsearch - nofile 32000
elasticsearch - memlock unlimited
sudo vim /etc/pam.d/su
session required pam_limits.so
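To see whether the new limit is in effect, the value reported by `ulimit -n` can be compared against the configured 32000. The sketch below checks the current shell only; `limit_ok` is a hypothetical helper name.

```shell
# limit_ok ACTUAL WANTED: succeeds when a `ulimit` value is "unlimited" or at least WANTED.
# The helper name is made up for these notes.
limit_ok() {
  [ "$1" = "unlimited" ] || [ "$1" -ge "$2" ]
}

# Check the current shell; 32000 is the nofile value configured above.
if limit_ok "$(ulimit -n)" 32000; then
  echo "open files limit is sufficient: $(ulimit -n)"
else
  echo "open files limit is too low: $(ulimit -n)"
fi
```

To check the limit as the elasticsearch user, the same test would have to run in a shell started as that user; something like `sudo su -s /bin/sh elasticsearch -c 'ulimit -n'` should print the value, assuming your `su` accepts `-s`.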
Set the cluster name, the paths where you want to store logs and data, and other options for ElasticSearch:
cd /usr/local/lib/elasticsearch-0.15.2
sudo vim config/elasticsearch.yml
# Cluster Settings
cluster:
  name: elasticsearch_website
path:
  logs: /var/log/elasticsearch
  data: /var/data/elasticsearch
bootstrap:
  mlockall: true
Make sure proper permissions are set:
sudo mkdir -p /var/log/elasticsearch
sudo chown -R elasticsearch:admin /var/log/elasticsearch
sudo chmod -R ug+rw /var/log/elasticsearch/
sudo mkdir -p /var/data/elasticsearch
sudo chown -R elasticsearch:admin /var/data/elasticsearch
sudo chmod -R ug+rw /var/data/elasticsearch
sudo mkdir -p /var/run/elasticsearch
sudo chown -R elasticsearch:admin /var/run/elasticsearch
sudo chmod -R ug+rw /var/run/elasticsearch
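The same three commands repeat for every directory, so they can be generated from a small helper. This is a sketch only: `dir_cmds` is a hypothetical name, and it prints the commands for review instead of executing them.

```shell
# dir_cmds DIR: prints the mkdir/chown/chmod commands used above for one directory.
dir_cmds() {
  echo "sudo mkdir -p $1"
  echo "sudo chown -R elasticsearch:admin $1"
  echo "sudo chmod -R ug+rw $1"
}

# Print the commands for all three directories; pipe the output to `sh` to run them.
for dir in /var/log/elasticsearch /var/data/elasticsearch /var/run/elasticsearch; do
  dir_cmds "$dir"
done
```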
sudo -H -u elasticsearch /usr/local/lib/elasticsearch-0.15.2/bin/elasticsearch -p /var/run/elasticsearch/elasticsearch1.pid
curl http://localhost:9200
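The node may need a few seconds before it answers on port 9200, so a single `curl` right after startup can fail. A small retry loop is one way around that; `retry` and `RETRY_DELAY` are names made up for this sketch.

```shell
# retry TRIES CMD...: runs CMD until it succeeds, at most TRIES times.
# RETRY_DELAY (seconds between attempts, default 1) is an assumption of this sketch.
retry() {
  tries=$1
  shift
  while [ "$tries" -gt 0 ]; do
    "$@" && return 0
    tries=$((tries - 1))
    if [ "$tries" -gt 0 ]; then
      sleep "${RETRY_DELAY:-1}"
    fi
  done
  return 1
}

# Wait up to roughly 30 seconds for the node to answer (uncomment to use):
# retry 30 curl -s -f http://localhost:9200 && echo "ElasticSearch is up"
```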
cd /var/data
sudo git clone git://github.com/elasticsearch/elasticsearch.github.com.git elasticsearch_website
sudo chown -R elasticsearch:admin /var/data/elasticsearch_website
sudo chmod -R ug+rw /var/data/elasticsearch_website
sudo gem install jekyll
Hide is a tiny application that imports the Jekyll website data into ElasticSearch and receives GitHub HTTP post-receive notifications.
sudo mkdir -p /var/applications
cd /var/applications/
sudo git clone git://github.com/karmi/hide.git
sudo chown -R elasticsearch:admin /var/applications
sudo chmod -R ug+rw /var/applications
cd /var/applications/hide
sudo cp config.example.rb config.rb
sudo vim config.rb
:path => '/var/data/elasticsearch_website'
sudo chown -R elasticsearch:admin /var/applications/hide/config.rb
sudo gem install bundler -v 1.0.10
sudo -H -u elasticsearch bundle install
Import website data into ElasticSearch:
sudo -H -u elasticsearch bundle exec rake index:destroy index:setup index:import
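One way to confirm that the import put documents into the index is to ask ElasticSearch for the total hit count. The sketch below pulls that number out of a `_search` response with `sed`; `total_hits` is a hypothetical helper, and the pattern assumes a non-pretty response shaped like `{"hits":{"total":N,...}}`.

```shell
# total_hits: reads a (non-pretty) _search response on stdin and prints hits.total.
# The sed pattern assumes the response contains "hits":{"total":N,...}.
total_hits() {
  sed -n 's/.*"hits" *: *{ *"total" *: *\([0-9][0-9]*\).*/\1/p'
}

# Demo with a canned response, so no running node is needed.
echo '{"took":2,"hits":{"total":42,"max_score":1.0,"hits":[]}}' | total_hits

# Against the live node (uncomment once the import has finished):
# curl -s 'http://localhost:9200/_search?q=*' | total_hits
```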
Start the post-receive hook server:
sudo -H -u elasticsearch /usr/bin/env BUNDLE_GEMFILE=/var/applications/hide/Gemfile /usr/bin/bundle exec thin --chdir /var/applications/hide --rackup /var/applications/hide/config.ru --port 5000 --log /var/applications/hide/log/thin.log --pid /var/applications/hide/tmp/thin.pid --environment production --tag hide --daemonize start
Test the post-receive hook via GitHub (https://github.com/elasticsearch/elasticsearch.github.com/admin/hooks#generic_minibucket).
You can just click it.
We will use Varnish to serve as a restricting proxy for ElasticSearch. (Of course, we could also use Nginx, Apache, etc. as a proxy.)
We will allow only GET requests to the _search endpoint. In the future, we may do more interesting tricks.
Install:
curl http://repo.varnish-cache.org/debian/GPG-key.txt | sudo apt-key add -
sudo vim /etc/apt/sources.list
deb http://repo.varnish-cache.org/ubuntu/ lucid varnish-2.1
sudo apt-get update
sudo apt-get install varnish
Configure:
sudo chown -R elasticsearch:admin /etc/varnish
sudo chmod -R ug+rw /etc/varnish
sudo chown -R elasticsearch:admin /var/lib/varnish/
sudo chmod -R ug+rw /var/lib/varnish/
sudo vim /etc/varnish/default.vcl
backend default {
  .host = "127.0.0.1";
  .port = "9200";
}
sub vcl_recv {
  if (req.request != "GET" || req.url !~ "/_search") {
    error 403;
  }
}
sub vcl_fetch {
  set beresp.grace = 30m;
}
sub vcl_error {
  set obj.http.Content-Type = "text/html; charset=utf-8";
  synthetic {"
<!DOCTYPE html>
<html>
  <head>
    <title>"} obj.status " " obj.response {"</title>
  </head>
  <body>
    <h1>Error "} obj.status " " obj.response {"</h1>
    <p>Use the <a href='/_search?pretty=true&q=*'>/<code>_search</code></a> API.</p>
    <hr>
    <p><a href='http://elasticsearch.org'>http://elasticsearch.org</a></p>
  </body>
</html>
"};
  return (deliver);
}
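To reason about which requests the vcl_recv rule lets through, its condition can be mirrored in plain shell. This is only an illustration of the logic, not part of the Varnish setup, and `is_allowed` is a hypothetical name. Note that the VCL match is unanchored, so any URL containing "/_search" passes, which covers per-index searches but also paths such as /_searchable.

```shell
# is_allowed METHOD URL: mirrors the vcl_recv condition above.
# `req.url !~ "/_search"` is an unanchored match, so any URL containing "/_search" passes.
is_allowed() {
  [ "$1" = "GET" ] || return 1
  case "$2" in
    *"/_search"*) return 0 ;;
    *)            return 1 ;;
  esac
}

# Print the verdict for a few sample requests.
for req in "GET /_search?q=*" "POST /_search" "GET /_cluster/health" "GET /_searchable"; do
  # Split "METHOD URL" on the first space.
  if is_allowed "${req%% *}" "${req#* }"; then
    echo "pass:       $req"
  else
    echo "deny (403): $req"
  fi
done
```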
Start:
sudo mkdir -p /var/run/varnish/
sudo chown -R elasticsearch:admin /var/run/varnish
sudo chmod -R ug+rw /var/run/varnish
sudo su - elasticsearch -c "/usr/sbin/varnishd -f /etc/varnish/default.vcl -a 0.0.0.0:80 -P /var/run/varnish/varnishd.pid"
We will put the system under surveillance with Monit.
Install and enable:
sudo apt-get install monit
sudo vim /etc/default/monit
# You must set this variable to for monit to start
startup=1
sudo /etc/init.d/monit start
Configure:
sudo vim /etc/monit/monitrc
# ###################
# Monit Configuration
# ###################
set daemon 120
with start delay 240
set alert [email protected]
set mailserver localhost
set httpd port 2812 and
use address localhost
allow localhost
check system search.elasticsearch.org
if loadavg (5min) > 10 then alert
if memory usage > 80% then alert
if cpu usage (user) > 90% then alert
check filesystem data with path /var
if space usage > 80% for 5 times within 15 cycles then alert
if inode usage > 90% then alert
if space usage > 99% then stop
if inode usage > 99% then stop
group filesystem
check host elasticsearch with address 127.0.0.1
if failed url http://127.0.0.1:9200/ with timeout 15 seconds then alert
group elasticsearch
check process elasticsearch1 with pidfile /var/run/elasticsearch/elasticsearch1.pid
start program = "/usr/bin/sudo -H -u elasticsearch /usr/local/lib/elasticsearch-0.15.2/bin/elasticsearch -p /var/run/elasticsearch/elasticsearch1.pid" with timeout 60 seconds
stop program = "/bin/bash -c '/bin/kill $(/bin/cat /var/run/elasticsearch/elasticsearch1.pid)'"
if cpu > 90% for 5 cycles then restart
if totalmem > 2 GB for 5 cycles then restart
if loadavg(5min) greater than 10 for 8 cycles then stop
if 3 restarts within 5 cycles then timeout
group elasticsearch
check process varnishd with pidfile /var/run/varnish/varnishd.pid
start program = "/usr/sbin/varnishd -f /etc/varnish/default.vcl -a 0.0.0.0:80 -P /var/run/varnish/varnishd.pid" with timeout 60 seconds
stop program = "/bin/bash -c '/bin/kill $(/bin/cat /var/run/varnish/varnishd.pid)'"
if cpu > 90% for 5 cycles then restart
if totalmem > 500 MB for 5 cycles then restart
if loadavg(5min) greater than 10 for 8 cycles then stop
if 3 restarts within 5 cycles then timeout
group elasticsearch
check process post_receive_server with pidfile /var/applications/hide/tmp/thin.pid
start program = "/usr/bin/sudo -H -u elasticsearch /usr/bin/env BUNDLE_GEMFILE=/var/applications/hide/Gemfile /usr/bin/bundle exec thin --chdir /var/applications/hide --rackup /var/applications/hide/config.ru --port 5000 --log /var/applications/hide/log/thin.log --pid /var/applications/hide/tmp/thin.pid --environment production --tag hide --daemonize start" with timeout 60 seconds
stop program = "/bin/bash -c '/bin/kill $(/bin/cat /var/applications/hide/tmp/thin.pid)'"
if cpu > 90% for 5 cycles then restart
if totalmem > 2 GB for 5 cycles then restart
if loadavg(5min) greater than 10 for 8 cycles then stop
if 3 restarts within 5 cycles then timeout
group git
Use an SSH tunnel to connect to the Monit GUI:
ssh elasticsearch -L 2812:localhost:2812
open http://localhost:2812
Otherwise, just check it on the CLI:
sudo monit status
To reload Monit configuration, use:
sudo monit reload
To start all services, use:
sudo monit start all
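Monit tracks each service through its pidfile, and the same check is easy to do by hand when Monit itself is in doubt. The sketch below uses `kill -0`, which sends no signal and only tests whether the process can be signalled; `pid_alive` is a hypothetical helper name.

```shell
# pid_alive PIDFILE: succeeds when PIDFILE exists and names a running process.
# `kill -0` only tests whether the process can be signalled, so run this as root
# (or as the process owner) to avoid false negatives.
pid_alive() {
  [ -r "$1" ] || return 1
  pid=$(cat "$1")
  [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null
}

# Report on the three pidfiles named in the Monit configuration above.
for pidfile in /var/run/elasticsearch/elasticsearch1.pid \
               /var/run/varnish/varnishd.pid \
               /var/applications/hide/tmp/thin.pid; do
  if pid_alive "$pidfile"; then
    echo "running: $pidfile"
  else
    echo "not running: $pidfile"
  fi
done
```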
Congratulations! You now have a “continuous indexing” system set up for searching your Jekyll website with ElasticSearch.
Author: Karel Minarik