@okulik
Created April 4, 2015 11:55

ElasticSearch installation, EC2 instance provisioning, logstash configuration etc.

Provisioning EC2 instance for ElasticSearch

EC2 instance type

Use a c3.xlarge or c3.2xlarge instance for the ElasticSearch server. We need a fair amount of RAM and computing power, as well as two SSD disks for storage. Choose an existing IAM role or create a new one so that ElasticSearch can seamlessly discover other cluster nodes hosted on EC2. Also, don't forget to add both SSDs to the list of devices (/dev/sdb and /dev/sdc).

Linux system modification

Increase number of open file handles

First we need to increase the system-wide file descriptor limits. To apply them temporarily (they will reset on the next reboot), execute from bash:

sudo sysctl -w fs.file-max=100000
sudo sysctl -w vm.max_map_count=262144

To make the changes permanent, i.e. preserved across reboots, edit the /etc/sysctl.conf file using:

sudoedit /etc/sysctl.conf

and add the following entries:

fs.file-max = 100000
vm.max_map_count = 262144

Now we need to apply the changes with

sudo sysctl -p

and verify that all went well by executing

cat /proc/sys/fs/file-max

Increase user-level file descriptor limits

Execute the following lines from bash:

echo "* soft nofile 100000" | sudo tee -a /etc/security/limits.conf
echo "* hard nofile 100000" | sudo tee -a /etc/security/limits.conf
echo "* soft memlock unlimited" | sudo tee -a /etc/security/limits.conf
echo "* hard memlock unlimited" | sudo tee -a /etc/security/limits.conf

Verify the hard and soft limits with (you need to log out and back in for the new limits to apply):

ulimit -Hn
ulimit -Sn

Disable system swap

sudo swapoff -a
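Note that swapoff only disables swap until the next reboot; to keep it off permanently, any swap entries in /etc/fstab should be commented out as well. A minimal sketch, demonstrated on a sample file rather than the real /etc/fstab:

```shell
# Sketch: comment out swap entries so swap is not re-enabled at boot.
# Demonstrated on a throwaway sample file; on the instance, run the sed
# line against /etc/fstab as root.
printf '%s\n' '/dev/xvda1 / ext4 defaults 0 1' \
              '/dev/xvda2 none swap sw 0 0' > /tmp/fstab.sample
sed -i.bak '/swap/ s/^/#/' /tmp/fstab.sample
cat /tmp/fstab.sample
```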

Configure kernel's swappiness

For a temporary solution (resets on the next boot) use

sudo sysctl -w vm.swappiness=1

or edit /etc/sysctl.conf file using

sudoedit /etc/sysctl.conf

and add:

vm.swappiness = 1

Create a RAID 0 array from the attached SSDs (run these as root):

mount_point=/media/ephemeral0
umount $mount_point
mdadm --stop /dev/md0 /dev/md127
yes | mdadm --create /dev/md0 --level=stripe --raid-devices=2 /dev/sdb /dev/sdc
mkfs.ext4 /dev/md0
mount -t ext4 /dev/md0 $mount_point

Add the following line to /etc/fstab to make the RAID array available after reboot (on reboot the kernel typically reassembles the array as /dev/md127, hence the device name):

/dev/md127 /media/ephemeral0 ext4 defaults,nofail,comment=cloudconfig 0 2

Check with:

sudo mount -a
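To confirm the array assembled and the mount point is backed by it, a quick sanity check (note /proc/mdstat is only readable when the md driver is loaded):

```shell
# Show active md arrays, then the filesystem backing the mount point.
if [ -r /proc/mdstat ]; then
  cat /proc/mdstat
else
  echo "md driver not loaded"
fi
df -h /media/ephemeral0 2>/dev/null || df -h
```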

Optionally, create a shell script that does all of the above:

#!/bin/bash

# mdadm, mkfs and mount all require root.
if [[ $(whoami) != root ]]; then
  echo "need root privileges"
  exit 1
fi

mount_point=/media/ephemeral0
umount $mount_point
mdadm --stop /dev/md0 /dev/md127
yes | mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/sdb /dev/sdc
mkfs.ext4 /dev/md0
mount -t ext4 /dev/md0 $mount_point
chmod 777 $mount_point
# Point the fstab entry at the md device instead of the raw disk.
sed -i.bak s,/dev/sdb,/dev/md127,g /etc/fstab

Install and configure haproxy and hatop

haproxy

sudo yum install -y haproxy
wget http://hatop.googlecode.com/files/hatop-0.7.7.tar.gz
tar xvf hatop-0.7.7.tar.gz && cd hatop-0.7.7
sudo install -m 755 bin/hatop /usr/local/bin
sudo install -m 644 man/hatop.1 /usr/local/share/man/man1
sudo gzip /usr/local/share/man/man1/hatop.1

Once haproxy is configured, hatop is invoked with hatop -s /var/lib/haproxy/stats

Configure rsyslog to accept haproxy log events

Edit /etc/rsyslog.conf file with

sudoedit /etc/rsyslog.conf

Uncomment the following lines (enables listening on UDP port 514):

#$ModLoad imudp
#$UDPServerRun 514

Add the following line (restricts listening to the loopback interface):

$UDPServerAddress 127.0.0.1

Configure local2 events to go to the /var/log/haproxy.log file by creating a new file and adding this line to it:

sudoedit /etc/rsyslog.d/haproxy.conf
local2.* /var/log/haproxy.log

Restart rsyslog service with sudo service rsyslog restart.

Configure haproxy

Enter the following in bash:

sudo cp /etc/haproxy/haproxy.cfg /etc/haproxy/haproxy.cfg.orig
sudoedit /etc/haproxy/haproxy.cfg

This is what the haproxy configuration should look like:

global
    chroot      /var/lib/haproxy
    pidfile     /var/run/haproxy.pid
    maxconn     4096
    user        haproxy
    group       haproxy
    nbproc      1
    log         127.0.0.1 local2
    daemon

    stats socket /var/lib/haproxy/stats mode 777 level admin

defaults
    mode                    http
    log                     global
    option                  httplog
    option                  dontlognull
    option                  http-server-close
    option                  forwardfor except 127.0.0.0/8
    option                  redispatch
    retries                 3
    timeout http-request    10s
    timeout queue           1m
    timeout connect         5s
    timeout client          90s
    timeout server          90s
    timeout http-keep-alive 10s
    timeout check           10s
    maxconn                 3000

userlist internal_users
  user app insecure-password xxxxx

userlist external_users
  user ext-user insecure-password xxxxxx

frontend elastic_internal
  bind *:9665
  acl auth_okay http_auth(internal_users)
  http-request auth realm local if !auth_okay
  default_backend elastic

frontend elastic_external
  bind *:9666
  acl auth_okay http_auth(external_users)
  http-request auth realm local if !auth_okay
  default_backend elastic

frontend elastic_kibana_external
  bind *:9667
  acl auth_okay http_auth(external_users)
  http-request auth realm local if !auth_okay
  default_backend elastic_kibana

backend elastic
  stats enable
  balance roundrobin
  option httpchk GET /
  option redispatch
  server elastic1 0.0.0.0:9200 check inter 30s

backend elastic_kibana
  stats enable
  balance roundrobin
  option httpchk GET /
  option redispatch
  server elastic_kibana1 0.0.0.0:5601 check inter 30s

Start haproxy as a service using:

sudo service haproxy start

Make haproxy start after reboot using:

sudo chkconfig --add haproxy
sudo chkconfig haproxy on

If there is no logging, check whether the haproxy user (or whichever user the haproxy daemon runs as) owns /var/log/haproxy.log. Note the file and folder locations used by haproxy:

log: /var/log/haproxy.log
admin socket: /var/lib/haproxy/stats
hatop: hatop -s /var/lib/haproxy/stats
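Once haproxy is up, the frontends can be smoke-tested with curl and HTTP basic auth. A hedged sketch, assuming the proxy runs locally and using the example credentials from the userlists above:

```shell
# Expect 401 without credentials, and 200 (proxied to ES) with them.
curl -s -o /dev/null -w '%{http_code}\n' http://localhost:9665/
curl -s -o /dev/null -w '%{http_code}\n' -u app:xxxxx http://localhost:9665/
```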

Install and configure elasticsearch

There are two ways to install elasticsearch: from source, or by adding the elasticsearch yum repository.

Install from source

Download the source using:

wget https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-1.4.4.zip
sudo unzip elasticsearch-1.4.4.zip -d /usr/local/elasticsearch && cd /usr/local/elasticsearch/elasticsearch-1.4.4

Install service wrapper using:

wget https://github.com/elasticsearch/elasticsearch-servicewrapper/zipball/master && unzip master
sudo mv elastic-elasticsearch-servicewrapper-8513436/service/ /usr/local/elasticsearch/elasticsearch-1.4.4/bin

Set elasticsearch home folder in elasticsearch.conf:

cd /usr/local/elasticsearch/elasticsearch-1.4.4
vim bin/service/elasticsearch.conf
set.default.ES_HOME=/usr/local/elasticsearch/elasticsearch-1.4.4

Change permissions on the elasticsearch64 binary:

chmod a+x bin/service/elasticsearch64

Install elasticsearch service:

sudo bin/service/elasticsearch64 install

Run ElasticSearch as a service:

sudo bin/service/elasticsearch64 start

Install via yum repository

Add elasticsearch repository to yum:

sudo rpm --import https://packages.elasticsearch.org/GPG-KEY-elasticsearch

Create file using:

sudoedit /etc/yum.repos.d/elasticsearch.repo

and add to it the following content:

[elasticsearch-1.4]
name=Elasticsearch repository for 1.4.x packages
baseurl=http://packages.elasticsearch.org/elasticsearch/1.4/centos
gpgcheck=1
gpgkey=http://packages.elasticsearch.org/GPG-KEY-elasticsearch
enabled=1

Install elasticsearch:

sudo yum install -y elasticsearch

Run elasticsearch after reboot:

sudo chkconfig --add elasticsearch
sudo chkconfig elasticsearch on

Install plugins:

cd /usr/share/elasticsearch
sudo bin/plugin install elasticsearch/elasticsearch-cloud-aws/2.4.1
sudo bin/plugin install mobz/elasticsearch-head
sudo bin/plugin install royrusso/elasticsearch-HQ

Configure ElasticSearch

Add to .bashrc (set the heap to half of the memory available on the EC2 instance):

export ES_HEAP_SIZE=7680m
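The 7680m figure is tied to one instance size; if preferred, the value can be derived at the shell instead. A small sketch, assuming Linux's /proc/meminfo, that prints an export line for half of the total RAM:

```shell
# Read total memory in kB from /proc/meminfo and compute half of it
# in megabytes, then emit the export line for .bashrc.
total_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
heap_mb=$((total_kb / 2 / 1024))
echo "export ES_HEAP_SIZE=${heap_mb}m"
```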

Reload .bashrc and create a backup copy of elasticsearch.yml:

. ~/.bashrc
sudo cp /etc/elasticsearch/elasticsearch.yml /etc/elasticsearch/elasticsearch.yml.orig

Modify elasticsearch.yml:

path:
  data: /media/ephemeral0/data
  work: /media/ephemeral0/work
  logs: /media/ephemeral0/logs
bootstrap.mlockall: true
index.store.type: mmapfs
http.compression: true
transport.tcp.compress: true
script.disable_dynamic: false
cluster.name: elastic-logs
discovery:
  type: ec2
  ec2:
    groups: elastic-logs

Install Marvel and Sense

cd /usr/share/elasticsearch
sudo bin/plugin -i elasticsearch/marvel/latest

Using our only ElasticSearch instance to collect Marvel data would be too much of a load; it is recommended to use a separate ES installation for this purpose. Add the following line to elasticsearch.yml to disable the marvel agent:

marvel.agent.enabled: false

Install Kibana

It is highly recommended to install Kibana on a separate EC2 instance.

cd ~
wget https://download.elasticsearch.org/kibana/kibana/kibana-4.0.1-linux-x64.tar.gz
tar xzvf kibana-4.0.1-linux-x64.tar.gz && cd kibana-4.0.1-linux-x64
bin/kibana

Create ElasticSearch index template for logs

Create an index template on elasticsearch such as the one below. Note that in production we should use e.g. 6 shards and 1 replica, so be careful and don't forget to change the template definitions when setting up the production cluster. We should also try out different values for the following settings (check http://blog.sematext.com/2013/07/08/elasticsearch-refresh-interval-vs-indexing-performance/):

indices.memory.index_buffer_size
index.translog.flush_threshold_ops
index.refresh_interval

This needs to be executed against ElasticSearch:

PUT /_template/logstash-runner
{
  "template": "logstash-runner-*",
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0,
    "index.refresh_interval": "5s"
  },
  "mappings": {
    "logs": {
      "properties": {
        "host": {
          "type": "string",
          "index": "not_analyzed"
        },
        "time_reported": {
          "type": "date",
          "format": " date_time"
        },
        "id": {
          "type": "long"
        },
        "task": {
          "type": "string",
          "index": "not_analyzed"
        },
        "url": {
          "type": "string",
          "index": "not_analyzed"
        },
        "time": {
          "type": "date",
          "format": "yyyy-MM-dd'T'HH:mm:ss"
        }
      }
    }
  }
}
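For reference, the Joda pattern yyyy-MM-dd'T'HH:mm:ss used for the time field matches ISO-style timestamps without a zone offset; a value of that shape can be produced with:

```shell
# Emits e.g. 2015-04-04T11:55:00 - the shape the "time" mapping expects.
date -u +%Y-%m-%dT%H:%M:%S
```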

An alternative to creating the index template by executing the above script against elasticsearch is to create a template configuration file. This means creating a templates folder beneath /etc/elasticsearch (the default config path, check this link for path details), and creating a logstash_runner.json file containing the following:

{
  "logstash-runner": {
    "template": "logstash-runner-*",
    "settings": {
      "number_of_shards": 1,
      "number_of_replicas": 0,
      "index.refresh_interval": "5s"
    },
    "mappings": {
      "logs": {
        "properties": {
          "host": {
            "type": "string",
            "index": "not_analyzed"
          },
          "time_reported": {
            "type": "date",
            "format": " date_time"
          },
          "id": {
            "type": "long"
          },
          "task": {
            "type": "string",
            "index": "not_analyzed"
          },
          "url": {
            "type": "string",
            "index": "not_analyzed"
          },
          "time": {
            "type": "date",
            "format": "yyyy-MM-dd'T'HH:mm:ss"
          }
        }
      }
    }
  }
}

Logstash configuration

We need to modify the logstash configuration at /etc/logstash/conf.d/my.conf on each EC2 instance that sends logs. Use something like the following configuration:

input {
  file {
    path => "/var/log/my.log"
    start_position => 'beginning'
    sincedb_path => "/var/log/logstash/.sincedb"
    codec => json {
      charset => "UTF-8"
    }
  }
}
filter {
  if [message][url] =~ /.+/ {
    mutate {add_field => {"url" => "%{[message][url]}"}}
  }
  if [message][time] =~ /.+/ {
    mutate {add_field => {"time" => "%{[message][time]}"}}
  }
  mutate {
    remove_field => ["path"]
    rename => ["timeReported", "time_reported"]
    add_field => {"id" => "%{[message][id]}"}
    add_field => {"task" => "%{[message][task]}"}
    remove_field => ["message"]
  }
}
output {
  elasticsearch_http {
    host => "111.222.101.202"
    port => 9665
    index => "logstash-runner-%{+YYYY-MM-dd}"
    index_type => "logs"
    manage_template => false
    user => "runner"
    password => "xxxxxx"
  }
}
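Before restarting the agent, the configuration can be checked for syntax errors; logstash 1.x supports a --configtest flag for this (the exact path to the logstash binary may vary with the install method):

```shell
# Validate the config file without starting the pipeline.
logstash --configtest -f /etc/logstash/conf.d/my.conf
```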