# This code snippet is runnable
# Download Elasticsearch zip
curl -O https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-0.90.7.zip
# Unzip to desired location
unzip elasticsearch-0.90.7.zip -d $HOME
# Set ES_HOME
export ES_HOME=$HOME/elasticsearch-0.90.7
# Clone the service wrapper project
cd $ES_HOME/bin
git clone git@github.com:elasticsearch/elasticsearch-servicewrapper.git
# Symlink the service directory
ln -s elasticsearch-servicewrapper/service service
# Point ES_HOME in the service config at our install (-i.bak keeps sed portable across GNU/BSD)
sed -i.bak -e '1s|<.*>|'"$ES_HOME"'|' service/elasticsearch.conf
# Start the service
./service/elasticsearch start
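Once the wrapper reports the service as started, a quick sanity check (assuming the default HTTP port 9200) is to ask the wrapper for its status and hit the node's root endpoint:
# Confirm the wrapper is running and the node responds
./service/elasticsearch status
curl -XGET http://localhost:9200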
For development environments, we can set up a minimal replica set with a primary, a secondary, and an arbiter on the same physical machine. This setup provides zero fault tolerance, since all instances die if the machine crashes, and therefore should not be used in production.
Official MongoDB documentation:
- Deploy a Replica Set for Testing and Development
- Add an Arbiter to Replica Set
- Three Member Replica Sets
- Convert a Standalone to a Replica Set
# Download MongoDB
curl -O http://fastdl.mongodb.org/osx/mongodb-osx-x86_64-2.4.8.tgz
tar xzf mongodb-osx-x86_64-2.4.8.tgz
cd mongodb-osx-x86_64-2.4.8/bin
# Start the primary with HTTP console (--logpath must name a file, not a directory)
mkdir -p /data/rs0/db0 /data/rs0/log0
./mongod --port 27017 --dbpath /data/rs0/db0 --logpath /data/rs0/log0/mongod.log --replSet rs0 --rest &
# Start the secondary, conserve disk space
mkdir -p /data/rs0/db1 /data/rs0/log1
./mongod --port 27018 --dbpath /data/rs0/db1 --logpath /data/rs0/log1/mongod.log --replSet rs0 --smallfiles --oplogSize 128 &
# Start the arbiter, conserve disk space
mkdir -p /data/rs0/db2 /data/rs0/log2
./mongod --port 27019 --dbpath /data/rs0/db2 --logpath /data/rs0/log2/mongod.log --replSet rs0 --smallfiles --oplogSize 1 &
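Starting the three mongod processes does not by itself form the replica set; it still has to be initiated from a mongo shell. A minimal sketch, assuming the host/port values used above:
# Initiate the replica set, marking the third member as an arbiter
./mongo --port 27017 <<'EOF'
rs.initiate({
  _id: "rs0",
  members: [
    { _id: 0, host: "localhost:27017" },
    { _id: 1, host: "localhost:27018" },
    { _id: 2, host: "localhost:27019", arbiterOnly: true }
  ]
})
rs.status()
EOF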
# Install the attachment mapper and MongoDB river plugins
cd $ES_HOME
./bin/plugin --install elasticsearch/elasticsearch-mapper-attachments/1.9.0
./bin/plugin --install com.github.richardwilly98.elasticsearch/elasticsearch-river-mongodb/1.7.2
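Plugins are only loaded at startup, so restart the node after installing them (using the service wrapper's restart command from earlier):
# Restart Elasticsearch so the new plugins are loaded
./bin/service/elasticsearch restart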
Create the river config JSON file. See the plugin's GitHub wiki page for more options.
{
  "type": "mongodb",
  "mongodb": {
    "servers": [
      { "host": "localhost", "port": 27017 },
      { "host": "localhost", "port": 27018 }
    ],
    "options": {
      "secondary_read_preference": true
    },
    "credentials": [
      { "db": "dbname", "user": "dbname_user", "password": "dbname_password" },
      { "db": "local", "user": "local_user (for the oplog, if necessary)", "password": "local_password" },
      { "db": "admin", "user": "admin_user (if necessary)", "password": "admin_password" }
    ],
    "db": "dbname",
    "collection": "collname"
  },
  "index": {
    "name": "indexname",
    "type": "typename"
  }
}
If we saved the above config as `river_config.json` and we want a river named `my_mongo_river`, run the following command to configure the river:
# Configure the river
curl -XPUT http://localhost:9200/_river/my_mongo_river/_meta -d @river_config.json
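To verify the river registered correctly and is indexing, read back the `_meta` document and count the documents in the target index (the `indexname` from the config above is assumed):
# Read back the river config
curl -XGET http://localhost:9200/_river/my_mongo_river/_meta
# Count indexed documents; this should grow as the river syncs
curl -XGET http://localhost:9200/indexname/_count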
Some fields usually hold numbers but sometimes contain strings. These fields should have the following mapping so that both numeric and string values can be indexed:
{
  "<typename>": {
    "properties": {
      "<fieldname>": {
        "type": "multi_field",
        "fields": {
          "<fieldname>": {
            "type": "string",
            "index": "not_analyzed"
          },
          "num": {
            "type": "double",
            "ignore_malformed": true
          }
        }
      }
    }
  }
}
This mapping declares that `<fieldname>` is of type `multi_field`, allowing more than one type for the field.
- The default sub-field is of type `string`, and its raw value is indexed without applying analyzers (whitespace removal, tokenizers, stemming, stop words, etc.).
- The `<fieldname>.num` sub-field is of type `double` and can be queried or filtered as a numeric type. The `ignore_malformed` option must be enabled to allow string values to co-exist. Without this option, documents with non-numeric string values for `<fieldname>` cannot be added to the index, because a NumberFormatException is thrown while attempting to parse the field value.
Perform an HTTP PUT to update the mapping:
curl -XPUT http://localhost:9200/<indexname>/<typename>/_mapping -d @mapping.json
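With this mapping applied, the same field can be queried either way. A sketch, with the placeholders standing in for real names: a `term` query matches the raw string value on the default sub-field, while numeric queries go through the `.num` sub-field:
# Exact match on the raw (not_analyzed) string value
curl -XPOST http://localhost:9200/<indexname>/<typename>/_search -d '{
  "query": { "term": { "<fieldname>": "some-string-value" } }
}'
# Numeric range query via the .num sub-field
curl -XPOST http://localhost:9200/<indexname>/<typename>/_search -d '{
  "query": { "range": { "<fieldname>.num": { "gte": 10, "lte": 100 } } }
}'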
Elasticsearch reference documentation: