# This code snippet is runnable
# Download Elasticsearch zip
curl -O https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-0.90.7.zip
# Unzip to desired location
unzip elasticsearch-0.90.7.zip -d $HOME
# Set ES_HOME
export ES_HOME=$HOME/elasticsearch-0.90.7
# Clone the service wrapper project
cd $ES_HOME/bin
git clone git@github.com:elasticsearch/elasticsearch-servicewrapper.git
# Symlink the service directory
ln -s elasticsearch-servicewrapper/service service
# Point ES_HOME in the service config at our install (-i.bak keeps sed portable across GNU/BSD)
sed -i.bak -e '1s|<.*>|'"$ES_HOME"'|' service/elasticsearch.conf
# Start the service
./service/elasticsearch start
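Once the wrapper reports the service as started, a quick sanity check (assuming the default HTTP port 9200) is to ask the wrapper for its status and hit the node's root endpoint:
# Confirm the wrapper is running and the node responds
./service/elasticsearch status
curl -XGET http://localhost:9200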
For development environments, we can set up a minimal replica set with a primary, a secondary, and an arbiter on the same physical machine. This setup provides zero fault tolerance, since all instances die if the machine crashes, and therefore should not be used in production.
Official MongoDB documentation:
- Deploy a Replica Set for Testing and Development
- Add an Arbiter to Replica Set
- Three Member Replica Sets
- Convert a Standalone to a Replica Set
# Download MongoDB
curl -O http://fastdl.mongodb.org/osx/mongodb-osx-x86_64-2.4.8.tgz
tar xzf mongodb-osx-x86_64-2.4.8.tgz
cd mongodb-osx-x86_64-2.4.8/bin
# Start the primary with HTTP console (--logpath must name a file, not a directory)
mkdir -p /data/rs0/db0 /data/rs0/log0
./mongod --port 27017 --dbpath /data/rs0/db0 --logpath /data/rs0/log0/mongod.log --replSet rs0 --rest &
# Start the secondary, conserve disk space
mkdir -p /data/rs0/db1 /data/rs0/log1
./mongod --port 27018 --dbpath /data/rs0/db1 --logpath /data/rs0/log1/mongod.log --replSet rs0 --smallfiles --oplogSize 128 &
# Start the arbiter, conserve disk space
mkdir -p /data/rs0/db2 /data/rs0/log2
./mongod --port 27019 --dbpath /data/rs0/db2 --logpath /data/rs0/log2/mongod.log --replSet rs0 --smallfiles --oplogSize 1 &
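Starting the three mongod processes does not by itself form the replica set; it still has to be initiated from a mongo shell. A minimal sketch, assuming the host/port values used above:
# Initiate the replica set, marking the third member as an arbiter
./mongo --port 27017 <<'EOF'
rs.initiate({
  _id: "rs0",
  members: [
    { _id: 0, host: "localhost:27017" },
    { _id: 1, host: "localhost:27018" },
    { _id: 2, host: "localhost:27019", arbiterOnly: true }
  ]
})
rs.status()
EOF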
# Install the attachment mapper and MongoDB river plugins
cd $ES_HOME
./bin/plugin --install elasticsearch/elasticsearch-mapper-attachments/1.9.0
./bin/plugin --install com.github.richardwilly98.elasticsearch/elasticsearch-river-mongodb/1.7.2
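Plugins are only loaded at startup, so restart the node after installing them (using the service wrapper's restart command from earlier):
# Restart Elasticsearch so the new plugins are loaded
./bin/service/elasticsearch restart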
Create the river config JSON file. See the plugin's GitHub wiki page for more options.
{
  "type": "mongodb",
  "mongodb": {
    "servers": [
      { "host": "localhost", "port": 27017 },
      { "host": "localhost", "port": 27018 }
    ],
    "options": {
      "secondary_read_preference": true
    },
    "credentials": [
      { "db": "dbname", "user": "dbname_user", "password": "dbname_password" },
      { "db": "local", "user": "local_user (for the oplog, if necessary)", "password": "local_password" },
      { "db": "admin", "user": "admin_user (if necessary)", "password": "admin_password" }
    ],
    "db": "dbname",
    "collection": "collname"
  },
  "index": {
    "name": "indexname",
    "type": "typename"
  }
}
If we saved the above config as `river_config.json` and we want a river named `my_mongo_river`, run the following command to configure the river:
# Configure the river
curl -XPUT http://localhost:9200/_river/my_mongo_river/_meta -d @river_config.json
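To verify the river registered correctly and is indexing, read back the `_meta` document and count the documents in the target index (the `indexname` from the config above is assumed):
# Read back the river config
curl -XGET http://localhost:9200/_river/my_mongo_river/_meta
# Count indexed documents; this should grow as the river syncs
curl -XGET http://localhost:9200/indexname/_count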
Some fields usually hold numbers but sometimes contain strings. These fields should have the following mapping so that both numeric and string values can be indexed:
{
  "<typename>": {
    "properties": {
      "<fieldname>": {
        "type": "multi_field",
        "fields": {
          "<fieldname>": {
            "type": "string",
            "index": "not_analyzed"
          },
          "num": {
            "type": "double",
            "ignore_malformed": true
          }
        }
      }
    }
  }
}
This mapping declares that `<fieldname>` is of type `multi_field`, allowing more than one type for the field.
- The default sub-field is of type `string`, and its raw value is indexed without applying analyzers (whitespace removal, tokenizers, stemming, stop words, etc.).
- The `<fieldname>.num` sub-field is of type `double` and can be queried or filtered as a numeric type. The `ignore_malformed` option must be enabled to allow string values to co-exist. Without this option, documents with non-numeric string values for `<fieldname>` cannot be added to the index, because a NumberFormatException is thrown while attempting to parse the field value.
Perform an HTTP PUT to update the mapping:
curl -XPUT http://localhost:9200/<indexname>/<typename>/_mapping -d @mapping.json
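With this mapping applied, the same field can be queried either way. A sketch, with the placeholders standing in for real names: a `term` query matches the raw string value on the default sub-field, while numeric queries go through the `.num` sub-field:
# Exact match on the raw (not_analyzed) string value
curl -XPOST http://localhost:9200/<indexname>/<typename>/_search -d '{
  "query": { "term": { "<fieldname>": "some-string-value" } }
}'
# Numeric range query via the .num sub-field
curl -XPOST http://localhost:9200/<indexname>/<typename>/_search -d '{
  "query": { "range": { "<fieldname>.num": { "gte": 10, "lte": 100 } } }
}'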
Elasticsearch reference documentation: