Setting up an external Zookeeper Solr Cluster

This is a step-by-step guide to creating a cluster of three Solr nodes running in cloud mode, backed by a three-server ZooKeeper ensemble. The instructions should work both on a single host (for testing) and on a remote cluster where each server runs on its own physical machine. They were tested with Solr 5.4.1 and ZooKeeper 3.4.6.

Installing Solr and Zookeeper

Download and extract Solr:
- curl -O http://apache.arvixe.com/lucene/solr/5.4.1/solr-5.4.1.tgz
- mkdir /opt/solr
- tar -zxvf solr-5.4.1.tgz -C /opt/solr --strip-components=1
Download and extract ZooKeeper:
- curl -O http://apache.arvixe.com/zookeeper/zookeeper-3.4.6/zookeeper-3.4.6.tar.gz
- mkdir /opt/zookeeper
- tar -zxvf zookeeper-3.4.6.tar.gz -C /opt/zookeeper --strip-components=1

Configuring a ZooKeeper quorum

cd /opt/zookeeper
Create our data directories:
- mkdir -p /data/zookeeper/z1/data
- mkdir -p /data/zookeeper/z2/data
- mkdir -p /data/zookeeper/z3/data

Write each server's id to a myid file in its data directory:

echo 1 > /data/zookeeper/z1/data/myid
echo 2 > /data/zookeeper/z2/data/myid
echo 3 > /data/zookeeper/z3/data/myid
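
The directory and myid steps above can also be scripted. A minimal sketch, assuming a hypothetical helper named make_zk_dirs that takes the base directory and the server count as arguments (the helper is illustrative and not part of ZooKeeper):

```shell
# Hypothetical helper: create z1..zN data directories under a base
# directory and write each server's id into its myid file.
make_zk_dirs() {
  base="$1"   # base directory, e.g. /data/zookeeper
  count="$2"  # number of ZooKeeper servers, e.g. 3
  i=1
  while [ "$i" -le "$count" ]; do
    mkdir -p "$base/z$i/data"
    echo "$i" > "$base/z$i/data/myid"
    i=$((i + 1))
  done
}
```

Calling make_zk_dirs /data/zookeeper 3 should produce the same layout as the individual commands above.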

Copy the sample config file into the first server's directory:

cp ./conf/zoo_sample.cfg /data/zookeeper/z1/zoo.cfg

Open /data/zookeeper/z1/zoo.cfg in your text editor and update/add the following values:

dataDir=/data/zookeeper/z1/data
clientPort=2181
# Our ZooKeeper quorum (server.N=host:peerPort:electionPort).
# Note: To run each server on its own machine, change the
# IP address in each entry. In that case all servers can
# use the same pair of port numbers.
server.1=127.0.0.1:2222:2223
server.2=127.0.0.1:3333:3334
server.3=127.0.0.1:4444:4445
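
For the multi-machine case mentioned in the comment above, the quorum entries can be generated from a list of addresses. A minimal sketch, assuming a hypothetical helper named quorum_lines and the port pair 2888:3888 that ZooKeeper's own documentation commonly uses in examples (any free pair of ports would do):

```shell
# Hypothetical helper: print one server.N line per host argument.
# Ids are assigned in argument order, starting at 1.
quorum_lines() {
  id=1
  for host in "$@"; do
    # 2888 = peer port, 3888 = leader-election port (assumed values)
    echo "server.$id=$host:2888:3888"
    id=$((id + 1))
  done
}
```

Running quorum_lines with three placeholder addresses, for example 10.0.0.1 10.0.0.2 10.0.0.3, prints three server.N lines that can be pasted into zoo.cfg. The id in each line must match the myid file on that machine.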

Copy the config file into the directories of the other two servers:

cp /data/zookeeper/z1/zoo.cfg /data/zookeeper/z2/
cp /data/zookeeper/z1/zoo.cfg /data/zookeeper/z3/

[This step is ONLY required when running the servers on the same host] Change both the dataDir and clientPort for the last two servers:

Server 2:

  $ vi /data/zookeeper/z2/zoo.cfg
  dataDir=/data/zookeeper/z2/data
  clientPort=2182

Server 3:

  $ vi /data/zookeeper/z3/zoo.cfg
  dataDir=/data/zookeeper/z3/data
  clientPort=2183
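
The same edits can be applied without opening an editor. A minimal sketch, assuming a hypothetical helper named derive_zk_config that rewrites the two values with sed. It follows the layout and port scheme used above (server N listens on client port 2180+N), reads the first server's config, and writes the config for server N, so it also covers the copy step:

```shell
# Hypothetical helper: derive the config for server N from z1's config
# by rewriting dataDir and clientPort. All other settings are kept.
derive_zk_config() {
  base="$1"  # base directory, e.g. /data/zookeeper
  n="$2"     # server id, e.g. 2
  sed -e "s|^dataDir=.*|dataDir=$base/z$n/data|" \
      -e "s|^clientPort=.*|clientPort=$((2180 + n))|" \
      "$base/z1/zoo.cfg" > "$base/z$n/zoo.cfg"
}
```

For example, derive_zk_config /data/zookeeper 2 followed by derive_zk_config /data/zookeeper 3 should leave both servers with the values shown above.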

Start the servers:

 $ ./bin/zkServer.sh start /data/zookeeper/z1/zoo.cfg
 $ ./bin/zkServer.sh start /data/zookeeper/z2/zoo.cfg
 $ ./bin/zkServer.sh start /data/zookeeper/z3/zoo.cfg

Check the log file for errors (by default, zkServer.sh writes zookeeper.out to the directory it was started from):
- $ cat zookeeper.out
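
Beyond reading the log, each server can be asked for its role with zkServer.sh status, which prints a line such as "Mode: leader" or "Mode: follower" once the quorum has formed. A minimal sketch, assuming a hypothetical helper named zk_mode that extracts the role from that output:

```shell
# Hypothetical helper: read zkServer.sh status output on stdin and
# print only the role (leader, follower or standalone), if present.
zk_mode() {
  sed -n 's/^Mode: *//p'
}

# Usage (requires the servers started above):
#   ./bin/zkServer.sh status /data/zookeeper/z1/zoo.cfg 2>&1 | zk_mode
```

With three healthy servers, one should report leader and the other two follower. Empty output suggests that the server is not running or has not joined the quorum yet.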

Configuring Solr

cd /opt/solr
Start three Solr instances and have them point at our Zookeeper instances:

# If you are running the ZooKeeper servers on remote machines, use
# the IP address of each server instead of 127.0.0.1.
$ ./bin/solr start -c -p 8983 -z 127.0.0.1:2181,127.0.0.1:2182,127.0.0.1:2183
$ ./bin/solr start -c -p 8984 -z 127.0.0.1:2181,127.0.0.1:2182,127.0.0.1:2183
$ ./bin/solr start -c -p 8985 -z 127.0.0.1:2181,127.0.0.1:2182,127.0.0.1:2183
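
The three start commands differ only in the Solr port, so the ZooKeeper connection string can be built once and reused. A minimal sketch, assuming a hypothetical helper named zk_hosts that joins host:port pairs with commas:

```shell
# Hypothetical helper: join its arguments into a comma-separated
# ZooKeeper connection string suitable for the -z option.
zk_hosts() {
  out=""
  for hp in "$@"; do
    out="${out:+$out,}$hp"
  done
  echo "$out"
}

# Usage (run from /opt/solr):
#   ZK=$(zk_hosts 127.0.0.1:2181 127.0.0.1:2182 127.0.0.1:2183)
#   for port in 8983 8984 8985; do
#     ./bin/solr start -c -p "$port" -z "$ZK"
#   done
```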

Upload our collection configuration to ZooKeeper:

$ ./server/scripts/cloud-scripts/zkcli.sh -cmd upconfig \
-zkhost 127.0.0.1:2181 \
-confdir ./server/solr/configsets/data_driven_schema_configs/conf/ \
-confname my-config

Create a Solr collection using the uploaded configuration:

curl 'http://localhost:8983/solr/admin/collections?action=CREATE&name=my-collection&numShards=2&replicationFactor=1&collection.configName=my-config'
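
Because the request is a single long URL, a small wrapper makes the parameters easier to read and reuse for further collections. A minimal sketch, assuming a hypothetical helper named create_collection_url. It only assembles the URL for the Collections API CREATE action used above and does not contact Solr:

```shell
# Hypothetical helper: build the Collections API CREATE URL.
# Arguments: collection name, config name, shard count, replication factor.
create_collection_url() {
  name="$1"; config="$2"; shards="$3"; replicas="$4"
  echo "http://localhost:8983/solr/admin/collections?action=CREATE&name=$name&numShards=$shards&replicationFactor=$replicas&collection.configName=$config"
}

# Usage (requires the Solr nodes started above):
#   curl "$(create_collection_url my-collection my-config 2 1)"
# The result can then be inspected with the CLUSTERSTATUS action:
#   curl 'http://localhost:8983/solr/admin/collections?action=CLUSTERSTATUS'
```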

Note: To create multiple collections with different schemas, repeat the last two steps for each one, uploading its configuration under a distinct name. Collections created with the same configuration name share a single copy of that configuration in ZooKeeper, so they all end up with the same schema.
