The following installation methods were tried out.
With some trial and error, I managed to get a standalone Ceph Object Storage Device (OSD) service working. The instructions to replicate the installation are as follows.
- Start a vagrant box
vagrant init ubuntu/bionic64
vagrant up
- Install docker
sudo su
apt-get install apt-transport-https ca-certificates curl software-properties-common
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"
apt-get update
apt-get install -y docker-ce
docker pull ceph/daemon:latest
- Install Ceph Monitor, Manager and OSD components
docker network create --subnet=175.20.0.0/16 ceph
docker run -d -h ceph-mon --name ceph-mon --net ceph --ip 175.20.0.12 -e MON_IP=175.20.0.12 -e CEPH_PUBLIC_NETWORK=175.20.0.0/16 -v /etc/ceph:/etc/ceph -v /var/lib/ceph/:/var/lib/ceph/ ceph/daemon mon
# update /etc/ceph/ceph.conf file with the contents given below
docker run -d -h ceph-mgr --name ceph-mgr --net=ceph -v /etc/ceph:/etc/ceph -v /var/lib/ceph/:/var/lib/ceph/ ceph/daemon mgr
docker run -d -h ceph-osd --name ceph-osd --net=ceph --pid=host --privileged=true -v /etc/ceph:/etc/ceph -v /var/lib/ceph/:/var/lib/ceph/ -v /dev/:/dev/ ceph/daemon osd
- Install the librados library.
sudo apt-get install librados-dev
- Use librados to communicate with the Ceph OSD cluster.
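As a reference for this step, a minimal librados store/fetch client might look like the sketch below. This is an illustration rather than the exact test that was run: the pool name (testpool), object name, and client id are assumptions, and the pool has to exist first (for example, ceph osd pool create testpool 8). Compile it with gcc rados_hello.c -o rados_hello -lrados.

/* rados_hello.c - minimal librados store/fetch sketch; pool and object names are assumptions */
#include <rados/librados.h>
#include <stdio.h>
#include <string.h>

int main(void) {
    rados_t cluster;
    rados_ioctx_t io;
    int err;

    /* Connect as client.admin using the monitor address and keyring from /etc/ceph/ceph.conf. */
    err = rados_create(&cluster, "admin");
    if (err < 0) { fprintf(stderr, "rados_create: %s\n", strerror(-err)); return 1; }
    err = rados_conf_read_file(cluster, "/etc/ceph/ceph.conf");
    if (err < 0) { fprintf(stderr, "rados_conf_read_file: %s\n", strerror(-err)); return 1; }
    err = rados_connect(cluster);
    if (err < 0) { fprintf(stderr, "rados_connect: %s\n", strerror(-err)); return 1; }

    /* "testpool" is an assumed pool name; create it before running this program. */
    err = rados_ioctx_create(cluster, "testpool", &io);
    if (err < 0) { fprintf(stderr, "rados_ioctx_create: %s\n", strerror(-err)); rados_shutdown(cluster); return 1; }

    /* Store an object, then read it back. */
    const char msg[] = "hello ceph";
    err = rados_write_full(io, "greeting", msg, sizeof(msg));
    if (err < 0) fprintf(stderr, "rados_write_full: %s\n", strerror(-err));

    char buf[64] = {0};
    err = rados_read(io, "greeting", buf, sizeof(buf) - 1, 0);
    if (err < 0) fprintf(stderr, "rados_read: %s\n", strerror(-err));
    else printf("read back: %s\n", buf);

    rados_ioctx_destroy(io);
    rados_shutdown(cluster);
    return 0;
}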
NOTE: The container-based installation runs the ceph diagnostic commands correctly, but the OSD is unresponsive: object store and fetch operations do not work. The reason is unknown (a likely explanation is discussed with the ceph -s output further below).
This solution was tried out on Ubuntu MATE 18.04 with Docker 18.06 CE.
The primary instructions followed are available at:
- https://hub.docker.com/r/ceph/daemon/
- https://github.com/ceph/ceph-container/blob/master/src/daemon/README.md
The following SELinux-specific instructions were ignored, as they do not apply to a kernel running without SELinux (Ubuntu uses AppArmor by default).
sudo chcon -Rt svirt_sandbox_file_t /etc/ceph
sudo chcon -Rt svirt_sandbox_file_t /var/lib/ceph
The first error encountered concerns the etcd KV store.
For the command,
docker run -d -h ceph-kv --name ceph-kv -e KV_TYPE=etcd -e KV_IP=127.0.0.1 -e KV_PORT=2379 ceph/daemon populate_kvstore
I see the following error log.
2018-10-16T10:03:51.514221915Z Error: client: etcd cluster is unavailable or misconfigured; error #0: dial tcp 175.20.0.12:2379: getsockopt: connection refused
2018-10-16T10:03:51.514243724Z
2018-10-16T10:03:51.514256484Z error #0: dial tcp 175.20.0.12:2379: getsockopt: connection refused
2018-10-16T10:03:51.514259320Z
2018-10-16T10:03:51.519791527Z 2018-10-16 10:03:51 /entrypoint.sh: Value is already set
2018-10-16T10:03:51.662737927Z Error: client: etcd cluster is unavailable or misconfigured; error #0: dial tcp 175.20.0.12:2379: getsockopt: connection refused
2018-10-16T10:03:51.662755244Z
2018-10-16T10:03:51.662795795Z error #0: dial tcp 175.20.0.12:2379: getsockopt: connection refused
2018-10-16T10:03:51.662801028Z
2018-10-16T10:03:51.667302541Z 2018-10-16 10:03:51 /entrypoint.sh: client_host already exists
Just to try a dedicated docker network, I modified the KV command as follows.
docker network create --subnet=175.20.0.0/16 ceph
docker run -d -h ceph-kv --name ceph-kv --net=ceph -e KV_TYPE=etcd -e KV_IP=175.20.0.50 -e KV_PORT=2379 -v /etc/ceph:/etc/ceph -v /var/lib/ceph/:/var/lib/ceph/ ceph/daemon populate_kvstore
This command produces the following docker container log.
2018-10-16T09:58:47.677911525Z 2018-10-16 09:58:47 /entrypoint.sh: Adding key /osd/osd_journal_size with value 100 to KV store.
2018-10-16T09:58:50.857882777Z Error: client: etcd cluster is unavailable or misconfigured; error #0: dial tcp 175.20.0.50:2379: getsockopt: no route to host
2018-10-16T09:58:50.858023921Z
2018-10-16T09:58:50.858041017Z error #0: dial tcp 175.20.0.50:2379: getsockopt: no route to host
2018-10-16T09:58:50.858043718Z
2018-10-16T09:58:50.861988822Z 2018-10-16 09:58:50 /entrypoint.sh: Value is already set
2018-10-16T09:58:53.857533335Z Error: dial tcp 175.20.0.50:2379: getsockopt: no route to host
2018-10-16T09:58:53.862213259Z 2018-10-16 09:58:53 /entrypoint.sh: client_host already exists
Whether a new IP or an existing IP is chosen, the KV container reports an error.
Hence, I decided to try the Ceph installation without the KV store.
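For completeness: populate_kvstore only writes defaults into an etcd server that is already running, and none of the commands above starts one, which would explain the connection refused and no route to host errors. A single-node etcd on the dedicated network could be started roughly as follows; the image tag and the listen/advertise flags are illustrative assumptions, not a tested recipe.
# assumption: start etcd on the dedicated network before running populate_kvstore
docker run -d -h ceph-etcd --name ceph-etcd --net=ceph --ip 175.20.0.50 quay.io/coreos/etcd:v3.3.12 /usr/local/bin/etcd --listen-client-urls http://0.0.0.0:2379 --advertise-client-urls http://175.20.0.50:2379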
The instructions given on https://hub.docker.com/r/ceph/daemon/ fail for the Ceph Monitor. The following command,
docker run -d --net host --ip 192.168.0.20 -e MON_IP=192.168.0.20 -e CEPH_PUBLIC_NETWORK=192.168.0.0/16 -v /etc/ceph:/etc/ceph -v /var/lib/ceph/:/var/lib/ceph/ ceph/daemon mon
gives an error shown below.
-1 unable to find any IP address in networks: 192.168.0.20/24
The Ceph OSD container comes up even without the monitor, but the rest of the component containers fail to start.
Based on the information provided in ceph/ceph-container#496, I tried a different Ceph deployment with a dedicated docker network. This time, the KV and radosgw components come up but soon crash; the rest of the components remain online. The commands used for this version of the installation are:
sudo su
apt-get install apt-transport-https ca-certificates curl software-properties-common
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"
apt-get update
apt-get install -y docker-ce
docker pull ceph/daemon:latest
docker network create --subnet=175.20.0.0/16 ceph
docker run -d -h ceph-mon --name ceph-mon --net ceph --ip 175.20.0.12 -e MON_IP=175.20.0.12 -e CEPH_PUBLIC_NETWORK=175.20.0.0/16 -v /etc/ceph:/etc/ceph -v /var/lib/ceph/:/var/lib/ceph/ ceph/daemon mon
docker run -d -h ceph-mgr --name ceph-mgr --net=ceph -v /etc/ceph:/etc/ceph -v /var/lib/ceph/:/var/lib/ceph/ ceph/daemon mgr
# update /etc/ceph/ceph.conf file with the contents given below
docker run -d -h ceph-osd --name ceph-osd --net=ceph --pid=host --privileged=true -v /etc/ceph:/etc/ceph -v /var/lib/ceph/:/var/lib/ceph/ -v /dev/:/dev/ ceph/daemon osd
docker run -d -h ceph-mds --name ceph-mds --net=ceph -v /var/lib/ceph/:/var/lib/ceph/ -v /etc/ceph:/etc/ceph -e CEPHFS_CREATE=1 ceph/daemon mds
docker run -d -h ceph-rgw --name ceph-rgw --net=ceph -v /var/lib/ceph/:/var/lib/ceph/ -v /etc/ceph:/etc/ceph ceph/daemon rgw
docker run -d -h ceph-restapi --name ceph-restapi --net=ceph -v /var/lib/ceph/:/var/lib/ceph/ -v /etc/ceph:/etc/ceph ceph/daemon restapi
docker run -d -h ceph-rbd --name ceph-rbd --net=ceph -v /var/lib/ceph/:/var/lib/ceph/ -v /etc/ceph:/etc/ceph ceph/daemon rbd_mirror
# /etc/ceph/ceph.conf file
[global]
fsid = 8bc58859-905e-4b58-9742-09209bca6116
mon initial members = ceph-mon
mon host = 175.20.0.12
public network = 175.20.0.0/16
cluster network = 175.20.0.0/16
osd journal size = 100
log file = /dev/null
keyring = /etc/ceph/ceph.client.admin.keyring
auth cluster required = none
auth service required = none
auth client required = none
osd max object name len = 256
osd max object namespace len = 64
[client.restapi]
public addr = 0.0.0.0:5000
restapi base url = /api/v0.1
restapi log level = warning
log file = /var/log/ceph/ceph-restapi.log
This dedicated-network setup comes closest to a complete deployment. All the containers except ceph-restapi (REST API) and ceph-rgw (RADOS Gateway) remain alive. The ceph-restapi container fails with the following error log.
2018-10-16T14:05:40.543135966Z exec: PID 119: spawning /usr/bin/ceph-rest-api --cluster ceph -n client.admin
2018-10-16T14:05:40.543145120Z exec: Waiting 119 to quit
2018-10-16T14:05:40.543396149Z docker_exec.sh: line 143: /usr/bin/ceph-rest-api: No such file or directory
2018-10-16T14:05:40.543673917Z teardown: managing teardown after SIGCHLD
The /usr/bin/ceph-rest-api binary appears to have been dropped from the Mimic (13.2.2) release that this image ships, which would explain the missing-file error above. The ceph-rgw container (radosgw) fails with the following error log.
2018-10-16T14:23:09.290515772Z 2018-10-16 14:23:09 /entrypoint.sh: static: does not generate config
2018-10-16T14:23:09.291261550Z 2018-10-16 14:23:09 /entrypoint.sh: SUCCESS
2018-10-16T14:23:09.291516838Z exec: PID 124: spawning /usr/bin/radosgw --cluster ceph --setuser ceph --setgroup ceph -d -n client.rgw.ceph-rgw -k /var/lib/ceph/radosgw/ceph-rgw.ceph-rgw/keyring
2018-10-16T14:23:09.291568646Z exec: Waiting 124 to quit
2018-10-16T14:23:09.327688131Z 2018-10-16 14:23:09.321 7fd4bde7f8c0 0 framework: civetweb
2018-10-16T14:23:09.327704972Z 2018-10-16 14:23:09.321 7fd4bde7f8c0 0 framework conf key: port, val: 7480
2018-10-16T14:23:09.328618629Z 2018-10-16 14:23:09.325 7fd4bde7f8c0 0 deferred set uid:gid to 167:167 (ceph:ceph)
2018-10-16T14:23:09.328628223Z 2018-10-16 14:23:09.325 7fd4bde7f8c0 0 ceph version 13.2.2 (02899bfda814146b021136e9d8e80eba494e1126) mimic (stable), process radosgw, pid 124
2018-10-16T14:28:09.329078701Z 2018-10-16 14:28:09.327 7fd4ab959700 -1 Initialization timeout, failed to initialize
2018-10-16T14:28:09.331421746Z teardown: managing teardown after SIGCHLD
2018-10-16T14:28:09.331435382Z teardown: Waiting PID 124 to terminate
2018-10-16T14:28:09.331438278Z teardown: Process 124 is terminated
2018-10-16T14:28:09.331442149Z teardown: Bye Bye, container will die with return code -1
The final status of the Ceph system is captured in the following logs.
root@ubuntu-bionic:/home/vagrant# docker ps -a
CONTAINER ID   IMAGE         COMMAND                  CREATED             STATUS                           PORTS   NAMES
f6f93a4a1666   ceph/daemon   "/entrypoint.sh rest…"   4 seconds ago       Exited (255) 3 seconds ago               ceph-restapi
898f41f69033   ceph/daemon   "/entrypoint.sh rbd_…"   About an hour ago   Up About an hour                         ceph-rbd
c102deb3f253   ceph/daemon   "/entrypoint.sh rgw"     About an hour ago   Exited (255) About an hour ago           ceph-rgw
7b057aedeb5f   ceph/daemon   "/entrypoint.sh mds"     About an hour ago   Up About an hour                         ceph-mds
a8ccd5cb9472   ceph/daemon   "/entrypoint.sh osd"     2 hours ago         Up About an hour                         ceph-osd
88f567eac9f3   ceph/daemon   "/entrypoint.sh mgr"     2 hours ago         Up About an hour                         ceph-mgr
b03064033c0f   ceph/daemon   "/entrypoint.sh mon"     2 hours ago         Up About an hour                         ceph-mon
root@ubuntu-bionic:/home/vagrant# docker exec ceph-osd ceph -s
  cluster:
    id:     0e6a5ae3-1678-494f-9dcc-939c50473107
    health: HEALTH_WARN
            1 MDSs report slow metadata IOs
            Reduced data availability: 24 pgs inactive
            Degraded data redundancy: 24 pgs undersized
            too few PGs per OSD (24 < min 30)

  services:
    mon:        1 daemons, quorum ceph-mon
    mgr:        ceph-mgr(active)
    mds:        cephfs-1/1/1 up {0=ceph-mds=up:creating}
    osd:        1 osds: 1 up, 1 in
    rbd-mirror: 1 daemon active

  data:
    pools:   3 pools, 24 pgs
    objects: 0 objects, 0 B
    usage:   2.2 GiB used, 7.4 GiB / 9.6 GiB avail
    pgs:     100.000% pgs not active
             24 undersized+peered
A second ceph -s run shortly afterwards reports the same state, with usage at 2.3 GiB.
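One observation from this output: with a single OSD and the default pool replication size of 3, every placement group stays undersized+peered and never becomes active, and Ceph blocks client I/O on inactive placement groups. This is a plausible explanation for the unresponsive store and fetch operations noted earlier. Reducing the replication requirement on each pool would be one way to test this; the <pool> placeholder below stands for each of the three pools reported above and is not a literal name.
# assumption: repeat for each pool listed by 'ceph osd pool ls'
docker exec ceph-osd ceph osd pool set <pool> size 1
docker exec ceph-osd ceph osd pool set <pool> min_size 1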
To make sure the results are repeatable, all of the above procedures were also executed on two different Vagrant machines. The results did not change; the errors received are as given above.
I tried the following Vagrant boxes.
vagrant init ubuntu/bionic64
vagrant init ubuntu/trusty64
vagrant init fedora/28-cloud-base --box-version 20180425
The Ceph engineers recommend using ceph-ansible for easy installation. I tried their suggestion. The commands used are:
sudo add-apt-repository ppa:ansible/ansible-2.6
sudo apt-get update
sudo apt-get install ansible
sudo apt-get install python-pip
sudo pip install notario
sudo apt-get install systemd
git clone https://github.com/ceph/ceph-ansible.git -b master #supports Ansible v2.6
cd ceph-ansible
cp site.yml.sample site.yml
ansible-playbook site.yml -i inventory
# inventory file for Ansible
[mons]
localhost
[osds]
localhost
[mdss]
localhost
[rgws]
localhost
[restapis]
localhost
[mgrs]
localhost
[all:vars]
ansible_connection=local
# group_vars/all.yml file
ceph_origin: repository
ceph_repository: community
ceph_stable_release: jewel
public_network: "10.0.2.15/24"
cluster_network: "10.0.2.15/24"
radosgw_address: 10.0.2.15
monitor_interface: eth0
devices:
- '/dev/sda'
osd_scenario: collocated
This fails, complaining that there are no sensible defaults for /etc/ceph/ceph.conf. Even after creating a ceph.conf file, we encounter an error saying the Ceph Monitor does not deploy correctly.
I even tried installing just the Ceph OSD and Ceph RadosGW components, but both need a working Ceph Monitor. Hence, the Ansible installation fails as well.
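For reference, restricting the playbook to just those roles amounts to trimming the inventory to something like the following; this is a hypothetical reconstruction, not the exact file that was used.
# reduced inventory (hypothetical): OSD and RGW roles only
[osds]
localhost
[rgws]
localhost
[all:vars]
ansible_connection=local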