@yuvalif
Last active August 4, 2022 14:10

Goal

The goal here is to run multisite tests on a single node at scale (assuming the node has enough capacity). This is done using vstart/mstart so the development-test cycle is faster.

Prerequisites

  • Build Ceph with a minimal feature set (RGW-focused, no dashboard/cephfs/rbd), e.g.:

cmake -DBOOST_J=$(nproc) -DCMAKE_EXPORT_COMPILE_COMMANDS=ON -DWITH_MGR_DASHBOARD_FRONTEND=OFF -DWITH_SEASTAR=OFF \
-DWITH_DPDK=OFF -DWITH_SPDK=OFF -DWITH_CEPHFS=OFF -DWITH_RBD=OFF -DWITH_KRBD=OFF -DWITH_CCACHE=OFF \
-DWITH_MANPAGE=OFF -DWITH_LTTNG=OFF -DWITH_BABELTRACE=OFF -DWITH_SYSTEM_BOOST=OFF -DWITH_BOOST_VALGRIND=ON \
-DALLOCATOR=tcmalloc -DCMAKE_BUILD_TYPE=RelWithDebInfo ..
  • Make sure that RGW is built on one of the NVMe devices, so that at runtime the NVMe serves as local storage for logs.
    • Build a filesystem:
    sudo mkfs -t ext4 /dev/nvme0n1
    
    • Mount it:
    sudo mount -t auto /dev/nvme0n1 /path/to/ceph/build
    
    • Change ownership of the directory to your user, e.g.:
    sudo chown -R $(whoami) /path/to/ceph/build
  • Install hsbench

Setup

Start 2 zones with 2 RGWs per zone. Make sure that each zone uses its own set of NVMe devices for maximum throughput. This is done using the multisite test script. For example, with this machine setup:

$ lsblk
NAME        MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda           8:0    0 893.8G  0 disk 
└─sda1        8:1    0 893.8G  0 part /
nvme0n1     259:0    0   1.5T  0 disk 
└─nvme0n1p1 259:9    0   1.5T  0 part /mnt/nvme0n1p1
nvme1n1     259:1    0   1.5T  0 disk 
nvme2n1     259:2    0   1.5T  0 disk 
nvme4n1     259:3    0   1.5T  0 disk 
nvme5n1     259:4    0   1.5T  0 disk 
nvme6n1     259:5    0   1.5T  0 disk 
nvme3n1     259:6    0   1.5T  0 disk 
nvme7n1     259:7    0   1.5T  0 disk 

We would run:

$ sudo MON=1 OSD=1 MDS=0 MGR=0 DEV_LIST=/dev/nvme7n1,/dev/nvme6n1 DB_DEV_LIST=/dev/nvme5n1,/dev/nvme4n1 WAL_DEV_LIST=/dev/nvme3n1,/dev/nvme2n1 RGW_PER_ZONE=2 ../src/test/rgw/test-rgw-multisite.sh 2 --rgw_data_notify_interval_msec=0

So that:

  • nvme7 and nvme6 are used to store data of zones 1 and 2.
  • nvme5 and nvme4 are for the DBs of zones 1 and 2.
  • nvme3 and nvme2 are for the WAL of zones 1 and 2.
  • nvme0 is where the logs and other local files are written.

Test

  • Create the bucket:
$ hsbench -a 1234567890 -s pencil -u http://localhost:8101 -bp my-bucket -b 1 -r zg1 -m i
  • Prime the bucket in both zones with some objects, so that the test is doing only incremental sync:
$ hsbench -a 1234567890 -s pencil -u http://localhost:8101 -bp my-bucket -t 8 -b 1 -r zg1 -m p -z 4K -d 5 -op obj1
$ hsbench -a 1234567890 -s pencil -u http://localhost:8201 -bp my-bucket -t 8 -b 1 -r zg1 -m p -z 4K -d 5 -op obj2
  • Wait for full sync to end:
$ sudo bin/radosgw-admin -c run/c1/ceph.conf bucket sync checkpoint --bucket my-bucket000000000000
$ sudo bin/radosgw-admin -c run/c2/ceph.conf bucket sync checkpoint --bucket my-bucket000000000000
  • Run the following commands in parallel (in different terminals) for the actual test, so that different objects are uploaded via the 4 RGWs.
$ hsbench -a 1234567890 -s pencil -u http://localhost:8101 -bp my-bucket -t 8 -b 1 -r zg1 -m p -z 4K -d 1800 -op obj3
$ hsbench -a 1234567890 -s pencil -u http://localhost:8201 -bp my-bucket -t 8 -b 1 -r zg1 -m p -z 4K -d 1800 -op obj4
$ hsbench -a 1234567890 -s pencil -u http://localhost:8102 -bp my-bucket -t 8 -b 1 -r zg1 -m p -z 4K -d 1800 -op obj5
$ hsbench -a 1234567890 -s pencil -u http://localhost:8202 -bp my-bucket -t 8 -b 1 -r zg1 -m p -z 4K -d 1800 -op obj6
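
Instead of four terminals, the four uploads can be launched from one script and waited on together. A sketch wrapping the commands above (the run_test_load name is mine; ports and object prefixes are as listed):

```shell
# Launch one hsbench writer per RGW endpoint (2 RGWs x 2 zones) in the
# background, then block until all four finish (~30 minutes with -d 1800).
run_test_load() {
    for pair in 8101:obj3 8201:obj4 8102:obj5 8202:obj6; do
        port=${pair%:*}
        prefix=${pair#*:}
        hsbench -a 1234567890 -s pencil -u "http://localhost:${port}" \
            -bp my-bucket -t 8 -b 1 -r zg1 -m p -z 4K -d 1800 -op "${prefix}" &
    done
    wait  # return only after all background uploads have exited
}
```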

The tests should end in ~30 minutes; then we should wait for syncing to end. Once syncing ends, we should make sure that the bucket on both zones has the same number of objects.

Results

Wait for sync to finish - this is blocking, so run in different terminals:

$ sudo bin/radosgw-admin -c run/c1/ceph.conf bucket sync checkpoint --bucket my-bucket000000000000
$ sudo bin/radosgw-admin -c run/c2/ceph.conf bucket sync checkpoint --bucket my-bucket000000000000

Check that sync process finished:

$ sudo bin/radosgw-admin -c run/c1/ceph.conf bucket sync status --bucket my-bucket000000000000
$ sudo bin/radosgw-admin -c run/c2/ceph.conf bucket sync status --bucket my-bucket000000000000
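
As an alternative to the blocking checkpoint command, the status can be polled in a loop. A sketch, assuming the status output contains the string "caught up" once incremental sync is done (the wait_for_sync name is mine):

```shell
# Poll "bucket sync status" every 10 seconds until the zone reports that
# it has caught up with the source.
wait_for_sync() {
    conf="$1"    # e.g. run/c1/ceph.conf
    bucket="$2"  # e.g. my-bucket000000000000
    until sudo bin/radosgw-admin -c "$conf" bucket sync status --bucket "$bucket" \
            | grep -q "caught up"; do
        sleep 10
    done
    echo "sync caught up for $bucket ($conf)"
}
```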

To make sure we are syncing the right generation, use:

$ sudo bin/radosgw-admin -c run/c1/ceph.conf bucket layout --bucket my-bucket000000000000
$ sudo bin/radosgw-admin -c run/c2/ceph.conf bucket layout --bucket my-bucket000000000000

Check that both zones report the same number of objects using "bucket stats":

$ sudo bin/radosgw-admin -c run/c1/ceph.conf bucket stats --bucket my-bucket000000000000
$ sudo bin/radosgw-admin -c run/c2/ceph.conf bucket stats --bucket my-bucket000000000000
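
Rather than eyeballing the two stats dumps, the object count can be pulled out of the JSON output. A sketch, assuming jq is installed and the usual usage."rgw.main".num_objects layout of "bucket stats" (the num_objects helper name is mine):

```shell
# Print the object count reported by "bucket stats" for one zone.
num_objects() {
    conf="$1"; bucket="$2"
    sudo bin/radosgw-admin -c "$conf" bucket stats --bucket "$bucket" \
        | jq '.usage["rgw.main"].num_objects'
}

# e.g.:
# [ "$(num_objects run/c1/ceph.conf my-bucket000000000000)" = \
#   "$(num_objects run/c2/ceph.conf my-bucket000000000000)" ] && echo counts match
```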

Check the object count via S3 bucket listing as well. Note that it may take a long while even when using a fast tool like s5cmd:

$ AWS_ACCESS_KEY_ID=1234567890 AWS_SECRET_ACCESS_KEY=pencil s5cmd --endpoint-url http://localhost:8101 ls s3://my-bucket000000000000 > s3_objects1
$ AWS_ACCESS_KEY_ID=1234567890 AWS_SECRET_ACCESS_KEY=pencil s5cmd --endpoint-url http://localhost:8201 ls s3://my-bucket000000000000 > s3_objects2

Check that the number of RADOS objects matches too. Since we have just one bucket and the objects are small, the following should work:

$ sudo bin/rados -c run/c1/ceph.conf ls --pool zg1-1.rgw.buckets.data > rados_objects1
$ sudo bin/rados -c run/c2/ceph.conf ls --pool zg1-2.rgw.buckets.data > rados_objects2
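
To verify that the listings really contain the same objects, not just the same counts, the captured files can be sorted and diffed. A small helper sketch (the compare_listings name is mine; file names come from the redirects above):

```shell
# Compare two object listings that may enumerate keys in different orders.
# Only the last column of each line is compared, so this works both for
# "rados ls" output (the whole line is the key) and for "s5cmd ls" output
# (date/size columns first, key last).
compare_listings() {
    awk '{print $NF}' "$1" | sort > "$1.sorted"
    awk '{print $NF}' "$2" | sort > "$2.sorted"
    if diff -q "$1.sorted" "$2.sorted" > /dev/null; then
        echo "MATCH: $(wc -l < "$1.sorted") objects in both listings"
    else
        echo "MISMATCH between $1 and $2"
        return 1
    fi
}

# e.g.:
# compare_listings s3_objects1 s3_objects2
# compare_listings rados_objects1 rados_objects2
```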