@yuvalif
Last active August 4, 2022 14:10

Goal

The goal here is to run multisite tests on a single node at scale (assuming the node has enough capacity). This is done using vstart/mstart so the development-test cycle is faster.

Prerequisites

  • Build Ceph with a minimal feature set (RGW-focused, no dashboard/cephfs/rbd), e.g.:

cmake -DBOOST_J=$(nproc) -DCMAKE_EXPORT_COMPILE_COMMANDS=ON -DWITH_MGR_DASHBOARD_FRONTEND=OFF -DWITH_SEASTAR=OFF \
-DWITH_DPDK=OFF -DWITH_SPDK=OFF -DWITH_CEPHFS=OFF -DWITH_RBD=OFF -DWITH_KRBD=OFF -DWITH_CCACHE=OFF \
-DWITH_MANPAGE=OFF -DWITH_LTTNG=OFF -DWITH_BABELTRACE=OFF -DWITH_SYSTEM_BOOST=OFF -DWITH_BOOST_VALGRIND=ON \
-DALLOCATOR=tcmalloc -DCMAKE_BUILD_TYPE=RelWithDebInfo ..
  • Make sure that RGW is built on one of the NVMe devices, so that at runtime the NVMe serves as local storage for logs.
    • Build a filesystem:
    sudo mkfs -t ext4 /dev/nvme0n1
    
    • Mount it:
    sudo mount -t auto /dev/nvme0n1 /path/to/ceph/build
    
    • Change ownership of the directory to your user, e.g.:
    sudo chown -R $(whoami) /path/to/ceph/build
  • Install hsbench

Setup

Start 2 zones with 2 RGWs per zone. Make sure that each zone uses its own set of NVMe devices for maximum throughput. This is done using the multisite test script. For example, with this machine setup:

$ lsblk
NAME        MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda           8:0    0 893.8G  0 disk 
└─sda1        8:1    0 893.8G  0 part /
nvme0n1     259:0    0   1.5T  0 disk 
└─nvme0n1p1 259:9    0   1.5T  0 part /mnt/nvme0n1p1
nvme1n1     259:1    0   1.5T  0 disk 
nvme2n1     259:2    0   1.5T  0 disk 
nvme4n1     259:3    0   1.5T  0 disk 
nvme5n1     259:4    0   1.5T  0 disk 
nvme6n1     259:5    0   1.5T  0 disk 
nvme3n1     259:6    0   1.5T  0 disk 
nvme7n1     259:7    0   1.5T  0 disk 

We would run:

$ sudo MON=1 OSD=1 MDS=0 MGR=0 DEV_LIST=/dev/nvme7n1,/dev/nvme6n1 DB_DEV_LIST=/dev/nvme5n1,/dev/nvme4n1 WAL_DEV_LIST=/dev/nvme3n1,/dev/nvme2n1 RGW_PER_ZONE=2 ../src/test/rgw/test-rgw-multisite.sh 2 --rgw_data_notify_interval_msec=0

So that:

  • nvme7 and nvme6 are used to store data of zones 1 and 2.
  • nvme5 and nvme4 are for the DBs of zones 1 and 2.
  • nvme3 and nvme2 are for the WAL of zones 1 and 2.
  • nvme0 is where the logs and other local files are written.

Test

  • Create the bucket:
$ hsbench -a 1234567890 -s pencil -u http://localhost:8101 -bp my-bucket -b 1 -r zg1 -m i
  • Prime the bucket in both zones with some objects, so that the test is doing only incremental sync:
$ hsbench -a 1234567890 -s pencil -u http://localhost:8101 -bp my-bucket -t 8 -b 1 -r zg1 -m p -z 4K -d 5 -op obj1
$ hsbench -a 1234567890 -s pencil -u http://localhost:8201 -bp my-bucket -t 8 -b 1 -r zg1 -m p -z 4K -d 5 -op obj2
  • Wait for full sync to end:
$ sudo bin/radosgw-admin -c run/c1/ceph.conf bucket sync checkpoint --bucket my-bucket000000000000
$ sudo bin/radosgw-admin -c run/c2/ceph.conf bucket sync checkpoint --bucket my-bucket000000000000
  • Run the following commands in parallel (in different terminals) for the actual test, so that different objects are uploaded via the 4 RGWs.
$ hsbench -a 1234567890 -s pencil -u http://localhost:8101 -bp my-bucket -t 8 -b 1 -r zg1 -m p -z 4K -d 1800 -op obj3
$ hsbench -a 1234567890 -s pencil -u http://localhost:8201 -bp my-bucket -t 8 -b 1 -r zg1 -m p -z 4K -d 1800 -op obj4
$ hsbench -a 1234567890 -s pencil -u http://localhost:8102 -bp my-bucket -t 8 -b 1 -r zg1 -m p -z 4K -d 1800 -op obj5
$ hsbench -a 1234567890 -s pencil -u http://localhost:8202 -bp my-bucket -t 8 -b 1 -r zg1 -m p -z 4K -d 1800 -op obj6
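
Instead of four terminals, the four uploads can be launched from one script and waited on together. A sketch wrapping the commands above (the run_test_load name is mine; ports and object prefixes are as listed):

```shell
# Launch one hsbench writer per RGW endpoint (2 RGWs x 2 zones) in the
# background, then block until all four finish (~30 minutes with -d 1800).
run_test_load() {
    for pair in 8101:obj3 8201:obj4 8102:obj5 8202:obj6; do
        port=${pair%:*}
        prefix=${pair#*:}
        hsbench -a 1234567890 -s pencil -u "http://localhost:${port}" \
            -bp my-bucket -t 8 -b 1 -r zg1 -m p -z 4K -d 1800 -op "${prefix}" &
    done
    wait  # return only after all background uploads have exited
}
```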

The tests should end in ~30 minutes; then we should wait for syncing to end. Once syncing ends, we should make sure that the bucket on both zones has the same number of objects.

Results

Wait for sync to finish - this is blocking, so run in different terminals:

$ sudo bin/radosgw-admin -c run/c1/ceph.conf bucket sync checkpoint --bucket my-bucket000000000000
$ sudo bin/radosgw-admin -c run/c2/ceph.conf bucket sync checkpoint --bucket my-bucket000000000000

Check that sync process finished:

$ sudo bin/radosgw-admin -c run/c1/ceph.conf bucket sync status --bucket my-bucket000000000000
$ sudo bin/radosgw-admin -c run/c2/ceph.conf bucket sync status --bucket my-bucket000000000000
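
As an alternative to the blocking checkpoint command, the status can be polled in a loop. A sketch, assuming the status output contains the string "caught up" once incremental sync is done (the wait_for_sync name is mine):

```shell
# Poll "bucket sync status" every 10 seconds until the zone reports that
# it has caught up with the source.
wait_for_sync() {
    conf="$1"    # e.g. run/c1/ceph.conf
    bucket="$2"  # e.g. my-bucket000000000000
    until sudo bin/radosgw-admin -c "$conf" bucket sync status --bucket "$bucket" \
            | grep -q "caught up"; do
        sleep 10
    done
    echo "sync caught up for $bucket ($conf)"
}
```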

To make sure we are syncing the right generation, use:

$ sudo bin/radosgw-admin -c run/c1/ceph.conf bucket layout --bucket my-bucket000000000000
$ sudo bin/radosgw-admin -c run/c2/ceph.conf bucket layout --bucket my-bucket000000000000

Check that both zones report the same number of objects using "bucket stats":

$ sudo bin/radosgw-admin -c run/c1/ceph.conf bucket stats --bucket my-bucket000000000000
$ sudo bin/radosgw-admin -c run/c2/ceph.conf bucket stats --bucket my-bucket000000000000
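
Rather than eyeballing the two stats dumps, the object count can be pulled out of the JSON output. A sketch, assuming jq is installed and the usual usage."rgw.main".num_objects layout of "bucket stats" (the num_objects helper name is mine):

```shell
# Print the object count reported by "bucket stats" for one zone.
num_objects() {
    conf="$1"; bucket="$2"
    sudo bin/radosgw-admin -c "$conf" bucket stats --bucket "$bucket" \
        | jq '.usage["rgw.main"].num_objects'
}

# e.g.:
# [ "$(num_objects run/c1/ceph.conf my-bucket000000000000)" = \
#   "$(num_objects run/c2/ceph.conf my-bucket000000000000)" ] && echo counts match
```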

Check the object count via S3 bucket listing as well. Note that it may take a long while even when using a fast tool like s5cmd:

$ AWS_ACCESS_KEY_ID=1234567890 AWS_SECRET_ACCESS_KEY=pencil s5cmd --endpoint-url http://localhost:8101 ls s3://my-bucket000000000000 > s3_objects1
$ AWS_ACCESS_KEY_ID=1234567890 AWS_SECRET_ACCESS_KEY=pencil s5cmd --endpoint-url http://localhost:8201 ls s3://my-bucket000000000000 > s3_objects2

Check that the number of RADOS objects matches too. Since we have just one bucket and the objects are small, the following should work:

$ sudo bin/rados -c run/c1/ceph.conf ls --pool zg1-1.rgw.buckets.data > rados_objects1
$ sudo bin/rados -c run/c2/ceph.conf ls --pool zg1-2.rgw.buckets.data > rados_objects2
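
To verify that the listings really contain the same objects, not just the same counts, the captured files can be sorted and diffed. A small helper sketch (the compare_listings name is mine; file names come from the redirects above):

```shell
# Compare two object listings that may enumerate keys in different orders.
# Only the last column of each line is compared, so this works both for
# "rados ls" output (the whole line is the key) and for "s5cmd ls" output
# (date/size columns first, key last).
compare_listings() {
    awk '{print $NF}' "$1" | sort > "$1.sorted"
    awk '{print $NF}' "$2" | sort > "$2.sorted"
    if diff -q "$1.sorted" "$2.sorted" > /dev/null; then
        echo "MATCH: $(wc -l < "$1.sorted") objects in both listings"
    else
        echo "MISMATCH between $1 and $2"
        return 1
    fi
}

# e.g.:
# compare_listings s3_objects1 s3_objects2
# compare_listings rados_objects1 rados_objects2
```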