All Ceph daemons use tcmalloc as their memory allocator to achieve better performance. A recent PR added the ability to get information on how tcmalloc performs inside the RGW. In this project, we will use the profiling information from RGW runs to tune the tcmalloc parameters so that they are better suited to the memory-use patterns of the RGW.
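As background for what "tuning tcmalloc parameters" can mean in practice: tcmalloc (from gperftools) reads a few knobs from the environment at process startup. Here is a minimal sketch of launching an RGW with tuned values; the environment variable names are standard gperftools ones, but the values are arbitrary starting points for a sweep, and the `radosgw` invocation is illustrative only:

```python
import os

# gperftools tcmalloc reads these environment variables at startup;
# the values below are arbitrary starting points for experimentation
TUNING = {
    "TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES": str(64 * 1024 * 1024),  # 64MB
    "TCMALLOC_RELEASE_RATE": "5",  # how eagerly freed memory is returned to the OS
}

def tuned_env(base=None):
    """Return a copy of the environment with the tcmalloc knobs applied."""
    env = dict(os.environ if base is None else base)
    env.update(TUNING)
    return env

if __name__ == "__main__":
    import subprocess
    # illustrative only: run an RGW in the foreground with the tuned allocator
    subprocess.run(["radosgw", "-f"], env=tuned_env())
```

The goal of the project is to find, from the profiling data, which values actually fit the RGW's allocation patterns.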
The first step is to set up a Linux-based development environment. As a minimum you would need a 4-CPU machine with 8GB RAM and a 50GB disk. Unless you already have a Linux distro you like, I would recommend choosing from:
- Fedora (42/43) - my favorite!
- Ubuntu (24.04 LTS)
- WSL (Windows Subsystem for Linux), though it would probably take much longer...
- RHEL9/Centos9
- Other Linux distros - try at your own risk :-)
Once you have that up and running, you should clone the Ceph repo from GitHub (https://github.com/ceph/ceph). Make sure that you fetch the code from the above PR so that the RGW will have tcmalloc profiling support. If you don't know what GitHub and git are, this is the right time to close these gaps :-) And yes, you should have a GitHub account, so you can later share your work on the project.
To install any missing system dependencies, use:
./install-deps.sh
Note that the first build may take a long time, so the following cmake parameters can be used to minimize the build time.
With a fresh ceph clone use the following:
./do_cmake.sh -DBOOST_J=$(nproc) -DCMAKE_EXPORT_COMPILE_COMMANDS=ON -DWITH_MGR_DASHBOARD_FRONTEND=OFF \
-DWITH_DPDK=OFF -DWITH_SPDK=OFF -DWITH_SEASTAR=OFF -DWITH_CEPHFS=OFF -DWITH_RBD=OFF -DWITH_KRBD=OFF -DWITH_CCACHE=OFF -GNinja
Then invoke the build process (using ninja) from within the build directory (created by do_cmake.sh).
Assuming the build was completed successfully, you can run the unit tests (see: https://github.com/ceph/ceph#running-unit-tests).
Now you are ready to run the ceph processes, as explained here: https://github.com/ceph/ceph#running-a-test-cluster
You probably would also like to check the developer guide (https://docs.ceph.com/docs/master/dev/developer_guide/) and learn more on how to build Ceph and run it locally (https://docs.ceph.com/docs/master/dev/quick_guide/).
- install the aws CLI tool
- start the vstart cluster:
$ MON=1 OSD=1 MDS=0 MGR=0 RGW=1 ../src/vstart.sh -n -d
- configure the tool according to the access and secret keys shown in the output of the vstart.sh command
- create a bucket:
$ aws --endpoint-url http://localhost:8000 s3 mb s3://fish
- create a file, and upload it:
$ head -c 512 </dev/urandom > myfile
$ aws --endpoint-url http://localhost:8000 s3 cp myfile s3://fish
- list the bucket and make sure the file is there:
$ aws --endpoint-url http://localhost:8000 s3 ls s3://fish
Install hsbench, s5cmd, or build your own tool (based on the boto3 Python library) and try to load the RGW with requests.
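If you go the boto3 route, a minimal write-only load generator could look like this. The endpoint, bucket name, object count, and object size are assumptions matching the vstart walkthrough above; adjust them freely:

```python
#!/usr/bin/env python3
"""Tiny RGW load-generator sketch: PUT a batch of random objects.
Assumes a vstart RGW at localhost:8000 and the credentials printed by
vstart.sh already configured for the AWS SDK."""
import os

ENDPOINT = "http://localhost:8000"   # default vstart RGW endpoint
BUCKET = "fish"                      # bucket created in the walkthrough above
OBJ_SIZE = 4 * 1024                  # 4K: the "small objects" bucket below
NUM_OBJS = 1000

def object_key(i: int) -> str:
    # zero-padded keys keep bucket listings in a predictable order
    return f"obj-{i:08d}"

def payload(size: int) -> bytes:
    # random bytes, like the `head -c ... </dev/urandom` example above
    return os.urandom(size)

def main():
    import boto3  # third-party: pip install boto3
    s3 = boto3.client("s3", endpoint_url=ENDPOINT)
    for i in range(NUM_OBJS):
        s3.put_object(Bucket=BUCKET, Key=object_key(i), Body=payload(OBJ_SIZE))

if __name__ == "__main__":
    main()
```

Extending it to read, delete, and mixed patterns is mostly a matter of swapping `put_object` for `get_object`/`delete_object` in the loop.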
Use the tcmalloc profiling tool to get the different usage patterns:
- write only
- combined write + read + delete
- read only
- bucket listing. Note that to get interesting results, you would need ~1M objects in the bucket, which may take about an hour to fill
- any other combination you find interesting
And for different object sizes:
- small objects: 4K - 256K
- medium objects: 1M - 4M
- large objects (with multipart upload): 24M
Feel free to try different combinations of the above, spread across different numbers of buckets.
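One way to keep the combinations manageable is to generate the experiment matrix up front and run through it systematically. A small sketch, where the pattern names, size buckets, and bucket counts are illustrative choices, not prescribed by the project:

```python
import itertools

# Illustrative experiment dimensions: workload patterns and object-size
# buckets from the lists above, plus a few bucket counts to spread over
PATTERNS = ["write", "read", "mixed", "list"]
SIZES = {"small": 4 * 1024, "medium": 1024 * 1024, "large": 24 * 1024 * 1024}
BUCKET_COUNTS = [1, 10, 100]

def experiment_grid():
    """Yield one dict per (pattern, size, bucket-count) combination."""
    for pattern, (label, size), nbuckets in itertools.product(
            PATTERNS, SIZES.items(), BUCKET_COUNTS):
        yield {"pattern": pattern, "size_label": label,
               "object_size": size, "buckets": nbuckets}
```

Each grid entry then maps to one profiled RGW run, which makes it easy to compare tcmalloc behavior across workloads.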
Note that when running the ceph cluster with vstart, the performance numbers will be low (due to the OSD/drive speed) and will not reflect the actual performance of the RGW. In the real project we will perform this analysis on faster hardware, but the methodology will be similar.