This document adds more details about the GSoC22 project "Telescópio Lua".
Ceph is a distributed storage system that supports block, file, and object storage. All three storage types use the RADOS backend storage system. S3-compliant object storage is provided by the Object Gateway (a.k.a. the RADOS Gateway, or RGW).
The goal of this project is to expose the payload of objects being uploaded (PUT) or retrieved (GET) to Lua in the RGW, as a stream of bytes:
- The Lua script should be able to read the payload, perform calculations on it, and use the outcome: decisions could be made based on it, or it could be written to object attributes, logged, or sent to external systems.
- The Lua script should be able to rewrite the payload being uploaded (PUT) or retrieved (GET).
Note that for large objects, only part of the payload is exposed to Lua each time a request is handled. Storing data and handling large objects are out of scope here.
The first step is to set up a Linux-based development environment. As a minimum, you would need a machine with 8 CPUs, 16GB of RAM, and 50GB of disk space.
Note that a machine with a lower spec also works, but the Ceph build might then take several hours.
Unless you already have a Linux distro you like, I would recommend choosing from:
- Fedora - my favorite (34 or higher)
- Ubuntu (20.04 and up)
- openSUSE (Leap 15.2 or Tumbleweed)
Using WSL on your Windows machine is also possible, but build times would be longer than on native Linux.
Once you have that up and running, you should clone the Ceph repo from GitHub (https://github.com/ceph/ceph). If you don't know what GitHub and git are, this is the right time to close these gaps :-) And yes, you should have a GitHub account, so you can later share your work on the project.
The repo has a README file with instructions on how to build Ceph. Just follow these instructions and build it (depending on the number of CPUs you have, this may take a while).
Our build system is based on CMake, so it is probably a good idea to know a little bit about it.
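As a rough orientation, the README steps boil down to something like the sequence below. This is only a sketch: it assumes the helper scripts named in the Ceph README (install-deps.sh, do_cmake.sh) and Ninja as the build tool, and the function name build_ceph is made up for this example. The README remains the authoritative source if anything differs.

```shell
# Sketch of the README build steps. Everything is wrapped in a function
# (hypothetical name) that runs in a subshell, so nothing happens until you
# call build_ceph, and your current directory is left untouched.
build_ceph() (
    set -e                                   # stop at the first failing step
    git clone https://github.com/ceph/ceph.git
    cd ceph
    git submodule update --init --recursive  # Ceph uses many git submodules
    ./install-deps.sh                        # installs build dependencies (may ask for sudo)
    ./do_cmake.sh                            # runs cmake and creates the "build" directory
    cd build
    ninja                                    # the long part
)
```

Call build_ceph from the directory where you want the checkout to live.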
Assuming the build completed successfully, you can run the unit tests (see: https://github.com/ceph/ceph#running-unit-tests).
Now you are ready to run the Ceph processes, as explained in the README (https://github.com/ceph/ceph#running-a-test-cluster). You will probably also want to check the developer guide (https://docs.ceph.com/docs/master/dev/developer_guide/) and learn more about how to build Ceph and run it locally (https://docs.ceph.com/docs/master/dev/quick_guide/).
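For RGW work, a small cluster with a single gateway is usually enough. The sketch below assumes you are in the build directory and uses the vstart.sh environment variables to choose how many daemons of each kind to start. The function names are made up for this example, and the exact daemon counts are a suggestion rather than a requirement.

```shell
# Sketch: start and stop a minimal test cluster from the build directory.
# Function names are hypothetical; the bodies run in subshells.
start_test_cluster() (
    set -e
    # One monitor, one OSD, one manager, one RGW, no MDS.
    # -n creates a new cluster, -d enables debug output.
    MON=1 OSD=1 MDS=0 MGR=1 RGW=1 ../src/vstart.sh -n -d
    ./bin/ceph -s    # print the cluster status
)

stop_test_cluster() (
    ../src/stop.sh   # stops all daemons started by vstart.sh
)
```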
Assuming you have everything up and running, you can create a bucket in Ceph and upload an object to it.
The best way to do that is with s3cmd, a Python command-line tool:
https://github.com/s3tools/s3cmd
Note that the tool is mainly geared towards AWS S3, so make sure to specify the location of the RGW as the endpoint, and the RGW credentials (as printed to the screen after running vstart.sh).
For example:
$ s3cmd --host=localhost:8000 --host-bucket="localhost:8000/%(bucket)" \
--access_key=0555b35654ad1656d804 \
--secret_key=h7GhxuBLTrlhVUyxSPUKUV8r/2EI4ngqJxD7iBdBYLhwluN30JaT3Q== \
mb s3://mybucket
This creates a bucket called mybucket in Ceph.
And:
$ s3cmd --host=localhost:8000 --host-bucket="localhost:8000/%(bucket)" \
--access_key=0555b35654ad1656d804 \
--secret_key=h7GhxuBLTrlhVUyxSPUKUV8r/2EI4ngqJxD7iBdBYLhwluN30JaT3Q== \
put myimage.jpg s3://mybucket
This puts myimage.jpg into that bucket.
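To avoid repeating the endpoint and credentials on every command, the same settings can be kept in an s3cmd configuration file and selected with the -c option. The sketch below reuses the values from the commands above. The file name rgw.s3cfg is an arbitrary choice for this example, and use_https = False assumes that the local RGW serves plain HTTP on port 8000.

```shell
# Write a minimal s3cmd config for the local RGW.
# S3CFG is a variable introduced for this example only.
S3CFG="${S3CFG:-./rgw.s3cfg}"
cat > "$S3CFG" <<'EOF'
[default]
access_key = 0555b35654ad1656d804
secret_key = h7GhxuBLTrlhVUyxSPUKUV8r/2EI4ngqJxD7iBdBYLhwluN30JaT3Q==
host_base = localhost:8000
host_bucket = localhost:8000/%(bucket)
use_https = False
EOF

# The commands above then shrink to:
#   s3cmd -c "$S3CFG" mb s3://mybucket
#   s3cmd -c "$S3CFG" put myimage.jpg s3://mybucket
#   s3cmd -c "$S3CFG" ls s3://mybucket
```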
Lua is not a commonly used language; on the other hand, it is very intuitive and easy to learn. To get started, the best option is to use the free online version of "Programming in Lua" (PIL).
Note that this is a guide for Lua 5.0, while we are using Lua 5.3, but for the basic stuff it should not matter much.
A reference manual exists for the latest Lua version.
More information on learning Lua can be found in the Lua Users Wiki.
Please pick one of the code examples from here to test Lua scripting on the RGW.
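If you first want to check that scripting works end to end, a one-line script is enough. The sketch below writes a small Lua file and uploads it with radosgw-admin so that it runs before every request is handled. The file name hello.lua is arbitrary, and the upload is skipped when radosgw-admin is not on the PATH (in a vstart environment the binary sits under bin/ in the build directory).

```shell
# Write a minimal Lua script that logs the operation name of each request.
cat > hello.lua <<'EOF'
-- Request.RGWOp holds the name of the operation being handled
RGWDebugLog("op was: " .. Request.RGWOp)
EOF

# Upload the script to the RGW for the pre-request context, if possible.
if command -v radosgw-admin >/dev/null 2>&1; then
    radosgw-admin script put --infile=hello.lua --context=preRequest
else
    echo "radosgw-admin not found; add the build's bin directory to PATH and retry"
fi
```

After uploading, any S3 request (for example the s3cmd put from earlier) should produce a matching line in the RGW log, provided the RGW debug level is high enough for debug messages to be written.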
I would recommend trying to contribute to the Ceph project based on this PR:
- rewrite the Prometheus example to send the counters from the background context instead of on every request
- use Intel's TBB concurrent hash map to replace the current mutex-based implementation of the background hash map
Other Lua-related contributions could be:
- cover all fields from the doc in the unit tests