- A drop-in replacement for AWS DynamoDB: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Introduction.html
- Announced last year, in Scylla Open Source: https://www.scylladb.com/2019/09/11/scylla-alternator-the-open-source-dynamodb-compatible-api/
During development, Nadav wrote py.test-based tests that use boto3 (the Python AWS client) against a single Scylla server: https://github.com/scylladb/scylla/tree/master/test/alternator
- There were tests from almost day one of the development.
- They sit next to the Scylla code.
- There was no automation/CI for them until a very late stage.
- Only a single server was tested; no topology/clustering was covered by the unit tests.
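A test of the kind described above can be sketched roughly as follows. This is a minimal illustration, not code taken from test/alternator: the endpoint URL and port 8000, the region, the placeholder credentials, the table layout, and every helper name here are assumptions.

```python
import uuid


def alternator_resource(endpoint_url="http://localhost:8000"):
    """Build a boto3 DynamoDB resource pointed at one Alternator node.

    The URL, region and credentials are placeholders (assumptions).
    """
    import boto3  # imported lazily so the helpers below load without boto3

    return boto3.resource(
        "dynamodb",
        endpoint_url=endpoint_url,
        region_name="us-east-1",
        aws_access_key_id="alternator",
        aws_secret_access_key="secret_pass",
    )


def create_test_table(dynamodb):
    """Create a uniquely named table with a single string hash key 'p'."""
    name = "test_" + uuid.uuid4().hex
    table = dynamodb.create_table(
        TableName=name,
        BillingMode="PAY_PER_REQUEST",
        KeySchema=[{"AttributeName": "p", "KeyType": "HASH"}],
        AttributeDefinitions=[{"AttributeName": "p", "AttributeType": "S"}],
    )
    table.wait_until_exists()
    return table


def put_then_get(table, key, value):
    """Write one item and read it back with a consistent read."""
    table.put_item(Item={"p": key, "v": value})
    return table.get_item(Key={"p": key}, ConsistentRead=True)["Item"]


def roundtrip_check(dynamodb):
    """Body of a basic round-trip test against a live server."""
    table = create_test_table(dynamodb)
    try:
        assert put_then_get(table, "k", "hello") == {"p": "k", "v": "hello"}
    finally:
        table.delete()
```

Under py.test, `roundtrip_check` would be called from a `test_` function that receives the resource from a fixture. Because only `endpoint_url` points at Scylla, the same code can in principle run against AWS DynamoDB as well.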
Around the time of the announcement we started working on a simple longevity test. The main challenge was finding and adapting a stress tool that can work with the DynamoDB API. YCSB was selected and a 3h test was set up, with a minimal collection of stats. The dev team used it a bit, mainly to produce screenshots of the monitor and the new Alternator dashboards. Only when we started working on 4.0 did we actually start expanding it and utilizing YCSB better.
Now we have in SCT:
- 3h basic scenario (equivalent to 4h longevity)
- 48h longevity with authentication (equivalent to 48h longevity)
- performance benchmark - throughput and latency
- multi-region longevity - still WIP
- We started early on with SCT, which uncovered important issues regarding cluster and replication factor, which aren't covered in the unit tests at all.
- YCSB proved to be a very helpful tool, even though it took a while to figure out how to enable data integrity checks. We now support both CQL and DynamoDB with it.
- We introduced the Docker-based loader, which opens lots of possibilities for SCT.
- Helped create a new report that compares multiple types of stress (subtests) from the same run (mainly Alex's work for CDC).
- DynamoDB clients are not cluster-aware, which made using nemesis a bit of a pain. In the end we are using the "DNS" solution; in scylla-cloud a load balancer would be used.
- Creating the performance benchmark was very complex; it has lots of moving parts to get working correctly.
- How we store the cassandra-stress information, and how we retrieve it, is hidden deep in SCT code.
- We needed an ability to suppress events, since LWT was causing error prints at high throughput.
- We had to run each case 3 times: with CQL, without LWT, and with LWT.
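To illustrate the "DNS" point from the list above: since a DynamoDB client talks to a single endpoint, one way to spread requests over the cluster is to resolve a name that maps to all live nodes and rotate over the results. The sketch below is a guess at the general idea, not how SCT implements it; the function names, port 8000, and the `http` scheme are assumptions.

```python
import itertools
import socket


def resolve_nodes(hostname, port):
    """Return the sorted, de-duplicated IPv4 addresses `hostname` resolves to."""
    infos = socket.getaddrinfo(
        hostname, port, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, (address, port)).
    return sorted({info[4][0] for info in infos})


def endpoint_cycle(addresses, port=8000, scheme="http"):
    """Return an endless round-robin iterator of endpoint URLs."""
    addresses = list(addresses)
    if not addresses:
        raise ValueError("no node addresses to rotate over")
    return (f"{scheme}://{addr}:{port}" for addr in itertools.cycle(addresses))
```

Each value taken from `endpoint_cycle` can be passed as the client's endpoint URL. Re-resolving the name periodically would let the rotation pick up nodes that a nemesis added or removed; a load balancer, as mentioned for scylla-cloud, moves the same logic to the server side.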
Very close to the release of 4.0, we started pushing for tests in dtest, assuming lots of the functionality is covered in the unit tests and given the mileage we had with SCT. In a ~3wk effort we got 14 tests, all utilizing 3 or more nodes, running nodetool commands, and adding/decommissioning nodes.
- Writing the first test was quite easy, since it's only one configuration flag and boto3 was already introduced by Shlomo for manager backup (only one small change was needed in ccm, for supplying the alternator_address).
- Since we are not very used to working in "close quarters", it was challenging not to break each other's code all the time.
- Since we were not writing the tests as the development was done, we weren't very specific in our testing; hopefully in the next step we'll be able to closely follow the Alternator development.