@fruch
Created May 18, 2020 19:20
Testing Alternator

What is alternator

Testing

unit tests

During development, Nadav wrote py.test-based tests that use boto3 (the AWS Python client) and run against a single Scylla server. https://github.com/scylladb/scylla/tree/master/test/alternator

Highlights
  • There were tests from almost day one of the development,
  • they sit next to the Scylla code
Lowlights
  • there was no automation/CI for them until a very late stage
  • only a single server was tested; no topology/clustering was covered by the unit tests

scylla-cluster-test

Around the announcement we started working on a simple longevity test. The main challenge was finding and adapting a stress tool that can work with the DynamoDB API. YCSB was selected, and a 3h test was set up with a minimal collection of stats. The dev team used it a bit, mainly to produce screenshots of the monitor and the new Alternator dashboards. Only when we started working on 4.0 did we actually start expanding this and utilizing YCSB better.

Now we have in SCT:

  • 3h basic scenario (equivalent to 4h longevity)
  • 48h longevity with authentication (equivalent to 48h longevity)
  • performance benchmark - throughput and latency
  • multi-region longevity - still WIP

Highlights

  • We started early on with SCT, which uncovered important issues regarding clustering and replication factor, which aren't covered in the unit tests at all.
  • YCSB proved to be a very helpful tool, even though it took a while to figure out how to enable data integrity checks. We now support both CQL and DynamoDB with it
  • We introduced the docker-based loader, which opens lots of possibilities for SCT
  • It helped create a new report that compares multiple types of stress (subtests) from the same run (mainly Alex's work for CDC)

Lowlights

  • DynamoDB clients are not cluster aware, which made using nemesis a bit of a pain. In the end we are using the "DNS" solution; in scylla-cloud a load balancer would be used.
  • Creating the performance benchmark was very complex; it has lots of moving parts that need to work together correctly
    • how we store the cassandra-stress information, and how we retrieve it, is hidden deep in the SCT code
    • we needed an ability to suppress events, since LWT was causing error prints at high throughput
    • we had to run each case 3 times: with CQL, without LWT, and with LWT.
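
Since DynamoDB clients only know about a single endpoint, something outside the client has to spread requests over the nodes and route around a node that a nemesis took down. The sketch below shows that idea on the client side with a plain round-robin picker. It is only an illustration, not the "DNS" solution used in SCT; the class name, the node addresses and port 8000 are hypothetical.

```python
import itertools


class RoundRobinEndpoints:
    """Hand out node URLs in turn, skipping nodes currently marked as down."""

    def __init__(self, nodes, port=8000):
        if not nodes:
            raise ValueError("at least one node address is required")
        self._urls = [f"http://{node}:{port}" for node in nodes]
        self._down = set()
        self._cycle = itertools.cycle(self._urls)

    def mark_down(self, url):
        # Called when a node is stopped or decommissioned.
        self._down.add(url)

    def mark_up(self, url):
        self._down.discard(url)

    def next_url(self):
        # Try each node at most once per call so we never loop forever.
        for _ in range(len(self._urls)):
            url = next(self._cycle)
            if url not in self._down:
                return url
        raise RuntimeError("no live endpoints")


endpoints = RoundRobinEndpoints(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
first = endpoints.next_url()
```

Each new client connection would then be opened against `endpoints.next_url()`. A DNS name or a load balancer achieves the same effect without changing the client at all.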

dtest

Very close to the release of 4.0, we started pushing for having tests in dtest, assuming lots of the functionality is covered in the unit tests and by the mileage we had with SCT. In a ~3wk effort we got to 14 tests, all of which utilize 3 or more nodes, run nodetool commands, and add/decommission nodes.

Highlights

  • writing the first test was quite easy, since it's only one configuration flag and boto3 was already introduced by Shlomo for manager backup (only one small change was needed in ccm, for supplying the alternator_address)

Lowlights

  • since we are not very used to working in "close quarters", it was challenging not to break each other's code all the time.
  • since we were not writing the tests while the development was being done, we weren't very specific in our testing; hopefully in the next step we'll be able to closely follow the Alternator development