A list of tasks that need to happen so that each dataload produces consistent results that can easily be compared with one another.
We should try to bring the system into the same state for each dataload/benchmark. The simplest way is to just wipe everything and start fresh.
- Drop the `oae` keyspace.
- Shutdown each Cassandra node
- Clear data from
  - `/var/lib/cassandra/data/oae`
  - `/var/lib/cassandra/commitlogs` (Assuming only the `oae` keyspace is used on these nodes)
  - `/var/log/cassandra/system.log`
- `nodetool cleanup` to wipe unnecessary files.
- Shutdown each app server
- Pull latest master (potentially make the branch to pull configurable? e.g. run tests against simong/rediscache)
- Remove `server.log`
- Restart app server
- Create a tenant (only needs to happen once)
- Remove the nginx logs.
- Remove old scripts
- Pull latest master (configurable branch?)
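
The "Clear data from" step for the Cassandra nodes could be scripted roughly as below. This is only a sketch: the helper name `clear_paths` and the dry-run default are assumptions, not part of any existing tooling, and it removes the listed directories themselves on the assumption that Cassandra recreates them on startup. The paths are the ones listed above.

```python
import shutil
from pathlib import Path

# Paths taken from the checklist above (Cassandra nodes only).
CASSANDRA_PATHS = [
    "/var/lib/cassandra/data/oae",
    "/var/lib/cassandra/commitlogs",
    "/var/log/cassandra/system.log",
]


def clear_paths(paths, dry_run=True):
    """Remove each path that exists; return the paths that were (or would be) removed.

    Hypothetical helper. With dry_run=True nothing is deleted.
    """
    removed = []
    for raw in paths:
        path = Path(raw)
        if not path.exists():
            continue  # nothing to clean for this entry on this node
        if not dry_run:
            if path.is_dir():
                shutil.rmtree(path)  # directory: remove it with its contents
            else:
                path.unlink()  # plain file, e.g. system.log
        removed.append(str(path))
    return removed
```

Calling `clear_paths(CASSANDRA_PATHS)` first and reviewing the returned list before re-running with `dry_run=False` keeps an accidental wipe of the wrong node unlikely.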
This task will perform the data load.
- Generate a data set with the following configurable options (we could also generate a dataset once and re-use it for every dataload, which might make the results more comparable):
  - number of batches
  - users per batch
  - groups per batch
  - content per batch
- Push `load start` annotation to circonus
- Load the dataset with config
  - url
  - number of concurrent batches (probably shouldn't change over dataloads)
- Push `load end` annotation to circonus
- Copy the generated html statistics to `/var/www/html/<date>/dataload`
- Package the dataset into csv/format files
  - The `package.js` script has to provide the correct `.format` files along with the CSV.
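
The configurable options and the start/end annotations around the load could be modelled roughly as follows. This is a sketch under assumptions: `DatasetOptions`, `annotated` and the `push` callable are hypothetical names, and `push` stands in for whatever actually sends an annotation to circonus (no circonus API is assumed here).

```python
from contextlib import contextmanager
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetOptions:
    """The four per-dataset options from the list above."""
    batches: int
    users_per_batch: int
    groups_per_batch: int
    content_per_batch: int

    def totals(self):
        """Total number of users, groups and content items across all batches."""
        return {
            "users": self.batches * self.users_per_batch,
            "groups": self.batches * self.groups_per_batch,
            "content": self.batches * self.content_per_batch,
        }


@contextmanager
def annotated(name, push):
    """Call push('<name> start') before the block and push('<name> end') after it.

    `push` is a hypothetical callable taking the annotation title. The end
    annotation is pushed even if the wrapped step raises.
    """
    push(f"{name} start")
    try:
        yield
    finally:
        push(f"{name} end")
```

With made-up numbers, usage might look like this:

```python
events = []
options = DatasetOptions(batches=10, users_per_batch=100,
                         groups_per_batch=20, content_per_batch=50)
with annotated("load", events.append):
    events.append(f"loading {options.totals()['users']} users")
```

Keeping the options in one frozen object makes it easy to record exactly which dataset a given result came from, which matters if the same dataset is re-used across dataloads.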
This task will run the tsung suite.
- Generate a tsung suite (probably just use standard)
- Push `tsung start` annotation to circonus
- Run tsung tests
- Push `tsung end` annotation to circonus
- Copy the generated tsung statistics to `/var/www/html/<date>/tsung`
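
Copying the generated statistics into `/var/www/html/<date>/dataload` or `/var/www/html/<date>/tsung` could be handled by one small helper. A sketch, with assumptions: the names `results_dir` and `publish` are hypothetical, and `<date>` is rendered as an ISO date (`YYYY-MM-DD`), which the list above does not specify.

```python
import shutil
from datetime import date
from pathlib import Path

WEB_ROOT = Path("/var/www/html")


def results_dir(kind, day, root=WEB_ROOT):
    """Return <root>/<date>/<kind>, e.g. kind='tsung' or kind='dataload'.

    Assumption: <date> is formatted as YYYY-MM-DD.
    """
    return Path(root) / day.isoformat() / kind


def publish(src, kind, day, root=WEB_ROOT):
    """Copy the statistics directory `src` to its dated location and return it.

    shutil.copytree raises FileExistsError if the destination already exists,
    so an earlier run for the same date and kind is never overwritten silently.
    """
    dest = results_dir(kind, day, root)
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copytree(src, dest)
    return dest
```

One consequence of keying only on the date is that a second run on the same day fails at the copy step; if several runs per day are expected, a timestamp or run number would be needed in the path.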
Sounds about right. For the app server nodes you could narrow this down to:
#3 will check out git master again, regenerate `config.js`, and start up the node service