@jlouis
Created October 2, 2014 13:11
Some feedback for Chris Meiklejohn (and others)

Feedback on the QuickCheck model

So, we have a QuickCheck model at

https://github.com/cmeiklejohn/derflow/blob/master/test/derflow_ets_eqc.erl

and Chris is asking for feedback, so here are the things I've seen in this model. This was written haphazardly, so don't expect much flow in it.

Grouped commands

First of all, have you looked at the "grouped commands" way of writing models? I find it vastly simpler to understand. It transposes the model so that everything pertaining to one command sits next to that command. For the command declare you would write the functions:

declare_pre/1
declare_args/1
declare_post/3
declare_next/3

and QC will automatically turn these into the same thing as your current test cases. It gives the code a nicer flow. Also note that there is a declare_pre/2, which can be used to rule out generated arguments you don't want. QC then generates commands like this:

  • Given state S, use declare_pre(S) to determine if this command can fire in the state.
  • If it can fire, use declare_args(S) to generate arguments for the command.
  • Check that the generated arguments satisfy the precondition by calling declare_pre(S, Args).

All of these functions are optional; leaving one out simply skips that check. foo_args/1 is required, however, for the system to know that the command exists.
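
To make this concrete, here is a minimal sketch of what the grouped form could look like for declare. The wrapper body, the argument generator, and the model update are hypothetical stand-ins for illustration, not derflow's actual semantics:

%% The call under test; the real arguments are whatever derflow_ets
%% exposes (hypothetical here).
declare(Id) -> derflow_ets:declare(Id).

%% Can declare fire in this state? (optional)
declare_pre(_S) -> true.

%% Generate the arguments for the call.
declare_args(_S) -> [elements([a, b, c])].

%% Rule out generated arguments we don't want (optional).
declare_pre(S, [Id]) -> not lists:keymember(Id, 1, S#state.store).

%% Model state transition.
declare_next(S, _Res, [Id]) ->
    S#state{store = [{Id, undefined} | S#state.store]}.

%% Check the result of the real call against the model.
declare_post(_S, [_Id], Res) -> eq(Res, ok).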

A full example of grouped commands for a simple model:

https://github.com/jlouis/dht_bt/blob/master/eqc_test/dht_routing_table_eqc.erl

State modeling

I would represent the store as a [{Key, Value}] list rather than a dict(). There is no performance concern here, and if there is an error, shrinking will easily reduce the list to only a few elements, so it stays manageable. A list makes the state much easier to read when something goes wrong. Optimize for the human being.
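
A hedged sketch of what that looks like, with a made-up model_get/2 lookup helper:

-record(state, { store = [] :: [{term(), term()}] }).

%% Lookups are O(n), but shrunk counterexamples contain only a
%% handful of elements, and the state prints readably on failure.
model_get(Key, #state{store = Store}) ->
    case lists:keyfind(Key, 1, Store) of
        false        -> undefined;
        {Key, Value} -> Value
    end.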

Consider testing for a time

I prefer using eqc:testing_time(Secs, Prop) over a fixed num_tests count: the number of tests then scales with the power of the machine, and I avoid running into test timeouts.
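
For example, assuming the model exports a property prop_derflow/0, the runner call would be:

%% Run the property for 30 wall-clock seconds rather than a fixed
%% number of test cases.
eqc:quickcheck(eqc:testing_time(30, prop_derflow())).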

Consider using aggregations

You need a way to scrutinize the distribution of your tests. To do that, add aggregate/2 to your properties. A simple example is in Isaiah Peng's transit-erlang repository, where we developed a model for it:

https://github.com/isaiah/transit-erlang/blob/8c7f72a28a3dc2a6122d6d599cc4e69ab44ef3d5/eqc_test/transit_eqc.erl#L202

The idea here is to build two aggregates: one measures the size of the generated transit terms, while the other collects the types that were produced. In the derflow model it is not clear to me how a new value is generated, so checking that the distribution matches what you want is important. I use log10 bucketing so I can easily see whether we have more than 1, 10, or 100 elements, which are the typical sizes chosen for these tests in EQC.
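
A sketch of the usual shape in a statem property; command_names/1 and pretty_commands/4 ship with EQC, while size_class/1 is a made-up log10 bucketing helper along the lines described above:

prop_model() ->
    ?FORALL(Cmds, commands(?MODULE),
        begin
            {H, S, Res} = run_commands(?MODULE, Cmds),
            aggregate(command_names(Cmds),
                pretty_commands(?MODULE, Cmds, {H, S, Res}, Res == ok))
        end).

%% Bucket a size N into 1, 10, 100, ... so the report stays readable.
size_class(N) when N > 0 ->
    trunc(math:pow(10, trunc(math:log10(N)))).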

Odd things:

  • Why the noshrink/1 in the generated commands?
  • Why set read/write concurrency to true on the ETS tables? This is not really a performance game :)
  • I would probably build a small language for manipulating the model state and use that language in the exposition itself. It lets you add to or change the internal representation without walking through the whole model to fix things that don't work as they should (see the sketch after this list).
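
A sketch of what such a state-manipulation language could look like; the helper names are invented for illustration:

%% All knowledge of the store representation lives in these helpers.
%% If the representation changes, only the helpers change.
store_put(Key, Value, #state{store = Store} = S) ->
    S#state{store = lists:keystore(Key, 1, Store, {Key, Value})}.

store_member(Key, #state{store = Store}) ->
    lists:keymember(Key, 1, Store).

%% Callbacks then read as intent rather than representation, e.g.:
%%   bind_next(S, _Res, [Key, Value]) -> store_put(Key, Value, S).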

Future things:

Consider using requirements testing. An example of it is in jlouis/fuse:

https://github.com/jlouis/fuse/blob/master/eqc_test/fuse_eqc.erl

The idea is that the reqs field in the #state{} contains a list of the requirements we hit. Adding a requirement is just adding to the list as if it were a set:

req(Req, S) ->
    S#state{reqs = lists:usort([Req | S#state.reqs])}.

With this in place, we track a requirement simply as:

req("R01 - Inserted an element", S).

And finally, we add an aggregate over the #state.reqs field so we can see how often each requirement is hit across the test cases. This again makes sure our model has good coverage.
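
Wiring it up is one line in the property; with_title/1 (also from EQC) just labels the report:

%% After run_commands/2 in the property: report how often each
%% requirement was exercised across the generated test cases.
aggregate(with_title('Requirements'), S#state.reqs,
    pretty_commands(?MODULE, Cmds, {H, S, Res}, Res == ok))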

Also, consider adding negative testing. I love doing invalid things to a system and seeing how it behaves given the invalid input. It can be used to gauge the robustness of the built system by verifying that it also rejects wrong input gracefully. It usually comes later in the development of the model.
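
A sketch of what a negative command could look like; the key_gen/0 generator and the {error, not_found} return are assumptions for illustration, not derflow's actual API:

%% Hypothetical wrapper around the real read operation.
read_missing(Key) -> derflow_ets:read(Key).

%% Generate a key that was never declared in the model state.
read_missing_args(S) ->
    ?SUCHTHAT([Key], [key_gen()],
              not lists:keymember(Key, 1, S#state.store)).

%% Require a graceful error instead of a crash.
read_missing_post(_S, [_Key], Res) ->
    eq(Res, {error, not_found}).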

General ramblings :)

QuickCheck models are constructed as layers. You start off with a subset you can model and then you learn from that subset in order to figure out how to model more of the system. Don't be afraid to "skip" parts which are hard to model in the beginning. Often you can come back later when you know exactly which property or behaviour you wanted from that part.

Speaking of behaviours, that is a good way to look at QC modeling. Sometimes you have a very specific mathematical property: "This is a group!" or "This is a lattice!". But more often, part of your system will have no such thing. Rather, it will be behavioural: "if I do X, then Y will happen". The system can then be described as a set of observations based on the inputs you give it. Hopefully the behaviours are separable and composable, because then you can construct an EQC model of each behaviour and have command generation compose them.

Modeling behaviour is something the rather recent eqc_component system is very good at. But don't go there unless you have the time :) I have some examples of this in progress, but they are not finished yet. The basic idea of a component is that you isolate part of your system and mock the remaining parts. This means you can describe behaviour in the isolated part without having to account for underlying details, which in turn lets you vary your test cases more.

Example: We have a low-level routing table in a DHT, controlled by a high-level gen_server. First, we write a model for the low-level routing table and verify it is correct. Then we verify the high-level gen_server with the routing table mocked. When we mock, the routing table is simply the atom tbl_ref, and whenever it is invoked we fake the response by consulting our QC model. This means we can isolate the high-level behaviour without having to account for low-level routing table details.
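
In eqc_component terms this becomes a callout specification. The sketch below assumes Quviq's ?CALLOUT macro and an invented insert operation; the real dht_routing_table API may differ:

%% When the code under test calls dht_routing_table:insert(Node, tbl_ref),
%% intercept the call and answer tbl_ref on the model's behalf, so no
%% real routing table is ever touched.
insert_callouts(_S, [Node]) ->
    ?CALLOUT(dht_routing_table, insert, [Node, tbl_ref], tbl_ref).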

I am currently working on this at

https://github.com/jlouis/dht_bt/tree/master/eqc_test

(moving quickly, so things might change; the current commit is 27f51f09b3726f57cf7ba97).

Here, the dht_routing_table_eqc models the low-level routing table and dht_state_eqc models the high-level approach.
