background
trip report
suggestions

background

I recently adapted rsdb, my lock-free persistent B+ tree to use a forerunner for the next crossbeam epoch API, coco 0.2.0. I had previously intended to write my own epoch system as a learning exercise, but am currently in a rush to get the first version of the system out the door, and don't want to waste time on that right now. Later, I want to formally verify the lock-free algorithms in use in the system via symbolic execution or a compiler plugin that instruments the code carefully to facilitate exhaustive interleaving exploration, but right now the goal is to get this thing out the door.

I began writing the system after studying the bw tree paper and I knew I'd eventually use an epoch system, but I simply ignored garbage for the first stage of implementation, as it was more important for my own sense of progress to get the "visible" features implemented:

a lock-free log-structured store with reserveableslots
a lock-free pagecache with support for fragmented pages across the log
a lock-free b-link tree

These have been implemented, and now it's time to shift gears to productionization:

eliminating memory leaks
property testing
fault injection
performance tuning
CI tests with lsan, tsan, asan, etc...

Using coco for its epoch garbage collection system has led to a system that runs for hours without any detected memory leaks by lsan. However, lsan found a leak in the crossbeam library, in the process!

trip report

After learning coco's epoch system, I'm quite happy with it. I love that I can use defer_drop to call the actual destructor on an object, which is handy for recursively dropping linked structures.

The ability to run real destructors allows arbitrary resource freeing strategies to exist, including for logical identifiers that don't necessarily correspond to a chunk of memory. In the pagecache that powers the BW tree, logical page identifiers should be reused to minimize fragmentation of the mapping table (powered by a simple lock-free radix tree which can theoretically reach a .92 data:pointer ratio with a dense keyspace or get incredibly bad ratios if it gets fragmented). However, we should wait until all threads have moved beyond the epoch that a logical page was removed during before reintroducing its identifier.

I took advantage of defer_drop by creating a simple structure that pushes the identifier to a free-list (lock-free stack) when dropped.

Understanding Owned, Ptr, and Atomic was pretty intuitive for me, as someone who has made some pretty simple lock-free systems in garbage-collected languages in the past, but getting defer_free and defer_drop to eat all of my unused memory was a more interesting learning experience for me. First, this would have been hopeless without lsan or valgrind, so it may be worthwhile to help newcomers to the system understand the basics of these tools. I ran into a number of issues running the leak sanitizer, like having trouble getting it to provide symbols to me, and I found that tsan simply stopped working on linux after the July 6 nightly build, but eventually I got these vital tools to work by running them via rustup run as root (this probably works better on osx):

RUSTFLAGS"-Z sanitizer=leak" sudo rustup run \
  nightly-2017-07-06-x86_64-unknown-linux-gnu \
  cargo test \
  --target x86_64-unknown-linux-gnu

Finally able to gain insight into my leaks, I spent about a day and a half going through them and hunting down. One trick I realized early on, is if I'm not sure where something is, run both a long and short burn-in test and see if the number of leaked objects varies significantly. If not, knowing it's more or less a one-time cost can be a huge clue in finding where something should be dropped.

I hit a few bugs because I initially used defer_free on things, without it being intuitive to me that some key underlying memory would not actually get removed. For example, in a lock-free stack that I have, in a method with try-once append semantics with a provided key, if the CAS fails to append the container node, the method should just free the memory. Owned does not have Drop defined on it, so I need to use defer_drop even though nothing else has witnessed it. I initially tried to use defer_free, assuming that it would free that node and the inner object that it stores without running the destructor, but this led to a memory leak where the inner object actually wasn't freed (it's not a pointer, it's wholy owned in this implementation). Instead, I needed to set the next pointer to null and use defer_drop on the node. I still don't quite understand this, but I'm happy I found it and it doesn't leak now.

suggestions

Overall I'm thrilled with the library, and I was able to work around things that were a little tricky for me in all cases. I think most of what can be done is in documentation at this point, to help people understand the little things that trip people up. These suggestions are coming from a relative beginner, and there may be simple reasons why some of these things should not be done, but here are some things that I may look into:

document getting lsan to run on linux
more thoroughly document the difference between defer_drop and defer_free and why free could fail to deallocate things.
having a consuming method on Owned that destroys the inner data without putting pressure on the epoch system / extra code to turn it into a ptr then defer_drop. There are a few places I could have used compare_and_swap_owned if there was a simpler way to drop the Owned if it didn't work, but because I needed to use Ptr for the drop case anyway, I ignored the *_owned methods in many cases.

spacejam/coco_trip_report.md

background

trip report

suggestions