I recently adapted rsdb, my lock-free persistent B+ tree
to use a forerunner for the next crossbeam epoch API, coco 0.2.0
.
I had previously intended to write my own epoch system as a learning
exercise, but am currently in a rush to get the first version of the system out the door,
and don't want to waste time on that right now. Later, I want to formally verify the
lock-free algorithms in use in the system via symbolic execution or a compiler plugin that
instruments the code carefully to facilitate exhaustive interleaving exploration, but
right now the goal is to get this thing out the door.
I began writing the system after studying the bw tree paper and I knew I'd eventually use an epoch system, but I simply ignored garbage for the first stage of implementation, as it was more important for my own sense of progress to get the "visible" features implemented:
- a lock-free log-structured store with reserveableslots
- a lock-free pagecache with support for fragmented pages across the log
- a lock-free b-link tree
These have been implemented, and now it's time to shift gears to productionization:
- eliminating memory leaks
- property testing
- fault injection
- performance tuning
- CI tests with lsan, tsan, asan, etc...
Using coco for its epoch garbage collection system has led to a system that runs for hours without any detected memory leaks by lsan. However, lsan found a leak in the crossbeam library, in the process!
After learning coco's epoch system, I'm quite happy with it. I love that I can use defer_drop
to call the actual destructor on an object, which is handy for recursively dropping
linked structures.
The ability to run real destructors allows arbitrary resource freeing strategies to exist, including for logical identifiers that don't necessarily correspond to a chunk of memory. In the pagecache that powers the BW tree, logical page identifiers should be reused to minimize fragmentation of the mapping table (powered by a simple lock-free radix tree which can theoretically reach a .92 data:pointer ratio with a dense keyspace or get incredibly bad ratios if it gets fragmented). However, we should wait until all threads have moved beyond the epoch that a logical page was removed during before reintroducing its identifier.
I took advantage of defer_drop
by creating a simple structure that pushes
the identifier to a free-list (lock-free stack) when dropped.
Understanding Owned
, Ptr
, and Atomic
was pretty intuitive for me, as someone
who has made some pretty simple lock-free systems in garbage-collected languages
in the past, but getting defer_free
and defer_drop
to eat all of my unused
memory was a more interesting learning experience for me. First, this would have
been hopeless without lsan or valgrind, so it may be worthwhile to help newcomers
to the system understand the basics of these tools. I ran into a number of
issues running the leak sanitizer, like having trouble getting it to
provide symbols to me, and I found that tsan simply stopped working on linux after
the July 6 nightly build, but eventually I got these vital tools to work by running
them via rustup run
as root (this probably works better on osx):
RUSTFLAGS"-Z sanitizer=leak" sudo rustup run \
nightly-2017-07-06-x86_64-unknown-linux-gnu \
cargo test \
--target x86_64-unknown-linux-gnu
Finally able to gain insight into my leaks, I spent about a day and a half going through them and hunting down. One trick I realized early on, is if I'm not sure where something is, run both a long and short burn-in test and see if the number of leaked objects varies significantly. If not, knowing it's more or less a one-time cost can be a huge clue in finding where something should be dropped.
I hit a few bugs because I initially used defer_free
on things, without it being
intuitive to me that some key underlying memory would not actually get removed. For
example, in a lock-free stack that I have, in a method with try-once append
semantics with a provided key, if the CAS fails to append the container node, the
method should just free the memory. Owned does not have Drop
defined on it,
so I need to use defer_drop
even though nothing else has witnessed it. I initially
tried to use defer_free
, assuming that it would free that node and the inner
object that it stores without running the destructor, but this led to a memory
leak where the inner object actually wasn't freed (it's not a pointer, it's wholy
owned in this implementation). Instead, I needed to set the next pointer to null and
use defer_drop
on the node.
I still don't quite understand this, but I'm happy I found it and it doesn't leak now.
Overall I'm thrilled with the library, and I was able to work around things that were a little tricky for me in all cases. I think most of what can be done is in documentation at this point, to help people understand the little things that trip people up. These suggestions are coming from a relative beginner, and there may be simple reasons why some of these things should not be done, but here are some things that I may look into:
- document getting lsan to run on linux
- more thoroughly document the difference between
defer_drop
anddefer_free
and why free could fail to deallocate things. - having a consuming method on
Owned
that destroys the inner data without putting pressure on the epoch system / extra code to turn it into a ptr thendefer_drop
. There are a few places I could have usedcompare_and_swap_owned
if there was a simpler way to drop theOwned
if it didn't work, but because I needed to usePtr
for the drop case anyway, I ignored the*_owned
methods in many cases.