Crowdsourcing review of lock free initialization/deinitialization routine
// Suppose we have two functions load() and unload() that take a finite amount of time to complete and are not threadsafe.
// They also cannot run at the same time. It is invalid to call load before a pending unload is finished, and it is invalid
// to call unload before a previous call to load has finished. Basically, we are either in a "fully loaded" state, or a
// "fully unloaded" state.
void load();
void unload();
// Furthermore, suppose we have two functions, start() and finish(). The first time we call start, load must
// run to completion before entering the body of start(). After the final call to finish, unload should be called unless
// another thread happens to call start again early.
void start();
void finish();
// The question is: how can we implement the prologue of the start function and the epilogue of the finish function to
// perform load and unload properly in a lock-free manner? Here's an initial stab at the solution. Can we prove it's correct?
#include <atomic>
#include <cstdint>
#include <thread>
// Bit layout of the load_state atomic, listed most significant bit first. (This struct is documentation
// only; the helpers below define the actual positions: ref_count in bits 0-29, loaded in bit 30,
// unloading in bit 31.)
struct load_state_t
{
    uint32_t unloading : 1;
    uint32_t loaded    : 1;
    uint32_t ref_count : 30;
};
static std::atomic<uint32_t> load_state{0};
bool loaded(uint32_t state) { return (state & (1ul << 30)) != 0; }
bool unloading(uint32_t state) { return (state & (1ul << 31)) != 0; }
uint32_t ref_count(uint32_t state) { return state & ((1ul << 30) - 1); }
void start() {
    uint32_t prior = load_state++;
    if (ref_count(prior) == 0) {
        // We're responsible for ensuring we're in a "loaded" state.
        // First, if an unload is in progress, spin until it completes.
        while (unloading(prior)) {
            // Yield to let the thread running the unload function make progress
            std::this_thread::yield();
            prior = load_state.load(std::memory_order_acquire);
        }
        // If the loaded bit is unset, proceed with loading. The loaded bit could still be set because
        // finish skips the call to unload if the ref count was incremented again immediately after
        // finish decremented it to 0.
        if (!loaded(prior))
        {
            load();
            // Atomically set the loaded bit (preserving the current ref count)
            // NOTE: the ref count cannot dip to 0 here since this thread is still in the body of start
            load_state |= (1u << 30);
        }
    } else {
        // The prior ref count was > 0, so we just need to wait until the state is loaded
        while (!loaded(prior)) {
            // Yield to let the thread running the load function make progress
            std::this_thread::yield();
            prior = load_state.load(std::memory_order_acquire);
        }
    }
    // Rest of the start function
}
void finish() {
    // Body of the finish function
    uint32_t prior = load_state--;
    if (ref_count(prior) == 1) {
        // We're responsible for unloading IFF another thread didn't increment the ref count in the meantime.
        // Attempt ONCE to set load_state to the unloading state with a ref count of zero. Failure to do
        // so means someone else called start in the meantime.
        uint32_t expected = prior - 1; // loaded bit still set, ref count now 0
        if (load_state.compare_exchange_strong(expected, 1u << 31)) {
            unload();
            // Unset the unloading bit
            load_state &= ~(1u << 31);
        }
    }
}
I believe for Itanium or other non-memory-coherent architectures, the load_state |= (1u << 30) in start and the load_state &= ~(1u << 31) in finish can be replaced with std::memory_order_release-ordered fetch_or and fetch_and, respectively.
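Concretely, that substitution would look something like this (a sketch only, assuming the same load_state atomic and bit layout as above):

// In start(), after load() completes: publish the loaded bit with release ordering
load_state.fetch_or(1u << 30, std::memory_order_release);
// In finish(), after unload() completes: clear the unloading bit with release ordering
load_state.fetch_and(~(1u << 31), std::memory_order_release);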
Suppose for discussion's sake that we can assume that callers will not call finish more times than they call start, and that every occurrence of finish is causally preceded by at least one call to start. Furthermore, suppose we can ignore overflow of the ref_count bits.
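For instance, a caller satisfying these assumptions might look like the following minimal sketch (the worker function, the thread count, and main are hypothetical, and load/unload still need definitions for this to link):

#include <vector>

// Hypothetical usage: each thread brackets its work with exactly one start/finish pair
void worker() {
    start();  // the first thread in runs load() to completion before proceeding
    // ... work that requires the fully loaded state ...
    finish(); // the last thread out runs unload(), unless another start() races in
}

int main() {
    std::vector<std::thread> threads;
    for (int i = 0; i < 8; ++i) threads.emplace_back(worker);
    for (auto& t : threads) t.join();
}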