Created September 21, 2016 01:23
Header file from Minor Scheme for memory barriers
/* barriers.h --- interface to memory and code translation barriers */

#ifndef MN__GC_BARRIERS_H
#define MN__GC_BARRIERS_H

/* The following two functions provide the memory-coherence behavior
   of mutexes, but without any blocking or communication.

   On many modern multi-threaded systems, one thread may see writes to
   memory by another thread occur in an order different than the one
   they were written in. For example, suppose 'x' and 'y' are global
   variables, both initially zero, shared between two threads A and B.
   Suppose thread A executes the code:

      x = 1;
      y = 2;

   while thread B executes the code:

      y1 = y;
      x1 = x;

   Everyone understands that, since this code doesn't lock or unlock
   any mutexes, or use other synchronization mechanisms, the two
   threads' actions could be interleaved arbitrarily. For example, we
   have no idea whether B will read x before or after A sets it to 1.

   But what may be surprising is that reads and writes may not even
   occur in the order in which they're written. For example, suppose
   thread B ends up with y1 set to two:

   - Since y1 == 2, thread A had clearly executed the 'y = 2'
     assignment before thread B executed 'y1 = y'.

   - Thread A should have executed 'y = 2' after 'x = 1', since
     that's the order in which they're written.

   - Thread B should have executed 'x1 = x' after 'y1 = y', since
     that's the order in which they're written.

   - Pulling all that together, we get the ordering:

        'x = 1'; then 'y = 2'; then 'y1 = y'; then 'x1 = x'.

     So x1 must be 1.

   But on many modern systems, x1 may end up set to zero! Clearly,
   things are not actually happening in the order we wrote them in.
   Who is at fault? Everybody.

   First, the compiler is free to rearrange most statements if doing
   so wouldn't violate any apparent data dependency. The compiler
   looking at A's code sees no relationship between the assignment to
   x and the assignment to y, so it can do them in whatever order it
   finds convenient. The only restrictions on the compiler's freedom
   are:

   - It must not re-order, delete, or add system calls. (And by
     extension, it must not reorder calls to functions that it can't
     prove won't perform system calls, etc. etc.)

   - It must not re-order, delete, or add reads or writes of volatile
     variables across sequence points (function calls; comma
     operators; etc.)

   But the code doesn't make any system calls, and x and y aren't
   volatile, so the compiler can do as it pleases.

   Another way to put this is that the compiler's reordering need only
   respect data dependencies from a *single thread's* execution of the
   code: it need not consider dependencies that arise from multiple
   threads' interleaved execution. Our example shows that having
   multiple threads running at a time can reveal rearrangements of
   code that would be invisible in a single-threaded program.

   Similar cases can arise with signal handlers that consult global
   variables: the compiler has the right to exchange the order of two
   stores, even though a signal handler might catch those stores
   happening in the wrong order.

   Second, each processor in an SMP system has an analogous freedom to
   reorder loads and stores to the memory it shares with the other
   processors, if doing so wouldn't break the code that processor is
   running.

   For example, suppose a program loads the value of some byte from
   shared memory into a register. A processor might actually read and
   cache the full block of 64 bytes (or whatever) containing that
   referenced byte, and then answer loads of other bytes in that block
   directly from its cache, without going to shared memory again.
   From the shared memory's point of view, that processor is doing
   those subsequent loads earlier than they are written in the
   program: the processor has moved the loads earlier.

   Or, going the other direction, a processor might do a store only to
   its cache, and then write the modified words in that cache block
   out to shared memory at some later time. But it might not do the
   same to other references. From the shared memory's point of view,
   the processor has moved that store after other memory references.

   The point is, as long as the code running on that processor sees
   all its loads and stores taking effect in the right order *from its
   point of view*, the processor has met its obligations. If two
   different processors need to see each other's stores in any
   particular order, the code must include some sort of
   architecture-specific memory synchronization instructions. To
   extend the analogy between processors and compilers, these
   synchronization instructions are the machine-language equivalent of
   'volatile' declarations.

   This freedom to rearrange is a good thing --- it allows the
   compiler to generate much better code, and it allows the processor
   to run much faster than it could otherwise, at the cost of
   complicating the (hopefully) rare cases: signal handlers and
   inter-thread communication. But it does make writing and reasoning
   about multi-threaded programs harder. If threads were only
   allowed to communicate by assigning to variables, the guarantees
   would be so weak that inter-thread communication would be
   practically impossible.

   To make things simpler, POSIX says that you can assume that certain
   functions "synchronize memory". For example, suppose a thread
   acquires a mutex, makes some changes to a data structure, and then
   releases the mutex:

   - The compiler is *not* free to move loads or stores across the
     mutex operations, even though there's no apparent data dependency
     between the mutex operation and whatever tree / hash table /
     cheese danish it protects.

   - Processors are *not* free to move loads back across the mutex
     locking, nor may they move stores forward across the mutex
     release. [I can't figure out whether those restrictions are
     strong enough. Do mutex operations need to restrict other sorts
     of reordering, too?]

   This means that, if you always protect data shared between
   different threads by using mutexes in the traditional way, then you
   can program as if all loads and stores will always occur in the
   order you write them. By writing in the mutex operations, you've
   restricted the actual ordering enough to get the job done.

   The problem is that POSIX defines only a rather small set of
   functions to be memory-synchronizing. This isn't normally a
   problem --- just use one of the other functions that does
   synchronize memory. But Minor needs to pause threads with signals
   and then synchronize between the paused threads and the collector,
   and there are no reasonable functions that both synchronize memory
   and are safe to use in signal handlers. The acts of entering and
   leaving a signal handler are not memory-synchronizing.

   So we define here two operations, to be implemented as appropriate
   for each platform that Minor is ported to, that have the memory-
   synchronizing effects of mutex locking and unlocking, but don't do
   any of the blocking or inter-thread communication. That is, if you
   can guarantee by other means that one will be executed before the
   other (say, you do one before sending a signal, and you do the
   other in a signal handler), then these will make sure the loads and
   stores to shared memory act that way, too.

   And these functions are async-safe; they may be used within signal
   handlers.

   mn__thread_memory_release () tells the compiler and processor not
   to move loads or stores forward across the call. By the time a
   thread returns from this function, all stores written before it are
   visible in main memory, and all loads written before it have
   completed.

   mn__thread_memory_acquire () tells the compiler and processor not
   to move loads or stores backward across the call. All loads
   written after a call to this function will see the effects of
   stores that occurred before it. All stores written after a call to
   this function will be performed after it returns.

   The general idea is that, if thread A calls mn__thread_memory_release
   before thread B calls mn__thread_memory_acquire, then that has the
   same effect on memory synchronization as A releasing a mutex and
   then B acquiring that mutex. */

void mn__thread_memory_release (void);
void mn__thread_memory_acquire (void);

/* Ensure that the calling thread will execute the machine code
   currently present in memory, even if it has been modified or newly
   generated.

   If a program writes new executable code to memory, or modifies
   existing code, many architectures don't promise to run such code
   correctly unless the processor first executes some sort of special
   instruction.

   On many processors, this is simply an instruction cache flush:
   their icaches aren't automatically invalidated by writes to the
   cached addresses. Later IA-32 processors actually internally
   recompile IA-32 machine code on the fly to an internal instruction
   set, and cache the translations in a "trace cache", which needs to
   be flushed if the IA-32 "source" code changes.

   On a multi-processor system, each processor that may execute the
   modified or freshly-generated code must call this function.

   This really isn't a threading primitive at all: it's not required
   to do any inter-processor memory synchronization. All it's
   required to do is make the processor properly execute the
   instructions from memory that a read would see. */

void mn__thread_machine_code_changed (void);

#endif /* MN__GC_BARRIERS_H */