I'm going to make the following simplifying assumptions:

- There aren't many `caml_alloc_custom_mem` blocks (`caml_dependent_size` ~= 0)
- There aren't many `caml_alloc_custom` blocks with nonzero mem/max (`caml_extra_heap_resources` ~= 0)
- There are no manually triggered GC slices (`howmuch = -1`, `caml_major_work_credit = 0`)
- The smoothing of slice amounts has no effect (`filt_p` ~= `p`)
- There are few or no weak pointers and ephemerons (`Phase_clean` is negligible)
- The number of roots is much smaller than the size of the heap (`stat_heap_wsz + caml_incremental_roots_count` ~= `stat_heap_wsz`)
Under these assumptions, the calculations in `caml_major_collection_slice` simplify to the following. (For readability, I've removed `caml_` prefixes, casts, comments, logging, etc.)
```c
p = allocated_words * 3 * (100 + percent_free)
    / stat_heap_wsz / percent_free / 2;
if (gc_phase == Phase_mark) {
  computed_work = p * stat_heap_wsz * 250 / (100 + percent_free);
} else {
  computed_work = p * stat_heap_wsz * 5 / 3;
}
```
Expanding `p`, we get:
```c
if (gc_phase == Phase_mark) {
  computed_work = allocated_words * 3 * 250 / (2 * percent_free);
} else {
  computed_work = allocated_words * 5 * (100 + percent_free) / (2 * percent_free);
}
```
Note that the uses of `stat_heap_wsz` in `p` and in `computed_work` cancel out, so the amount of work done does not depend on the heap size.
The `computed_work` is equal to `allocated_words` times a constant. Let's name these constants m and s. They represent the amount of work done per word of allocation during mark and sweep slices respectively, and their values are:
m = 3 * 250 / (2 * percent_free)
s = 5 * (100 + percent_free) / (2 * percent_free)
The default percent_free = 80 gives m = 4.6875, s = 5.625.
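As a quick check, here is a minimal standalone sketch (my own illustration, not runtime code) that computes m and s from `percent_free` using the formulas above:

```c
#include <stdio.h>

/* Work done per word of allocation, per the simplified formulas above
   (computed in floating point here for clarity). */
static double mark_work_per_word(double percent_free) {
  return 3.0 * 250.0 / (2.0 * percent_free);
}

static double sweep_work_per_word(double percent_free) {
  return 5.0 * (100.0 + percent_free) / (2.0 * percent_free);
}

int main(void) {
  double pf = 80.0; /* the default percent_free */
  printf("m = %g, s = %g\n", mark_work_per_word(pf), sweep_work_per_word(pf));
  /* prints: m = 4.6875, s = 5.625 */
  return 0;
}
```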
This is open-loop control: the parameters m and s are determined only from configuration (percent_free), and there is no feedback mechanism. The open-loop nature of this controller makes it much simpler to analyse.
The interesting question is about the relationship between m, s, percent_free and the actual heap size.
As soon as a major GC cycle starts, the allocated data on the heap is snapshotted, dividing it into live data (reachable from the roots at the moment the cycle starts) and garbage (unreachable).
Write L for the size of the live data. It is convenient to work in units of L, so let's measure the garbage as a proportion g of L, so that the allocated data at cycle start has size:
L + Lg
The value of L at any given moment is a property of the program, unaffected by GC decisions. However, the value of g at the start of the cycle is a function of GC decisions in previous cycles.
The first phase is marking, as the collector determines what's live and what's garbage. The total amount of marking work to do is L (only live data need be marked), and since we do m marking work per word of allocation, we will allocate L/m words during the marking phase.
At the end of marking, the amount of allocated data is maximal: we have been allocating during marking, but we have not yet begun to collect any garbage. The amount of allocated data here is:
L + Lg + L/m
Next, we begin sweeping. We must sweep the entire heap, and we do s words of sweeping work per allocated word. By the end of sweeping, we have collected exactly Lg garbage, as none of the data allocated during marking or sweeping can be collected this cycle. This leaves the total amount of allocated data as the sum of the data live at the start, the data allocated during marking, and the data allocated during sweeping:
L + L/m + (L + Lg + L/m)/s
Finally, the next cycle begins, splitting the above into L' + L' g'.
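To make the accounting concrete, here is a short sketch (again my own illustration, with names of my choosing) that steps through one cycle for a given L, g, m and s:

```c
#include <stdio.h>

/* One major GC cycle under the model above, with sizes measured in words.
   Returns the new garbage fraction g'. */
static double one_cycle(double L, double g, double m, double s) {
  double alloc_mark  = L / m;                  /* allocated while marking L words */
  double peak        = L + L * g + alloc_mark; /* heap is largest at end of marking */
  double alloc_sweep = peak / s;               /* allocated while sweeping the whole heap */
  double final_size  = L + alloc_mark + alloc_sweep; /* the Lg garbage is gone */
  printf("peak = %.4f, final = %.4f\n", peak, final_size);
  return final_size / L - 1.0;                 /* new g', taking L constant */
}

int main(void) {
  /* defaults m = 4.6875, s = 5.625; L normalised to 1; start a cycle with g = 0.5 */
  double g_next = one_cycle(1.0, 0.5, 4.6875, 5.625);
  printf("g' = %.4f\n", g_next);
  return 0;
}
```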
An important property is that if the amount of live data is constant, then the heap size should also be constant. So, taking L to be constant (L = L'), let's see what happens to g.
L + L g' = L + L/m + (L + Lg + L/m)/s
g' = 1/m + (1 + g + 1/m)/s
g' = 1/m + 1/s + g/s + 1/ms
This has the form of a linear recurrence g -> αg + β, where
α = 1/s
β = 1/m + 1/s + 1/ms
This is stable as long as α < 1, and tends towards its unique fixpoint of β/(1-α), or:
(1/m + 1/s + 1/ms) / (1 - 1/s)
The stability condition is that s > 1: we must sweep faster than we're allocating, or sweeping won't terminate. There is no such condition on marking, as the amount of mark work is fixed at the start of the cycle.
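As a sanity check on this fixpoint, here is a small sketch (my own, using the default m = 4.6875 and s = 5.625) that iterates the recurrence:

```c
#include <stdio.h>

int main(void) {
  double m = 4.6875, s = 5.625;  /* the defaults for percent_free = 80 */
  double alpha = 1.0 / s;
  double beta  = 1.0 / m + 1.0 / s + 1.0 / (m * s);
  double g = 0.0;                /* start from a garbage-free heap */
  for (int i = 1; i <= 10; i++) {
    g = alpha * g + beta;        /* one cycle: g' = g/s + 1/m + 1/s + 1/ms */
    printf("cycle %2d: g = %.6f\n", i, g);
  }
  printf("fixpoint: %.6f\n", beta / (1.0 - alpha));
  /* g converges quickly to about 0.5218 */
  return 0;
}
```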
The "overhead" is the ratio of non-live heap to live heap. The heap is at its largest just at the end of marking, at which time the overhead is g + 1/m. In the steady state, that gives an overhead of:
(1/m + 1/s + 1/ms) / (1 - 1/s) + 1/m
= (2/m + 1/s) / (1 - 1/s)
With the current default values of m and s, we end up with an overhead of 0.735. We can instead choose a specific value o for the overhead by solving for m and s.
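(Concretely, plugging the defaults into the formula above: (2/4.6875 + 1/5.625) / (1 - 1/5.625) = 0.6044 / 0.8222 ≈ 0.735.)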
There is one other parameter to pick: the ratio λ between mark and sweep work. We choose m = λs, and try to pick λ so that the amount of overhead per allocated word is approximately equal between marking and sweeping. That is, we try to ensure that doing λ words of sweeping has about the same cost as doing 1 word of marking.
I think we could experimentally determine a reasonable hardcoded value for λ, although it will depend somewhat on the program behaviour (poor locality makes marking slower without affecting sweeping much, while a larger average object size speeds up sweeping without affecting marking much).
Then, we can take:
s = 1 + (1 + 2/λ) / o
m = λs
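As an example, here's a hedged sketch (variable names are my own) that derives m and s from a target overhead o and a chosen λ, then checks them against the overhead formula:

```c
#include <stdio.h>

/* Derive the work-per-word constants from a target overhead o and a
   mark/sweep cost ratio lambda, then verify via the overhead formula.
   An illustrative sketch, not code from the runtime. */
int main(void) {
  double o = 0.735;                /* target overhead */
  double lambda = 4.6875 / 5.625;  /* the ratio implied by the current defaults */
  double s = 1.0 + (1.0 + 2.0 / lambda) / o;
  double m = lambda * s;
  double overhead = (2.0 / m + 1.0 / s) / (1.0 - 1.0 / s);
  printf("s = %.4f, m = %.4f, overhead = %.4f\n", s, m, overhead);
  /* approximately recovers the defaults: s = 5.625, m = 4.6875 */
  return 0;
}
```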
We'd like to minimise the maximum pause time. For a given amount of work, that happens when all pauses are equal. It's easy enough to make mark pauses approximately equal (do the same amount of marking work per slice), and likewise for sweep pauses, but it takes a careful choice of λ to make the mark pauses the same length as the sweep pauses.