Last active
December 11, 2015 17:09
-
-
Save DmitrySoshnikov/4632736 to your computer and use it in GitHub Desktop.
Automatic Reference Counting (ARC) Garbage Collector
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
/** | |
* Automatic Reference Counting (ARC) Garbage Collection technique. | |
* | |
* This diff describes Reference counting GC technique. | |
* See also previous lesson on Mark and Sweep GC: | |
* https://gist.github.com/4391763 | |
* | |
* Covered topics: | |
* | |
* 1. Reference value in ECMAScript (JS specific) | |
* 2. ARC algorimths from generic computer science. | |
* | |
* *** NOTE: here is simplified version of this diff, which | |
* doesn't use ECMAScript-specific Reference type: | |
* https://gist.github.com/4646658 | |
* | |
* by Dmitry Soshnikov <[email protected]> | |
* MIT Style License | |
*/ | |
// ****************************************************************** | |
// I. Preamble: heap and references | |
// ****************************************************************** | |
// ------------------------------------------------------------------ | |
// 1. Heap | |
// ------------------------------------------------------------------ | |
// For simplicity, we represent the "heap" (our memory where all objects | |
// are stored) as a JS array. Address at which an object is stored is | |
// the array index. | |
var heap = []; | |
// ------------------------------------------------------------------ | |
// 2. References | |
// ------------------------------------------------------------------ | |
// To *refer* objects on the heap, we need special values which | |
// store *addresses* of the objects. These are *pointers*, or in our | |
// cases -- *references*. | |
// | |
// For example, if "x" reference stores value 1, it means that at index | |
// 1 of the heap array there is an object that "x" points to. | |
// | |
// For global variables we could store addresses directly in the variables | |
// (like var x = 1), however, some properties of objects may be the references | |
// themselves and refer another objects. | |
// | |
// To handle both cases in the same way, we introduce small | |
// helper class, the "Reference" class. | |
// | |
// In JS world, we achieve it through two components of a reference object: | |
// the "base" object and the "offset" in borders of this object. | |
// (The "offset" is a property name of this object in JS) | |
// | |
// Examples: | |
// | |
// Global variable (reference): the "base" is the global object, | |
// the "offset" (the property name) is "a": | |
// | |
// var a = {name: 'a'}; // Reference(global, "a") | |
// | |
// A reference within another object, the "base" is "a" object, | |
// the offset (property name) is "x": | |
// | |
// a.x = {name: 'x'}; // Reference(a, "x"); | |
// | |
// The value stored at these coordinates (base:offset) is exactly | |
// the *address* on the heap (the index of the heap array). Having | |
// the address, we then can get the value stored at it. | |
function Reference(base, offset) { | |
this._base = base; | |
this._offset = offset; | |
} | |
Reference.prototype = { | |
// gets the reference value (the address) | |
getAddress: function () { | |
return this._base[this._offset]; | |
}, | |
// Gets the actual contents stored by this address. | |
// This operation is known as *dereferencing* of | |
// a reference (of a pointer). | |
getContents: function () { | |
return heap[this.getAddress()]; | |
}, | |
// Rebinds the reference to another address. | |
setAddress: function (address) { | |
this._base[this._offset] = address; | |
} | |
}; | |
// (BTW, exactly this implementation of references is used and specified | |
// in ECMAScript specification). | |
// ****************************************************************** | |
// II. Reference counting GC | |
// ****************************************************************** | |
// Automatic reference counting (ARC) is a garbage collection technique | |
// with a very simple algorithm: every object stores a special value (reference | |
// count value) which is increased every time a new reference is assigned the | |
// address of this object, and decreased each time the address is removed from | |
// some reference to this object. The whole "magic" happens at assignment to | |
// a reference, rebinding the reference to another address and removing | |
// objects from the heap. | |
// (Note, JS's GC is not specified. However, in practice usually can be found | |
// Mark and Sweep GC (since ARC has some disadvantages described bellow). | |
// ARC is used in some languages though, like Objective-C). | |
// ------------------------------------------------------------------ | |
// 3. Objects allocation | |
// ------------------------------------------------------------------ | |
// To manage this reference counting special value, we introduce a handy | |
// "allocate" function which initializes reference counting flag to zero, | |
// and also provides actual allocation of this object on the heap (at first | |
// available free address). As a result, the function returns the address | |
// at which object was allocated. | |
function allocate(obj) { | |
var allocAddress = heap.length; | |
obj.__rc__ = 0; | |
heap.push(obj); | |
return allocAddress; | |
} | |
// ------------------------------------------------------------------ | |
// 4. Assignment to references, ARC algorithm | |
// ------------------------------------------------------------------ | |
// That's said, the main algorithm actually happens at assignment operation. | |
// | |
// 1. We get the *previous (current) object* to which the assigning reference | |
// points to and *decrease* its reference counting. | |
// | |
// 2. Then we get the object stored at *new address* and *increase* | |
// its reference count (but, only if the new address is not null: | |
// in this case we just reset this reference at first step). | |
// | |
// 3. Now, if reference count flag of the *previously referred object* | |
// is zero, we immediately remove the object from the heap. | |
// | |
// 4. Finally, we do actual assignment to the reference, i.e. set it | |
// to the new passed address. | |
function assign(ref, address) { | |
// Previously (currently) stored object | |
// at the address of this reference. | |
var valueOfRef = ref.getContents(); | |
// Object at new address. | |
var valueAtAddress = heap[address]; | |
// Decrement the reference count of the previous object | |
// (if its reference was not null-reference of course) | |
valueOfRef && valueOfRef.__rc__--; | |
// And if the new address is not null, increase the | |
// reference count of the new object. | |
if (address != null) { | |
valueAtAddress.__rc__++; | |
} | |
// Check and remove old object from the heap, if there no reference to | |
// it anymore (i.e. its reference count value is zero). | |
recycleObjectAtAddressIfNoRef(ref.getAddress()); | |
// Finally, assign the reference to the new address. | |
ref.setAddress(address); | |
// And return the reference back. | |
return ref; | |
} | |
// ------------------------------------------------------------------ | |
// 5. Recycling non-referenced objects, ARC algorithm | |
// ------------------------------------------------------------------ | |
// As we said, if reference counting value becomes zero, | |
// the object may be removed from the heap (since no one refers | |
// it anymore). However, at destructuring the object, we should | |
// deallocate all its properties too. And if some of its properties | |
// are reference to objects too, we should decrease reference counting | |
// value of these sub-objects as well, and potentially check them | |
// for recycling. | |
function recycleObjectAtAddressIfNoRef(address) { | |
// Object stored at address. | |
var valueAtAddress = heap[address]; | |
// If it's not null and no one refers it anymore, it's time | |
// to remove it from the heap. | |
if (valueAtAddress && valueAtAddress.__rc__ == 0) { | |
// But, before recycling the object itself, first all its reference | |
// sub-properties, decreasing reference count of object they refer. | |
for (var k in valueAtAddress) if (k != '__rc__' && isAddress(valueAtAddress[k])) { | |
// The address on the heap of this sub-object. | |
var subObjectAddress = valueAtAddress[k]; | |
// Actual object stored by this address. | |
var subObject = heap[subObjectAddress]; | |
// Decrease its reference count. | |
subObject.__rc__--; | |
// And also check for possible releasing from the heap. | |
recycleObjectAtAddressIfNoRef(subObjectAddress); | |
} | |
// Now we can remove our object itself from the heap. | |
heap.splice(address, 1); | |
} | |
} | |
// For simplicity, use this helper funciton checking whether | |
// a property of some object is also the reference to another | |
// object, i.e. stores *address* (just the numed, the heap's array | |
// indecies). | |
function isAddress(value) { | |
return typeof value == 'number'; | |
} | |
// ------------------------------------------------------------------ | |
// 6. Testing | |
// ------------------------------------------------------------------ | |
// global variables (references) have the global | |
// object as their "base" part. | |
var global = this; | |
// -- 6.1 One reference to an object: ------------------------------- | |
// Allocate "a" object and assign it to "aRef" reference | |
// (this is rougly equivalent to var aRef = {name: 'a'} in JS). | |
// aRef ---> {name: 'a'} | |
var aRef = assign(new Reference(global, 'a'), allocate({name: 'a'})); | |
// The same with "b" object, which is assigned to "bRef" reference, | |
// (var bRef = {name: 'b'} in JS). | |
// bRef ---> {name: 'b'} | |
var bRef = assign(new Reference(global, 'b'), allocate({name: 'b'})); | |
// Check the heap state: currently "a" and "b" objects | |
// are referenced by one reference per each object | |
// (i.e. __rc__ flag of both objects is 1). | |
// [{name: 'a', __rc__: 1}, {name: 'b', __rc__: 1}] | |
console.log(heap); // "a" and "b" objects | |
// -- 6.2 Two references points to the same object: ----------------- | |
// Now set the "bRef" reference to the *same address* as "aRef" has. | |
assign(bRef, aRef.getAddress()); | |
// This causes decreasing reference count of "b" object on the heap and | |
// increasing reference count of the "a" object. Now "b" object doesn't | |
// have any reference to it, which makes it automatically garbage collected: | |
// aRef ---> {name: 'a', __rc__: 2} <--- bRef | |
// {name: 'b', __rc__: 0} | |
// [{name: 'a', __rc__: 2}] | |
console.log(heap); // only "a" object is left and "b" object is removed. | |
// -- 6.3 Reset references (set to null): --------------------------- | |
// As we implemented in "assign", if the assigning address is null, | |
// it means the __rc__ of the object the reference points to is | |
// just decreased: | |
assign(aRef, null); // reset "aRef" reference | |
console.log(heap); // [{name: 'a', __rc__: 1}] | |
// Still there is one reference to "a" object via "bRef", | |
// reset it as well. It automatically removes "a" object | |
// from the heap, since __rc__ becomes 0. | |
assign(bRef, null); | |
console.log(heap); // [], empty. | |
// -- 6.4 Issue: reference cycles ----------------------------------- | |
// There is one problem with ARC which cannot be handled in obvious way | |
// (although, there are some workarounds) -- this is *reference cycles*. | |
// | |
// Consider a use case (in JS-code example): | |
// | |
// var xRef = {name: 'x'}; | |
// x.next = {name: 'x:next'}; | |
// x.next.backToX = x; | |
// | |
// In our implementation, it's represented as follows: | |
// Allocate "x" object, x ---> {name: 'x'} | |
var xRef = assign(new Reference(global, 'x'), allocate({name: 'x'})); | |
// Allocate "next" object which is referenced as x.next: | |
// x ---> {name: 'x', next: ---> {name: 'x:next'}} | |
var xNextRef = assign(new Reference(xRef.getContents(), 'next'), allocate({name: 'x:next'})); | |
// And now property "backToX" of the "next" object refers back to the | |
// "x" object. Set "x.next.backToX" to the same address as "x" reference has. | |
var backToXRef = assign(new Reference(xNextRef.getContents(), 'backToX'), xRef.getAddress()); | |
// Check the heap state, it should be 2 objects: "x" is referred twice (from the | |
// global "xRef" and from the "backToX", and "x:next" is referred once from "x.next"). | |
// Notice how "next" and "backToX" properties stores addresses (heaps array indices). | |
// | |
// [{name: 'x', next: 1, __rc__: 2}, {name: 'x:next', backToX: 0, __rc__: 1}] | |
console.log(heap); // "x" and "x:next" | |
// Now the actual problem happens if we reset global "xRef" reference setting it to null: | |
assign(xRef, null); | |
// This decreased __rc__ of the "x" object by one and from the user code we cannot reach | |
// this object anymore. But it *cannot be garbage collected* by ARC, since its __rc__ is | |
// still not zero! -- there is still one reference to it from "x.next.backToX" property! | |
// [{name: 'x', next: 1, __rc__: 1}, {name: 'x:next', backToX: 0, __rc__: 1}] | |
console.log(heap); | |
// To avoid this situation, *before setting xRef to null* we should set *manually* reset | |
// "x.next.backToX" reference. I.e. set it manually to null too. | |
// (Restore "xRef" for our experiment, from the "backToRef" now) | |
assign(xRef, backToXRef.getAddress()); | |
// Check that "x" object is again referenced by two pointers: | |
// [{name: 'x', next: 1, __rc__: 2}, {name: 'x:next', backToX: 0, __rc__: 1}] | |
console.log(heap); // "x" and "x:next" | |
// Now do it right, and reset first "backToRef": | |
assign(backToXRef, null); | |
// Check the heap, see "x" is correctly referenced by one reference: | |
// [{name: 'x', next: 1, __rc__: 1}, {name: 'x:next', backToX: 0, __rc__: 1}] | |
console.log(heap); | |
// And now we can safely reset the global "xRef" pointer, which automatically | |
// will garbage collect both "x" object and "x.next" as one of its sub-objects: | |
assign(xRef, null); | |
console.log(heap); // [], empty. | |
// ------------------------------------------------------------------ | |
// 7. Conclusion | |
// ------------------------------------------------------------------ | |
// ARC GC technique has its own advantages and disadvantages as we saw: | |
// | |
// Advantages: | |
// | |
// 1. Simple at implementation | |
// 2. No pauses for GC cycles as e.g. in Mark and Sweep GC. | |
// | |
// Disadvantages: | |
// | |
// 1. Reference cycles which cannot be handled in obvious way. | |
// In the example above we used manual resetting sub-objects | |
// to null. Alternatively, a runtime may use both ARC and | |
// Mark and Sweep GC (first removes objects immediately, the second | |
// cleans up objects which ARC couldn't handle). | |
// | |
// 2. Performance issues. The assignment operation besides actual assignment | |
// should additionally and every time handle __rc__ value of two objects. |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment