Last active
September 30, 2022 00:19
-
-
Save DmitrySoshnikov/4646658 to your computer and use it in GitHub Desktop.
Reference counting GC
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
/** | |
* Automatic Reference Counting (ARC) Garbage Collection technique. | |
* | |
* This diff describes Reference counting GC technique. | |
* See also previous lesson on Mark and Sweep GC: | |
* https://gist.github.com/4391763 | |
* | |
* by Dmitry Soshnikov <[email protected]> | |
* MIT Style License | |
*/ | |
// ****************************************************************** | |
// I. Preamble: heap and references | |
// ****************************************************************** | |
// ------------------------------------------------------------------ | |
// 1. Heap | |
// ------------------------------------------------------------------ | |
// For simplicity, we represent the "heap" (our memory where all objects | |
// are stored) as a JS array. Address at which an object is stored is | |
// the array index. | |
var heap = []; | |
// ------------------------------------------------------------------ | |
// 2. References | |
// ------------------------------------------------------------------ | |
// To *refer* objects on the heap, we need special values which | |
// store *addresses* of the objects. These are *pointers*, or in our | |
// cases -- *references*. | |
// | |
// We introduce small helper class, the "Reference" class. It just | |
// encapsulates the address and provides couple of convenient methods | |
// to manipulate the referenced contents. | |
// | |
function Reference(object) { | |
// Initially it's a null reference, | |
// doesn't point to anything. | |
this._address = null; | |
// If object shape is passed, allocate it | |
// on the heap and assign to this reference. | |
if (object) { | |
// These two main functions, "assign" and "allocate" | |
// are described below. | |
assign(this, allocate(object)); | |
} | |
} | |
Reference.prototype = { | |
// gets the reference value (the address) | |
getAddress: function () { | |
return this._address; | |
}, | |
// Gets the actual contents stored by this address. | |
// This operation is known as *dereferencing* of | |
// a reference (of a pointer). | |
getContents: function () { | |
return heap[this.getAddress()]; | |
}, | |
// Rebinds the reference to another address. | |
setAddress: function (address) { | |
this._address = address; | |
} | |
}; | |
// ****************************************************************** | |
// II. Reference counting GC | |
// ****************************************************************** | |
// Automatic reference counting (ARC) is a garbage collection technique | |
// with a very simple algorithm: every object stores a special value (reference | |
// count value) which is increased every time a new reference is assigned the | |
// address of this object, and decreased each time the address is removed from | |
// some reference to this object. The whole "magic" happens at assignment to | |
// a reference, rebinding the reference to another address and removing | |
// objects from the heap. | |
// (Note, JS's GC is not specified. However, in practice usually can be found | |
// Mark and Sweep GC (since ARC has some disadvantages described bellow). | |
// ARC is used in some languages though, like Objective-C). | |
// ------------------------------------------------------------------ | |
// 3. Objects allocation | |
// ------------------------------------------------------------------ | |
// To manage this reference counting special value, we introduce a handy | |
// "allocate" function which initializes reference counting flag to zero, | |
// and also provides actual allocation of this object on the heap (at first | |
// available free address). As a result, the function returns the address | |
// at which object was allocated. | |
function allocate(obj) { | |
var allocAddress = heap.length; | |
obj.__rc__ = 0; | |
heap.push(obj); | |
return allocAddress; | |
} | |
// ------------------------------------------------------------------ | |
// 4. Assignment to references, ARC algorithm | |
// ------------------------------------------------------------------ | |
// That's said, the main algorithm actually happens at assignment operation. | |
// | |
// 1. We get the *previous (current) object* to which the assigning reference | |
// points to and *decrease* its reference counting. | |
// | |
// 2. Then we get the object stored at *new address* and *increase* | |
// its reference count (but, only if the new address is not null: | |
// in this case we just reset this reference at first step). | |
// | |
// 3. Now, if reference count flag of the *previously referred object* | |
// is zero, we immediately remove the object from the heap. | |
// | |
// 4. Finally, we do actual assignment to the reference, i.e. set it | |
// to the new passed address. | |
function assign(ref, address) { | |
// Previously (currently) stored object | |
// at the address of this reference. | |
var objectOfRef = ref.getContents(); | |
// Object at new address. | |
var objectAtAddress = heap[address]; | |
// Decrement the reference count of the previous object | |
// (if its reference was not null-reference of course) | |
objectOfRef && objectOfRef.__rc__--; | |
// And if the new address is not null, increase the | |
// reference count of the new object. | |
if (address != null) { | |
objectAtAddress.__rc__++; | |
} | |
// Check and remove old object from the heap, if there no reference to | |
// it anymore (i.e. its reference count value is zero). | |
recycleObjectIfNoRef(ref); | |
// Finally, assign the reference to the new address. | |
ref.setAddress(address); | |
// And return the reference back. | |
return ref; | |
} | |
// ------------------------------------------------------------------ | |
// 5. Recycling non-referenced objects, ARC algorithm | |
// ------------------------------------------------------------------ | |
// As we said, if reference counting value becomes zero, | |
// the object may be removed from the heap (since no one refers | |
// it anymore). However, at destructuring the object, we should | |
// deallocate all its properties too. And if some of its properties | |
// are reference to objects too, we should decrease reference counting | |
// value of these sub-objects as well, and potentially check them | |
// for recycling. | |
function recycleObjectIfNoRef(ref) { | |
// Object stored at address. | |
var objectOfRef = ref.getContents(); | |
// If it's not null and no one refers it anymore, it's time | |
// to remove it from the heap. | |
if (objectOfRef && objectOfRef.__rc__ == 0) { | |
// But, before recycling the object itself, first handle all its reference | |
// sub-properties, decreasing reference count of objects they refer. | |
for (var k in objectOfRef) if (objectOfRef[k] instanceof Reference) { | |
// Just reset this sub-reference, it will decrease its __rc__ | |
// and recursively call this recycling method iside the "assing" method. | |
var subRef = objectOfRef[k]; | |
assign(subRef, null); | |
} | |
// Now we can remove our object itself from the heap. | |
heap.splice(ref.getAddress(), 1); | |
} | |
} | |
// ------------------------------------------------------------------ | |
// 6. Testing | |
// ------------------------------------------------------------------ | |
// -- 6.1 One reference to an object: ------------------------------- | |
// Allocate "a" object and assign it to "a" reference | |
// (this is roughly equivalent to var a = {name: 'a'} in JS). | |
// a ---> {name: 'a'} | |
var a = new Reference({name: 'a'}); | |
// The same with "b" object, which is assigned to "b" reference, | |
// (var b = {name: 'b'} in JS). | |
// b ---> {name: 'b'} | |
var b = new Reference({name: 'b'}); | |
// Check the heap state: currently "a" and "b" objects | |
// are referenced by one reference per each object | |
// (i.e. __rc__ flag of both objects is 1). | |
// [{name: 'a', __rc__: 1}, {name: 'b', __rc__: 1}] | |
console.log(heap); // "a" and "b" objects | |
// -- 6.2 Two references points to the same object: ----------------- | |
// Now set the "b" reference to the *same address* as "a" reference has. | |
assign(b, a.getAddress()); | |
// This causes decreasing reference count of "b" object on the heap and | |
// increasing reference count of the "a" object. Now "b" object doesn't | |
// have any reference to it, which makes it automatically garbage collected: | |
// a ---> {name: 'a', __rc__: 2} <--- b | |
// {name: 'b', __rc__: 0} | |
// [{name: 'a', __rc__: 2}] | |
console.log(heap); // only "a" object is left and "b" object is removed. | |
// -- 6.3 Reset references (set to null): --------------------------- | |
// As we implemented in "assign", if the assigning address is null, | |
// it means the __rc__ of the object the reference points to is | |
// just decreased: | |
assign(a, null); // reset "a" reference | |
console.log(heap); // [{name: 'a', __rc__: 1}] | |
// Still there is one reference to "a" object via "b" reference, | |
// reset it as well. It automatically removes "a" object | |
// from the heap, since __rc__ becomes 0. | |
assign(b, null); | |
console.log(heap); // [], empty. | |
// -- 6.4 Issue: reference cycles ----------------------------------- | |
// There is one problem with ARC which cannot be handled in obvious way | |
// (although, there are some workarounds) -- this is *reference cycles*. | |
// | |
// Consider a use case (in JS-code example): | |
// | |
// var x = {name: 'x'}; | |
// x.next = {name: 'x:next'}; | |
// x.next.backToX = x; | |
// | |
// In our implementation, it's represented as follows: | |
// Allocate "x" object, x ---> {name: 'x'} | |
var x = new Reference({name: 'x'}); | |
// Allocate "next" object which is referenced as x.next: | |
// x ---> {name: 'x', next: ---> {name: 'x:next'}} | |
var xNext = x.getContents().next = new Reference({name: 'x:next'}); | |
// And now property "backToX" of the "next" object refers back to the | |
// "x" object. Set "x.next.backToX" to the same address as "x" reference has. | |
var backToX = xNext.getContents().backToX = new Reference; | |
assign(backToX, x.getAddress()); | |
// Check the heap state, it should be 2 objects: "x" is referred twice (from the | |
// global "x" and from the "backToX", and "x:next" is referred once from "x.next"). | |
// Notice how "next" and "backToX" properties stores addresses (heaps array indices). | |
// | |
// [{name: 'x', next: 1, __rc__: 2}, {name: 'x:next', backToX: 0, __rc__: 1}] | |
console.log(heap); // "x" and "x:next" | |
// Now the actual problem happens if we reset global "x" reference setting it to null: | |
assign(x, null); | |
// This decreased __rc__ of the "x" object by one and from the user code we cannot reach | |
// this object anymore. But it *cannot be garbage collected* by ARC, since its __rc__ is | |
// still not zero! -- there is still one reference to it from "x.next.backToX" property! | |
// [{name: 'x', next: 1, __rc__: 1}, {name: 'x:next', backToX: 0, __rc__: 1}] | |
console.log(heap); | |
// To avoid this situation, *before setting "x" to null* we should *manually* reset | |
// "x.next.backToX" reference. I.e. set it manually to null too. | |
// (Restore "x" from the "backToX") | |
assign(x, backToX.getAddress()); | |
// Check that "x" object is again referenced by two pointers: | |
// [{name: 'x', next: 1, __rc__: 2}, {name: 'x:next', backToX: 0, __rc__: 1}] | |
console.log(heap); // "x" and "x:next" | |
// Now do it right, and reset first "backToX": | |
assign(backToX, null); | |
// Check the heap, see "x" is correctly referenced by one reference: | |
// [{name: 'x', next: 1, __rc__: 1}, {name: 'x:next', backToX: 0, __rc__: 1}] | |
console.log(heap); | |
// And now we can safely reset the global "x" pointer, which automatically | |
// will garbage collect both "x" object and "x.next" as one of its sub-objects: | |
assign(x, null); | |
console.log(heap); // [], empty. | |
// ------------------------------------------------------------------ | |
// 7. Conclusion | |
// ------------------------------------------------------------------ | |
// ARC GC technique has its own advantages and disadvantages as we saw: | |
// | |
// Advantages: | |
// | |
// 1. Simple at implementation | |
// 2. No pauses for GC cycles as e.g. in Mark and Sweep GC. | |
// | |
// Disadvantages: | |
// | |
// 1. Reference cycles which cannot be handled in obvious way. | |
// In the example above we used manual resetting sub-objects | |
// to null. Alternatively, a runtime may use both ARC and | |
// Mark and Sweep GC (first removes objects immediately, the second | |
// cleans up objects which ARC couldn't handle). | |
// | |
// 2. Performance issues. The assignment operation besides actual assignment | |
// should additionally and every time handle __rc__ value of two objects. |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Hey @DmitrySoshnikov! I came here via google, and this is a really cool tutorial! Thanks for making it so clear to understand.
I may have found a bug, if I'm understanding it correctly, and wanted to verify. Wouldn't the use of splice (https://gist.github.com/DmitrySoshnikov/4646658#file-reference-counting-gc-js-L174) cause other addresses that followed to become invalid since it would change the length of the array? I believe you may want to something like
heap.splice(ref.getAddress(), 1, null);
, which would preserve subsequent addresses but also require some sort of "memory compaction" later on.