Skip to content

Instantly share code, notes, and snippets.

@jki127
Last active December 18, 2019 23:06
Show Gist options
  • Save jki127/c607d7570e8b56abd59526de3522eced to your computer and use it in GitHub Desktop.
Save jki127/c607d7570e8b56abd59526de3522eced to your computer and use it in GitHub Desktop.

Garbage Collection - Programming Languages - Oct 31st, 2019

Problems with Manual Memory Management

Dangling Pointer

Without Dynamic Allocation (Without Heap)

int &foo(){
	int noodle = 3;
	return noodle;
}

int main() {
	int &mynum = foo(); // dangling pointer
	cout << mynum << endl;
	return 0;
}

With Heap

int main(){
	int *noodle = new int(3);
	delete noodle;
	cout << *noodle << endl; // dangling pointer
}

Heap Leak

Double Free


Can we have the same issues with Python?

class MyObj:
	pass

def main():
	foo = MyObj()
	foo = None
	print(foo)

Objects:

  • MyObj
  • None

Timeline of foo and what it points to:

  1. foo -> MyObj
  2. foo -> None

What is the lifetime of foo in the following code?

class MyObj:
	pass

def main():
	foo = MyObj()
	return foo

def main():
	print(splat())
Answer: *at least* until `print` is finished

What is the lifetime of foo in the following code?

class MyObj:
	pass

def main():
	foo = MyObj()
	return foo

def main():
	global n 
	w = splat()
	n = w
	print(w)
Answer: *at least* until `main` is finished

Main point: We can't a put a upper-bound on the lifetime of these objects, but we can put a reasonable lower-bound.

class MyObj:
	pass

def splat():
	foo = MyObj()
	foo.x = MyObj()
	green = foo
	foo = None
	print(green)
	green = None
class MyObj:
	pass

def splat():
	foo1 = MyObj()
	foo2 = MyObj()
	foo1.x = foo2
	foo2.x = foo1
foo1 -> MyObj1
foo2 -> MyObj2
MyObj1.x -> MyObj2
MyObj2.x -> MyObj1

So now we have this:
MyObj1 ⟺ MyObj2

We can still delete these objects because there are no variables pointing to them.

Mark & Sweep

Mark & Sweep belongs to a category of algorithms made for Garbarge Collection which are meant to deallocate inaccessible objects.

Phase 1: Mark

We're going to say that every object has a bit which is initially 0 associated with them. Think of this as a flag. What we can do is do a depth-first search from all the "roots" (all names defined in the program), for every object that we can find mark it as visited.

Marking means flipping the bit from 0 to 1.

How do we handle cycles?

We do not need to recurse on nodes that are already marked with 1

Phase 2: Sweep

Find all the objects still have their bit set to 0 and deallocate them.

When does Mark & Sweep run?

It runs in a background thread

  • It feels it is getting low on Heap space
  • It runs in an interval (e.g. every 5 seconds)

Problems with Garbage Collection

What gets deallocated in the following program?

def myprogram():
	a = Object()
	b = Object()
	b.c = Object()
	# let's do mark phase here:
	# mark a as reachable
	a.c = Object()
	b.c = None
	# mark continues on b
	# sweep runs

This program will deallocate a because it seems to be no longer in-use when the objects were marked. This is incorrect because afterward we still use a.

This is why Python has to Stop the world (stop the program) and run the entire Mark & Sweep program before re-running the program.

class X {
	~X () {cout << "X::~x()" << endl;}
};

int foo () {
	X x;
	cout << "hello" << endl;
}

int foo2() {
	X *x = new X;
	delete x;
}

// C++ but with gc
int foo3() {
	new X;
	dostuff();
	dowhatever();
	morestuff();
	return 0;
}

C++ has destructors.
Java has destructors, but they're usually called Finalizers.
Python has destructors, but most people don't use them because we don't know when they're going to be run because we don't know when Garbage Collection will happen!

class Foo {
public:
	int x;
}

int main() {
	Foo *myfoo = new Foo;
	myfoo->x = 42;

	int *q = &myfoo->x;

	return 0;
}

The problem with the above code is that when myfoo is deallocated then the thing that q points to is also going to be deallocated.

The problem is a little more obvious in the following code:

class Foo {
public:
	int x;
}

int main() {
	int *w;
	{
		Foo *myfoo = new Foo;
		myfoo->x = 42;

		int *q = &myfoo->x;
		w = q;
		// roots: w, myfoo, q	
	}
	// roots: w
	// w has now become a dangling pointer
	cout << *w << endl;
	return 0;
}

Another way this can happen:

int *v;
int *q;
q = new int[35];
v = q+5;
// what happens to v if q is deallocated?
cout << *v << endl;
class X {};

X *x = new X;
int foo;

foo = (int) x;
x = null;
// garbage collection runs here
// (X*) converts X back into a pointer
cout << (X*)foo;

#proglang-f19

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment