This document tries to provide a checklist of important concepts in object-oriented programming with a heavy focus on Python. This document is not:
- a complete (or even good) introduction to object-oriented programming.
- intended for learning new concepts; my only goal is to help you identify holes in your knowledge.
Everything in bold is a specific, commonly used, technical term. Unless otherwise specified, the technical terms are generic and not Python-specific. I recommend that you always distinguish between general concepts and their language-specific realization (e.g. how they work in Python).
The internal implementation of OOP in Python changed significantly between versions 2 and 3; I think the latter is more consistent and easier to follow. All code examples are in Python 3; the output of each `print` call is shown as a comment on the following line.
We need to distinguish between types, values and variables. For example, when we say `x = 'abraham'` this statement does a bunch of things:
- it creates a value `'abraham'` of type `str`,
- it creates a variable `x` of type `str` and assigns the value `'abraham'` to it.
If we then say `print('My name is %s.' % x)`, this will:
- evaluate the expression `'My name is %s.' % x`, which produces a new `str` value `'My name is abraham.'`. This value is as legitimate a value as the original `'abraham'` even though it is not assigned to any variable.
- call the `print` function with the argument, which must be a value, `'My name is abraham.'`.
The definition of a type, roughly speaking, captures two
things: the kinds of data it can hold and the behavior (aka operators)
associated with it. For example, when we say `x = 'abraham'`:
- The fact that the sequence of bits (0s and 1s) that store this string in memory is interpreted as a string (and not, say, an integer) is part of the definition of the type `str`.
- The fact that we can do things like `x.title()` (which evaluates to `'Abraham'`) or `x + ' lincoln'` (which evaluates to `'abraham lincoln'`) is part of the definition of the type `str`.
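The two halves of a type's definition (data and behavior) can be poked at directly. The following is only a small sketch; the variable names `greeting` and `n` are made up for illustration.

```python
x = 'abraham'
greeting = 'My name is %s.' % x   # a new str value produced by an expression

print(type(x))
# <class 'str'>
print(type(greeting))
# <class 'str'>

# Behavior that belongs to str ...
print(x.title())
# Abraham

# ... is not available on a value of another type such as int,
# which comes with its own behavior instead.
n = 2
print(hasattr(n, 'title'))
# False
print(n + 3)
# 5
```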
At a very high level when a function is called (aka invoked, aka executed):
- caller invokes the function and provides values for its arguments,
- new scope is created,
- arguments are passed from the caller and assigned to variables in the new scope,
- function body is executed and a value is returned to the caller.
Follow the above steps in this example:
def square(x):
    return x ** 2
print(square(2))
# 4
print(square(square(2)))
# 16
Note that there are a variety of ways in which a function might "do its job":
- returning an output,
- modifying the arguments themselves,
- neither of the two (e.g. storing/sending data somewhere else)
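The three ways listed above can be sketched side by side. The function names and the module-level `audit_log` list are hypothetical and only serve the illustration.

```python
audit_log = []  # stands in for "somewhere else" (a file, a database, ...)


def doubled(values):
    # Returns an output; the argument is left untouched.
    return [2 * v for v in values]


def double_in_place(values):
    # Modifies the argument itself and returns nothing (i.e. None).
    for i in range(len(values)):
        values[i] *= 2


def record(values):
    # Neither returns an output nor modifies the argument.
    audit_log.append(len(values))


numbers = [1, 2, 3]
result = doubled(numbers)
print(result, numbers)
# [2, 4, 6] [1, 2, 3]

double_in_place(numbers)
print(numbers)
# [2, 4, 6]

record(numbers)
print(audit_log)
# [3]
```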
Instances (aka objects) are related to classes in the same way that values are related to their types. In Python this analogy is literally true; a class defines a new type in the same way that, say, `int` is a type:
x = 2
type(x) == int
# True
isinstance(int, type)
# True
class Human:
    pass
h = Human()
type(h) == Human
# True
isinstance(Human, type)
# True
The types (i.e. classes) provided by the language itself are called built-in types; `int`, `float`, `set`, `list`, `dict` (and a bunch more) are all built-in types.
When we create an instance of a class we say we are instantiating that class. In Python, calling `__init__` is the last, and most commonly customized, step of the instantiation process (more on this in level 5).
class Human:
    def __init__(self):
        print('executing __init__')
h = Human()
# executing __init__
- In most other languages the process of instantiation is handled by a function called a constructor. Python has something sort of similar to that, which is `__new__`. The subtle difference between `__init__` and `__new__` (we rarely work directly with the latter) is for level 5. Ignore all of this for now, but know that constructor is a very commonly used word and, to a good approximation, the Python version of it is `__init__`.
- Notice that instantiating a class has the same syntax in Python as a function call. This is not the case in every language: many others (e.g. Java, C++, PHP, JavaScript) have a `new` keyword that is used when instantiating classes (e.g. you would say `h = new Human()`). In Python there is no `new` keyword.
An object (a class instance) has state (i.e. data) and behavior (i.e.
code). The behavior of an object is all its methods (almost all OOP
languages use this term) and its state is its attributes (this is
Python-specific terminology; most other languages call these instance
variables). It is important to know how to work with attributes and methods of an instance, their scope and how to access them via `self`.
Attributes are like variables but they belong to an instance (aka object). You can get them or set them (aka read them or write to them) like any other variable:
class Person:
    def __init__(self, name):
        self.name = name
p = Person('Mary')
print(p.name)
# Mary
p.name = 'John'
print(p.name)
# John
Methods are like functions but they belong to an instance (more specifically they are bound to that instance). You can call them like any other function:
class Person:
    def __init__(self, name):
        self.name = name
    def hello(self):
        print('Hello! My name is %s.' % self.name)
p = Person('Julie')
p.hello()
# Hello! My name is Julie.
p.name = 'Bob'
p.hello()
# Hello! My name is Bob.
- Notice the magic happening in the signature of methods: the first argument (called `self` by convention) is automatically set by the language to the bound instance; you have no control over it. In a lot of programming languages (e.g. Java, C++, PHP, JavaScript) this magic happens implicitly: in the body of a method you can access the bound instance via a `this` keyword. There is no such thing in Python. The fact that `self` is explicit in Python is a reflection of its philosophy of "Explicit is better than implicit."
- Python has the notion of a property, which is a method that behaves like an attribute. The whole point of this is convenience. For example:
from datetime import datetime
class Person:
    def __init__(self, yob):
        self.yob = yob
    @property
    def age(self):
        return datetime.now().year - self.yob
p = Person(2000)
p.age
# 19 (when run in 2019; the value depends on the current year)
Aside: A lot of languages (e.g. Java, C++, PHP) require attributes and methods to be either private or public, or (in some languages) protected. None of this exists in Python. However, there is a convention that achieves much the same goal in the end: the use of a leading underscore (e.g. `_some_func`) to signal to other programmers "don't muck with this". You can ignore this whole business at this level.
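A minimal sketch of the leading-underscore convention follows. The `Account` class and its attribute names are hypothetical; the point is only that the underscore is a signal to readers and nothing is enforced by the language.

```python
class Account:
    def __init__(self, owner, balance):
        self.owner = owner        # meant to be used from the outside
        self._balance = balance   # leading underscore: "don't muck with this"

    def deposit(self, amount):
        self._balance += amount

    def balance(self):
        return self._balance


a = Account('Mary', 100)
a.deposit(50)
print(a.balance())
# 150

# The convention is not enforced: the attribute is still reachable.
print(a._balance)
# 150
```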
Two useful words to describe certain relationships are: is-a (`'hello'` is-a `str`) and has-a (`'hello'` has-a length). For example, one might say:
- Abraham Lincoln (instance) is-a Human (type).
- Every Human (type) is-a Mammal (type); every Mammal is-a Vertebrate (type); and every Vertebrate is-a Animal (type).
- Abraham Lincoln, by extension, is-a Mammal, Vertebrate, and Animal.
- In contrast, my dog (the individual, instance) is-a Mammal but not is-a Human.
- Any Mammal has-a neocortex, and therefore, both Abraham Lincoln and my dog has-a neocortex.
- Similarly, Lonesome George (a tortoise) is-a Vertebrate but not is-a Mammal; he has-a backbone but not has-a neocortex.
Similar to the above intuitive idea, class hierarchies can be built through inheritance. A class `B` can be a subclass of another class `A` (aka class `B` extends class `A`, aka `A` is a superclass, or base class, of `B`). This means that:
- Any instance of `B`, aside from is-a `B`, also is-a `A` (i.e. is an instance of class `A` as well).
- The relationship between an instance and its attributes and methods is has-a.
- Any instance of `B` inherits the behavior (i.e. methods) defined in class `A`.
class Mammal:
    def eat(self):
        print('eating...')
class Human(Mammal):
    def speak(self):
        print('It is I!')
h = Human()
h.speak()
# It is I!
h.eat()
# eating...
A subclass can override the behavior in its superclass.
class Mammal:
    def eat(self):
        print('eating ...')
class Human(Mammal):
    def eat(self):
        print('say grace ...')
        print('eating ...')
m = Mammal()
m.eat()
# eating ...
h = Human()
h.eat()
# say grace ...
# eating ...
- The mechanism through which an OOP language achieves this is part of how its method resolution works. In the above example, when you call `eat` on `h`, Python needs to resolve which of the two definitions of `eat` to execute: the one defined in `Mammal` or the one in `Human`.
- The ability to override behavior inherited from a superclass allows us to achieve what is called polymorphism. For example, the behavior (i.e. method) `eat` in the above example is polymorphic between `Human` and `Mammal`. At this level, the word polymorphism is synonymous with (and fancy speak for) overriding; just know that this term exists.
It is often necessary or useful for an overriding method in a subclass to delegate parts of its job to its superclass's (overridden) method. This is where `super()` comes in. In the above example the `eat` method of the `Human` class is merely adding an additional step before doing the exact same thing as its superclass. To keep the code simpler (and more DRY, which stands for don't repeat yourself) we can write:
class Human(Mammal):
    def eat(self):
        print('say grace ...')
        super().eat()
h = Human()
h.eat()
# say grace ...
# eating ...
It is important to distinguish between the identity of an object and its state (its data). Two objects of the same class have the same state if they contain identical data. But they have the same identity only if they are literally stored in the same place in memory. Equal identity implies equal state, but not vice versa.
x = ['hello']
y = ['hello']
x == y # this compares state
# True
x is y # this compares identity
# False
z = x # this defines a new variable (i.e. a name) that points to the
# exact same place in memory as x
z is x
# True
z.append('world')
print(x)
# ['hello', 'world']
w = x.copy() # this creates a new identity, a new place in memory
# with identical contents as the original
w == x
# True
w is x
# False
We can access the identity of an object in Python by using the `id()` built-in function, which returns an integer that is unique to the object for as long as it lives (in CPython this is the object's memory address). Comparing `id()` values, or equivalently using the `is` operator, is the certain way to verify that two variables have values that are identical in identity and not just state (i.e. that modifying one will modify the other).
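As a small sketch of the relationship between `id()` and `is` (the names `same_xz` and `same_xy` are only for illustration):

```python
x = ['hello']
y = ['hello']   # equal state, separate object
z = x           # another name for the very same object

same_xz = id(x) == id(z)
same_xy = id(x) == id(y)
print(same_xz, same_xy)
# True False

# Comparing ids of two live objects agrees with the `is` operator.
print((x is z) == same_xz, (x is y) == same_xy)
# True True
```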
Aside: Not all types allow the state of their instances to be modified. These are called immutable types (and their instances are also called immutable). The commonly used immutable types are all built-in types: `int`, `bool`, `float`, `str`, and `tuple`. Other built-in types are mutable: `dict`, `list`, and `set`. Instances of user-defined classes (types defined in code) are mutable too, by default.
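The difference can be observed directly. This is a minimal sketch; the `Point` class and the `outcome` variable are hypothetical names used only here.

```python
t = (1, 2, 3)
try:
    t[0] = 99                 # tuples do not support item assignment
    outcome = 'modified'
except TypeError:
    outcome = 'refused'
print(outcome)
# refused

s = 'abraham'
s2 = s.title()                # str methods return a new value ...
print(s, s2)                  # ... and leave the original untouched
# abraham Abraham


class Point:
    def __init__(self, x):
        self.x = x


p = Point(1)
p.x = 2                       # instances of user-defined classes are mutable by default
print(p.x)
# 2
```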
Objects (instances of classes) can be used as any other value, specifically they can be passed as arguments to functions. It is important to understand how the passed object is treated in the new scope of the function:
- The new variable (in the function scope) has the name as defined by the signature of the function and the value as provided by the caller (equal state).
- The new variable also has the same identity as the object provided by the caller.
- This means that if one changes the state of the passed object in the function, this change in state will be seen by the caller. This is sometimes desired and sometimes undesired.
def f(some_list):
    some_list.append('world')
    return some_list
def g(some_list):
    some_list = some_list.copy()
    some_list.append('world')
    return some_list
x = ['hello']
y = f(x)
print(y)
# ['hello', 'world']
print(x)
# ['hello', 'world']
x = ['hello']
z = g(x)
print(z)
# ['hello', 'world']
print(x)
# ['hello']
Classes themselves can have attributes and methods. These are variables and functions, respectively, that are shared between (aka common to) all instances of that class. In Python these are called class attributes (as opposed to instance attributes or just attributes) and class methods (as opposed to instance methods or just methods).
In a lot of programming languages (e.g. Java, C++, PHP, JavaScript) methods that belong to a class (i.e. shared between all instances) are called static methods. In Python these are called class methods. Unfortunately for a beginner, Python also has static methods which are slightly different (and simpler, and less useful) than class methods.
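To make the difference between class methods and static methods concrete, here is a minimal sketch. The `Temperature` and `Kelvin` classes are hypothetical: a class method receives the class it was called on as its first argument, while a static method receives nothing automatically and is essentially a plain function that lives in the class's namespace.

```python
class Temperature:
    scale = 'Celsius'  # class attribute

    @classmethod
    def describe(cls):
        # `cls` is the class the method was called on.
        return 'temperatures in %s' % cls.scale

    @staticmethod
    def to_fahrenheit(celsius):
        # No automatic first argument at all.
        return celsius * 9 / 5 + 32


class Kelvin(Temperature):
    scale = 'Kelvin'


print(Temperature.describe())
# temperatures in Celsius
print(Kelvin.describe())
# temperatures in Kelvin
print(Temperature.to_fahrenheit(100))
# 212.0
```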
class HomoSapiens:
    speciation_age = 350000  # this is a class attribute
    @classmethod
    def describe_species(cls):  # this is a class method
        return '%s, a %d year old species' % (cls.__name__, cls.speciation_age)
    def __init__(self, name):
        self.name = name
    def introduce(self):
        print('Hello! I am %s. I am a %s.' % (self.name, self.describe_species()))
print(HomoSapiens.speciation_age)
# 350000
print(HomoSapiens.describe_species())
# HomoSapiens, a 350000 year old species
h = HomoSapiens('John')
print(h.name)
# John
print(h.speciation_age)
# 350000
print(h.describe_species())
# HomoSapiens, a 350000 year old species
h.introduce()
# Hello! I am John. I am a HomoSapiens, a 350000 year old species.
Notes:
- Notice the magic happening in the signature of class methods: the first argument (called `cls` by convention) is automatically set by the language to the class; you have no control over it. This is similar to the way the first argument of instance methods (called `self` by convention) is automatically set by the language to the bound instance.
- Notice the fact that Python allows you to access class attributes and class methods both from the class `HomoSapiens` and from the instance `h` of that class (this is part of how Python's name resolution works). Not every language works this way: in JavaScript, for example, static members can only be reached through the class itself, and in Java reaching them through an instance is allowed but discouraged (i.e. you would write `HomoSapiens.speciation_age` and not `h.speciation_age`). But be careful! This shorthand mechanism only works for reading class attributes, not for writing to them: if we say `h.speciation_age = 12` this would create a new instance attribute for the instance `h` and set it to 12. This will not affect the class attribute value, and no other instance of `HomoSapiens` but `h` will see that new attribute.
- Notice how `@` is used to define class methods in a similar way to how properties (see above) are defined. These are both examples of a feature in Python called decorators (`classmethod` and `property` are both decorators, and `@classmethod` and `@property` decorate the functions that immediately follow them). Decorators are not particular to OOP and are very useful. You can even define your own decorators!
- It is sometimes useful to modify a method in a class from "the outside" (i.e. when we cannot or would prefer not to modify the source code of that class). There is a way to do this which is called monkeypatching (more on this in level 5).
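The read-versus-write caveat for class attributes is easy to trip over, so here is a small sketch of it. It uses a stripped-down, hypothetical version of `HomoSapiens` without `__init__`, and the value 300000 is arbitrary.

```python
class HomoSapiens:
    speciation_age = 350000  # class attribute, shared by all instances


h1 = HomoSapiens()
h2 = HomoSapiens()

# Writing through an instance creates an instance attribute on h1 only.
h1.speciation_age = 12
print(h1.speciation_age, h2.speciation_age, HomoSapiens.speciation_age)
# 12 350000 350000

# Writing through the class changes the shared value; h1 keeps its own
# instance attribute, which shadows the class attribute.
HomoSapiens.speciation_age = 300000
print(h1.speciation_age, h2.speciation_age)
# 12 300000
```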
Abstract classes are a mechanism for us to define the interface of a class without specifying its implementation. What makes an abstract class abstract is its abstract methods which define the signature of a method without specifying its implementation. An abstract class cannot be instantiated. Instead one needs to define non-abstract subclasses of the abstract class which provide an implementation for all abstract methods of the abstract superclass. Such a subclass can then be instantiated as usual.
Not all OOP languages provide a mechanism for this (e.g. Ruby does not) and the ones that do (e.g. Python, Java, and PHP all do) provide it in a variety of ways.
In Python, abstract classes are defined by extending a special base class from the standard library's `abc` module, for example:
from abc import ABC, abstractmethod
class AbstractCarnivore(ABC):
    @abstractmethod
    def hunt(self):
        pass
    def eat(self):
        self.hunt()
        print('eating ...')
class Human(AbstractCarnivore):
    def hunt(self):
        print('hunting ...')
x = AbstractCarnivore()
# TypeError: Can't instantiate abstract class AbstractCarnivore with abstract methods hunt
# (the exact wording of this message varies between Python versions)
h = Human()
h.eat()
# hunting ...
# eating ...
Aside: The whole point of abstract classes is ease of extensibility: the author of an abstract class is merely communicating to other programmers the contract that their subclasses must satisfy (the contract being the abstract methods) in order to take advantage of the other (non-abstract, implemented) aspects of the base class.
Multiple inheritance is a mechanism in some programming languages that allows classes to inherit from multiple superclasses (as opposed to a single superclass in single inheritance). Under single inheritance all class hierarchies are trees in the end. With multiple inheritance class hierarchies can become DAGs instead of trees.
General notes:
- Not all OOP languages allow multiple inheritance (e.g. Java does not allow it for classes) and those that offer it, or something close to it (e.g. C++, Ruby via modules, PHP via traits), provide it in different ways, with different names, and with different limitations.
- Multiple inheritance can easily get really gnarly; a good simple example of how things can get messy is what is called the diamond problem: Suppose classes `B` and `C` extend `A`, and that `D` extends both `B` and `C`. If `A` defines a method `f()` that is overridden both by `B` and `C` but not by `D`, which version of it should be inherited by `D`?
- Central to understanding Python's version of multiple inheritance is its method resolution order (MRO) algorithm, which dictates how `super()` gets resolved under multiple inheritance. It is the MRO that is responsible for, say, addressing the diamond problem.
- There are two common and useful design patterns in multiple inheritance: mixins and cooperative multiple inheritance. You should probably know about them before trying to write multiple inheritance code in production.
- There is an OOP principle called composition over inheritance which recommends that it's often better to achieve the desired behavior by composing different classes through has-a relationships rather than inheritance (is-a relationships). There is a lot of truth to this; but then again, inheritance (single and multiple) are both extremely useful. Finding the right balance is a matter of problem context, experience, and to some extent, taste.
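As a rough sketch of what composition looks like in code (the `Engine` and `Car` classes are hypothetical): instead of `Car` inheriting from `Engine`, a `Car` has-a `Engine` and delegates to it.

```python
class Engine:
    def start(self):
        return 'engine started'


class Car:
    def __init__(self):
        self.engine = Engine()        # Car has-a Engine

    def start(self):
        return self.engine.start()    # delegate instead of inheriting


car = Car()
print(car.start())
# engine started
print(isinstance(car, Engine))        # no is-a relationship was created
# False
```

Swapping in a different kind of engine then only requires passing a different object, not redesigning a class hierarchy.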
This example illustrates how to address the diamond problem using cooperative inheritance in Python:
class A:
    def __init__(self):
        print("A")
        super().__init__()
class B(A):
    def __init__(self):
        print("B")
        super().__init__()
class C(A):
    def __init__(self):
        print("C")
        super().__init__()
class D(B, C):
    def __init__(self):
        print("D")
        super().__init__()
D()
# D
# B
# C
# A
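The order printed above is the MRO of `D`, which Python exposes on every class as `__mro__`. The sketch below restates the diamond with the method `f()` from the description of the diamond problem (the method bodies and the `mro_names` variable are made up for illustration) and shows which version `D` ends up with.

```python
class A:
    def f(self):
        return 'A.f'


class B(A):
    def f(self):
        return 'B.f'


class C(A):
    def f(self):
        return 'C.f'


class D(B, C):
    pass


mro_names = [cls.__name__ for cls in D.__mro__]
print(mro_names)
# ['D', 'B', 'C', 'A', 'object']

# D does not define f, so the first class in the MRO that does wins.
print(D().f())
# B.f
```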
Magic (aka dunder) methods are methods with names of the form `__X__` and they have special (magic) properties. A lot of them exist in all Python objects (they are inherited from the `object` class, the superclass of all classes). But there are also a lot of them that could be implemented by a class to give it special properties.
Magic methods are a very versatile bunch that provide us, the programmer, with a lot of power that is unique to Python. Here is an incomplete list that covers the majority of magic methods, in a very rough and subjective order of usefulness:
- `__str__` allows an object to control how it behaves when it's cast to a string (e.g. when it's given to `print`).
- `__enter__` and `__exit__` allow an object to become a context manager (i.e. you can use it in a `with` statement).
- `__setattr__`, `__getattr__` and `__getattribute__` expose the internal mechanics of attribute resolution and allow you to have more control over how they work in a class.
- `__eq__` and `__hash__` allow an object to take control of how it behaves under equality comparisons (i.e. when used in `==`) and when it's hashed (i.e. passed to `hash`, e.g. when it's used as a dictionary key), respectively. These two dunder methods are deeply related and must coordinate their behavior: objects that compare equal must have equal hashes.
- `__iter__` and `__next__` allow an object to become iterable (i.e. it can be iterated through, e.g. with a `for` loop). Related to this is `__len__`, which allows an iterable to specify its length (i.e. what happens when it's given to `len()`), and `__contains__`, which allows an iterable to check membership (i.e. what happens when one uses the `in` keyword).
- `__call__` allows an object to become callable (i.e. it can be called, just like a function).
- numeric and comparison operators, e.g. `__add__`, `__truediv__`, `__mul__`, `__eq__`, `__ge__`, which expose the internal mechanics of how common operators work. The specific examples above correspond to `+`, `/`, `*`, `==`, `>=` (there are many more of these). When a class overrides these methods we say that it's overloading operators (e.g. pandas and NumPy make extensive use of operator overloading to provide syntactic convenience).
- `__getitem__` and `__setitem__` allow an object to behave like a dictionary or a list (i.e. to support `obj[key]`).
- `__repr__` allows an object to control how it behaves when it's given to `repr`: the goal is to generate a Python expression (i.e. code) that would reproduce that object when executed. A good rule of thumb is that one should have `x == eval(repr(x))`.
- `__get__` and `__set__` allow you to define descriptors and give you even more control over how attributes work (this is how `property` is internally implemented).
- `__getstate__`, `__setstate__`, and friends allow an object to implement the pickle protocol.
- `__dict__` and `__slots__` are magic attributes (not methods) that expose the internal machinery of how attributes are stored in standard objects.
- `__new__` and `__del__` are the other friends of `__init__` and part of the machinery of object lifetime.
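To give a feel for a few of the magic methods listed above, here is a sketch of a small, hypothetical `Vector` class. It is illustrative only and covers just `__repr__`, `__str__`, `__eq__`, `__hash__`, `__len__`, `__iter__`, `__contains__`, `__getitem__` and `__add__`.

```python
class Vector:
    def __init__(self, *components):
        self.components = tuple(components)

    def __repr__(self):
        # Aims to look like the expression that would recreate the object.
        return 'Vector(%s)' % ', '.join(repr(c) for c in self.components)

    def __str__(self):
        return '<%s>' % ', '.join(str(c) for c in self.components)

    def __eq__(self, other):
        if not isinstance(other, Vector):
            return NotImplemented
        return self.components == other.components

    def __hash__(self):
        # Coordinated with __eq__: equal vectors have equal hashes.
        return hash(self.components)

    def __len__(self):
        return len(self.components)

    def __iter__(self):
        return iter(self.components)

    def __contains__(self, item):
        return item in self.components

    def __getitem__(self, index):
        return self.components[index]

    def __add__(self, other):
        # Overloads the + operator for element-wise addition.
        return Vector(*(a + b for a, b in zip(self, other)))


v = Vector(1, 2)
w = Vector(3, 4)
print(v)
# <1, 2>
print(repr(v + w))
# Vector(4, 6)
print(v == Vector(1, 2), v == w)
# True False
print(len(v), 2 in v, v[0], list(v))
# 2 True 1 [1, 2]
print({v: 'found'}[Vector(1, 2)])
# found
```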
The following are language-specific and advanced features of the Python OOP model. The bad news is that there is a lot of nuance and subtlety in each of them which can be quite confusing when you are new to the ideas summarized in this document. The good news is that knowledge of them is only useful in very specific scenarios; that means you can safely ignore them for a while.
- In Python, everything is an object. And everything literally means (almost) everything. Classes, functions, and modules are all objects! This obviously has a lot of implications, a lot of which you are probably already using (e.g. having functions that return functions, which is what allows decorators to be possible).
- Under the hood of object lifetime: `__new__`, `__init__`, `__del__` and metaclasses.
- Dynamic creation of new types and modification of existing ones (e.g. monkeypatching) using the `types` module.
- Reflection in Python: dynamic inspection of objects (and by extension modules, functions, classes, etc.) using the `inspect` module.
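A brief taste of these ideas, as a sketch only: the example below uses the built-in three-argument form of `type()` rather than the `types` module, and the names `Person` and `shout` are hypothetical.

```python
import inspect


def shout(self):
    return self.name.upper() + '!'


# type(name, bases, namespace) creates a new class at runtime.
Person = type('Person', (object,), {'name': 'mary'})

# Monkeypatching: attach a method to an existing class from the outside.
Person.shout = shout

p = Person()
print(p.shout())
# MARY!

# Functions, classes, and modules are all objects.
print(isinstance(shout, object), isinstance(Person, object), isinstance(inspect, object))
# True True True

# Reflection with the inspect module.
print(inspect.isclass(Person), inspect.isfunction(shout))
# True True
print(list(inspect.signature(shout).parameters))
# ['self']
```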