title | author | date |
---|---|---|
ELI5: Big O |
Sarah Lim |
24 April 2016 (edited 23 April 2017) |
To measure an algorithm's efficiency, one intuitive idea is to consider the amount of time it takes to solve a problem (transform the input into the desired output).
Unfortunately, it's difficult to measure the exact runtime of an algorithm. In practice, runtimes are influenced by a lot of factors unrelated to the algorithm itself. For instance, the same function might run quickly on a state-of-the-art gaming PC, and much more slowly on an ancient Windows 98 machine.
So instead of obsessing over nanosecond differences in runtime, many computer scientists like to use asymptotic analysis to classify algorithms according to their approximate efficiency.
Asymptotic analysis investigates how an algorithm's performance degrades as the input size grows. Generally speaking, we are interested in two performance factors:
- Time complexity: as the input grows infinitely large, how much longer does it take to solve the problem?
- Space complexity: as the input grows infinitely large, how much more memory does it take to solve the problem?
Let's consider the list searching example, where we have a list of numbers and we want to return a boolean indicating whether or not the number is in the list. We could write the following pseudocode algorithm:
function search(target, list):
for each item in list:
if item == target then return true
return false
This algorithm starts at the beginning of the list and visits each item until we've either found our target, or gone through the entire list. In the best case scenario, our target is the first item of the list -- then the loop only runs once, and we're done!
In general, though, the best case scenario is not very interesting. When we compare algorithms, we're interested in how they perform when the rubber meets the road -- that is, complexity in the worst case scenario.
For our example, the worst case scenario is when our list doesn't contain our target element at all. When this happens, our algorithm can't just find a match and terminate early. It has to check the entire list before calling it quits.
Note: When we say "best case" and "worst case," we mean "the best case in terms of our algorithm's performance," not the best case in terms of external meaning we've ascribed to the results.
In the previous example, our worst case isn't the worst case because we didn't find what we were looking for -- it's the worst case because the algorithm has to go through the entire list instead of being able to terminate early. Likewise, our best case isn't the best because we found our target; it's the best because we only had to run the loop once.
An algorithm's performance (e.g. how long it takes to finish, or how much memory it uses) is totally separate from any subjective valuation we might attach to its output (e.g. "found = good and not found = bad!!!!!1"). For example, suppose we are searching a DNA sequence to determine whether a mutation exists. In the Real World, the ideal output is a result of "not found," but in Algorithm World, this is the worst case performance scenario. Conversely, finding the mutation at the very beginning of the patient's DNA sequence is the "best case" from a performance standpoint, even though it would be a bad outcome in real life.
In short, "best" and "worst case" describe an algorithm's inputs, not its outputs.
Now we've established that in the worst case, our algorithm has to go through the entire input list. Intuitively, it takes longer to go through a long list than a short one. But roughly how much longer? This is the time complexity of our algorithm, and we use something called asymptotic notation to describe this value.
Recall the worst case input requires our algorithm to visit each list item exactly once:
function search(list, target):
for each item in list:
if item == target then return true
return false
Suppose our input list contains
This is an example of big O notation, which is the most common way computer scientists talk about algorithmic efficiency. Big O notation expresses an upper bound on a function. In particular, it expresses an upper bound in terms of another function that will always be greater than or equal to the first function, for any given value in both functions' domains.
When we analyze algorithms, we are interested in
Note: Big O expresses relationships between different functions of some variable, as that variable grows infinitely large. In other words, it compares the asymptotic behavior of functions.
There are a lot of ways to talk about big O mathematically. For two functions
$f(n)$ and$g(n)$ , the following expressions are more or less equivalent:
$f(n)$ is$O \big( g(n) \big)$ , or$f(n) = O \big( g(n) \big)$ , or$f$ is$O(g)$ $f$ grows (slower than/no faster than)$g$ $f$ is upper bounded by$g$ , or$g$ is an upper bound on$f$ $f(n) \leq kg(n)$ , for some constant factor$k$ - as
$n$ approaches infinity, the ratio of$f$ to$g$ is finite$\lim_{x\to\infty} \Big| \frac{f(x)}{g(x)} \Big| < \infty$
It turns out that
Such constant $T(n)$s do exist, but are typically very simple. Consider the following function, which just adds 3 to a number:
function add3(n):
return n + 3
No matter how large add3
takes the same amount of time to complete -- in other words, its runtime add3
is constant time, or
Now we've seen two common complexity classes:
Notation | Complexity class | Judgment |
---|---|---|
Constant | Awesome | |
Logarithmic | Good | |
Linear | Decent | |
Quadratic | Bad | |
Exponential | Awful |
Big O is all about classifying functions into groups like these. To understand big O in a nutshell, let's momentarily throw back to plotting graphs of linear and quadratic equations in middle school. Here is a graph of the functions
This graph makes it easy to see that for all values of
Compare this behavior to the difference between linear functions
Here's the idea in a nutshell: "Let's divide all the functions, ever, into classes that describe approximately how fast a function grows, in terms of the closest general upper bound.
We can get a pretty good idea of how big O complexity works by comparing these growth rates for very large values of
Namely, notice how each function grows successively faster than its predecessor. Remember that we are using big O to measure the growth of our algorithm's running time
- the faster
$T(n)$ grows, - the longer our algorithm takes, and
- the worse it performs.
Look at the above graph again. We've already observed that each function grows successively faster than its predecessor -- the linear function
We can actually make an even stronger statement. Each successive function grows faster than not only its predecessor, but all preceding functions. That is,
as
Recall that big O notation expresses some function as the upper bound of another. (For instance, saying
If we say that a function
Now, suppose this bound
Chaining the two statements together, we can write
Now drop the
Translating this into big O notation,
This seemingly trivial example illustrates an important property: a function
For instance, if
The following image neatly summarizes the relationship between complexity classes of functions.
Although big O is the most common notation used for comparing functions, there are two others worth knowing.
The "opposite" of big O is big
-
$f$ is$\Omega(g)$ , pronounced "$f$ is omega of$g$ " -
$f$ grows at least as quickly as$g$ -
$g$ is a lower bound on$f$ -
$f(n) \geq kg(n)$ , for some constant$k$
For example, we know that
Finally, big
For example, saying "$T(n)$ is
which means that as
Usually these constants are different, and the two versions of
Note: When we talk about algorithm performance, it's easy to confuse "bound terminology" (upper bound
$O$ , lower bound$\Omega$ , and tight bound$\Theta$ ) with "case terminology" (best case, worst case, and average case).
Recall that we use "(best/worst/average) case" to describe the features of an algorithm's input(s), and the corresponding impact upon the algorithm's performance. Each "case" produces a different
$T(n)$ for some algorithm, depending on how favorable the inputs are.By contrast, we use asymptotic notation and "(upper/lower/tight) bounds" to describe functions in terms of other functions. Here, we already know the algorithm's performance
$T(n)$ , but we don't care how we got this function; we just want to classify its growth rate in relation to other functions.How do these ideas relate? If we let
$T(n)$ denote the time complexity/performance of an algorithm, then the "best case" input minimizes$T(n)$ -- which minimizes the upper, lower, and tight bounds on$T(n)$ . Conversely, the "worst case" input maximizes$T(n)$ , which maximizes all three bounds.As a slightly more concrete example, consider our original searching algorithm:
- Best case: our target is the first item in the list. Then no matter how large the list, the algorithm has the same runtime, because we only need to check the first item. So
$T(n) = O(1)$ .
- But we could also write
$T(n) = \Theta(1)$ or$T(n) = \Omega(1)$ , because asymptotic notation just describes the function$T(n)$ in terms of other functions.
- Worst case: our target isn't in the list. Then the algorithm's runtime grows proportionally to the size of the list. So
$T(n) = O(n)$ .
- But we could also write
$T(n) = \Omega(1)$ , because$T$ grows faster than any constant-valued function -- in other words, the constant function is a lower bound on$T(n)$ .