This course is at an undergraduate level, likely situated in the third or fourth year. Students should be familiar with programming concepts, including recursion, as well as proof techniques, including induction.
- Why study algorithms; the algorithm designer's mantra -- CAN WE DO BETTER?
- Karatsuba Multiplication
- Merge Sort, analysis generalizes to Master Method
- Merge Sort pseudocode -- recursively sort the first half, then the second half, finally merge the two sorted halves (sketch below)
- Merge Sort running time -- at most 6n * log(n) + 6n operations, i.e. O(n*log(n))
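A minimal merge sort sketch in Python, following the pseudocode above (recursively sort each half, then merge); the function names are illustrative, not from the course:

```python
def merge_sort(a):
    """Sort a list; O(n*log(n)) comparisons."""
    if len(a) <= 1:                      # base case: already sorted
        return a
    mid = len(a) // 2
    left = merge_sort(a[:mid])           # recursively sort the first half
    right = merge_sort(a[mid:])          # recursively sort the second half
    return merge(left, right)            # merge the two sorted halves

def merge(left, right):
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i])
            i += 1
        else:
            out.append(right[j])
            j += 1
    out.extend(left[i:])                 # copy whichever half still has elements
    out.extend(right[j:])
    return out

print(merge_sort([5, 4, 1, 8, 7, 2, 6, 3]))  # [1, 2, 3, 4, 5, 6, 7, 8]
```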
- Provides vocabulary for the design and analysis of algorithms
- Coarse enough to suppress details, sharp enough to make useful comparisons between different algorithms
- Suppress constant factors and lower-order terms
- Big-Oh notation -- T(n) = O(f(n)) if for all sufficiently large n, T(n) is bounded above by a constant multiple of f(n)
- Claim: for every k >= 1, n**k is not O(n**(k-1))
- Omega notation -- T(n) = OM(f(n)) if there exist constants c, n0 such that T(n) >= c * f(n) for every n > n0
- Theta notation -- T(n) = O(f(n)) and T(n) = OM(f(n))
- Theta notation -- there exist constants c1, c2, n0 such that c1*f(n) <= T(n) <= c2*f(n) for every n > n0
- Little-Oh notation -- T(n) = o(f(n)) if for all constants c>0, there exists a constant n0 such that T(n) <= c * f(n) for every n > n0
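As a worked example of the Big-Oh definition above, with one (of many) valid choices of the constants c and n0:

```latex
\textbf{Claim:} $2n^2 + 5n + 3 = O(n^2)$.
\textbf{Proof sketch:} choose $c = 10$ and $n_0 = 1$; then for every $n \ge n_0$,
\[
  2n^2 + 5n + 3 \;\le\; 2n^2 + 5n^2 + 3n^2 \;=\; 10\,n^2 \;=\; c \cdot n^2 .
\]
```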
- Counting inversions
- Motivation -- numerical similarity measure between two ranked lists, e.g. collaborative filtering
- Brute force is Theta(n**2); we can do better with Divide + Conquer, piggybacking on merge sort
- Counting inversions within each half is handled by the recursive calls; computing CountSplitInv in linear time is the challenge (sketch below)
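A sketch of the piggybacked merge step (function names are mine): whenever an element of the right half is copied out before elements remaining in the left half, those remaining elements each form a split inversion with it.

```python
def sort_and_count(a):
    """Return (sorted copy of a, number of inversions); O(n*log(n))."""
    if len(a) <= 1:
        return a, 0
    mid = len(a) // 2
    left, x = sort_and_count(a[:mid])      # inversions entirely in the left half
    right, y = sort_and_count(a[mid:])     # inversions entirely in the right half
    merged, z = merge_and_count_split(left, right)  # split inversions
    return merged, x + y + z

def merge_and_count_split(left, right):
    out, i, j, splits = [], 0, 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i])
            i += 1
        else:
            # right[j] is smaller than everything remaining in `left`,
            # so it forms a split inversion with each of those elements
            splits += len(left) - i
            out.append(right[j])
            j += 1
    out.extend(left[i:])
    out.extend(right[j:])
    return out, splits

print(sort_and_count([1, 3, 5, 2, 4, 6])[1])  # 3 inversions: (3,2), (5,2), (5,4)
```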
- Strassen algorithm for matrix multiplication
- recursively compute only 7 (cleverly chosen) products (listed below)
- do the necessary (clever) additions + subtractions
- running time is O(n**log2(7)), roughly O(n**2.81) -- subcubic, better than the naive cubic algorithm
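For reference, one standard presentation of Strassen's seven products and the combining additions/subtractions for the 2x2 block partition of A and B (the textbook formulation, sketched here rather than copied from the course slides); the recurrence becomes T(n) = 7*T(n/2) + O(n**2), which solves to O(n**log2(7)):

```latex
\begin{aligned}
M_1 &= (A_{11}+A_{22})(B_{11}+B_{22}) &\quad M_5 &= (A_{11}+A_{12})\,B_{22}\\
M_2 &= (A_{21}+A_{22})\,B_{11}        &\quad M_6 &= (A_{21}-A_{11})(B_{11}+B_{12})\\
M_3 &= A_{11}\,(B_{12}-B_{22})        &\quad M_7 &= (A_{12}-A_{22})(B_{21}+B_{22})\\
M_4 &= A_{22}\,(B_{21}-B_{11})        &       &\\[4pt]
C_{11} &= M_1+M_4-M_5+M_7             &\quad C_{12} &= M_3+M_5\\
C_{21} &= M_2+M_4                     &\quad C_{22} &= M_1-M_2+M_3+M_6
\end{aligned}
```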
- Closest Pair Problem
- input: a set P = { p1, ..., pn } of n points in the plane R**2
- output: a pair (pi, pj) of distinct points that minimizes the Euclidean distance over all pairs of points in P
- Warm-up in R (one dimension) -- brute force is O(n**2); after sorting (O(n*log(n))), a linear O(n) scan of adjacent pairs finds the answer (sketch below)
- Make copies of points sorted by x and y coordinates, but that is not enough
- ClosestSplitPair(Px, Py, delta) -- key claim: if the closest split pair (p, q) has distance < delta, then p and q are at most 7 positions apart in the delta-strip sorted by y-coordinate
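A sketch of the one-dimensional warm-up mentioned above (not the full 2-D algorithm): after sorting, the closest pair must be adjacent, so a single linear scan finds it.

```python
def closest_pair_1d(points):
    """Closest pair of distinct numbers on the line; O(n*log(n)) overall."""
    xs = sorted(points)                      # O(n*log(n)) sort
    best = (xs[0], xs[1])
    for a, b in zip(xs, xs[1:]):             # O(n) scan of adjacent pairs
        if b - a < best[1] - best[0]:
            best = (a, b)
    return best

print(closest_pair_1d([7, 1, 9, 4, 3]))  # (3, 4)
```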
- Master Method -- a simple formula for the Big-O running time of divide-and-conquer algorithms; a black box for solving recurrences
- Recurrence: T(n) <= a*T(n/b) + O(n**d), where a is the number of recursive calls, b is the input-size shrinkage factor, d is the exponent in the running time of the "combine step"; base case: T(n) = O(1) for all sufficiently small n
- T(n) depends on how a compares with b**d -- three cases (see the sketch after this group)
- Examples: merge sort, binary search, integer multiplication, Strassen's matrix multiplication algorithm
- Proof by induction -- base case and inductive step
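The three cases referenced above, for the recurrence T(n) <= a*T(n/b) + O(n**d):

```latex
T(n) =
\begin{cases}
  O\!\left(n^{d}\log n\right)   & \text{if } a = b^{d}
      \quad\text{(merge sort: } a=2,\ b=2,\ d=1\text{)}\\[4pt]
  O\!\left(n^{d}\right)         & \text{if } a < b^{d}\\[4pt]
  O\!\left(n^{\log_b a}\right)  & \text{if } a > b^{d}
      \quad\text{(Strassen: } a=7,\ b=2,\ d=2\text{)}
\end{cases}
```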
- QuickSort -- O(n*log(n)) time on average, works in place
- Partitioning around a pivot puts the pivot in its "rightful" position, runs in linear time, and reduces the problem size (see the partition sketch after this group)
- Proof of correctness by induction
- Important to choose a good pivot; the worst case is O(n**2)
- Random pivots -- if every recursive call got a 25-75 split or better, that would already be good enough for O(n*log(n))
- Cannot apply the Master Method -- random pivot, unbalanced subproblems
- Key Random Variable -- # of comparisons between two input elements made by QuickSort
- A Decomposition Principle -- 1) Identify random variable Y that you really care about, 2) Express Y as sum of indicator random variables, 3) Apply Linearity of expectation
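A sketch of in-place partitioning and randomized QuickSort as described above (helper names are mine):

```python
import random

def partition(a, lo, hi):
    """Partition a[lo..hi] around a[lo]; return the pivot's final index. O(hi - lo)."""
    pivot = a[lo]
    i = lo + 1                         # a[lo+1 .. i-1] holds elements < pivot
    for j in range(lo + 1, hi + 1):    # a[i .. j-1] holds elements >= pivot
        if a[j] < pivot:
            a[i], a[j] = a[j], a[i]
            i += 1
    a[lo], a[i - 1] = a[i - 1], a[lo]  # put the pivot in its "rightful" position
    return i - 1

def quicksort(a, lo=0, hi=None):
    if hi is None:
        hi = len(a) - 1
    if lo >= hi:
        return
    k = random.randint(lo, hi)         # random pivot choice
    a[lo], a[k] = a[k], a[lo]
    p = partition(a, lo, hi)
    quicksort(a, lo, p - 1)            # recurse on the two sides of the pivot
    quicksort(a, p + 1, hi)

xs = [3, 8, 2, 5, 1, 4, 7, 6]
quicksort(xs)
print(xs)  # [1, 2, 3, 4, 5, 6, 7, 8]
```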
- Sample Spaces Omega -- all possible outcomes, Sum(p(i)) = 1
- Events -- a subset S of Omega
- Random Variables -- X is a real-valued function X: Omega -> R
- Expectation E[X] of X -- average value of X = Sum(X(i) * p(i))
- Linearity of Expectation -- E[ Sum(Xj) ] = Sum( E[Xj] )
- Conditional Probability -- Pr[ X|Y ] = Pr[ X and Y ] / Pr[Y]
- Independence of Events -- iff Pr[ X and Y ] = Pr[X] * Pr[Y]
- Independence of Random Variables -- Pr[ A == a and B == b ] = Pr[ A == a ] * Pr[ B == b ]
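A worked example of the decomposition principle using these definitions: count the heads in n fair coin flips via indicator random variables and linearity of expectation.

```latex
X_j = \mathbf{1}[\text{flip } j \text{ is heads}], \qquad Y = \sum_{j=1}^{n} X_j,
\qquad
E[Y] \;=\; \sum_{j=1}^{n} E[X_j] \;=\; \sum_{j=1}^{n} \Pr[X_j = 1] \;=\; \frac{n}{2}.
```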
- Randomized Selection Algorithm, reduction to sorting (qsort) leads to O(n*log(n))
- Partitioning around a pivot, such that elements left of the pivot are less than the pivot and elements right of the pivot are greater than the pivot
- Randomized selection is correct; its running time depends on the "quality" of the pivot -- worst case is O(n**2), average is O(n)!
- Deterministic Linear-Time Selection -- the key is for ChoosePivot() to return pretty good pivot very quickly
- Use "median of medians" with out-of-place sorting, makes 2 recursive calls
- Running time is O(n) but the constants are higher so it is not as good as Random Selection
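A randomized selection sketch (not in place, unlike the partition-based version in the lectures, but it shows the same idea of recursing on only one side of the pivot):

```python
import random

def rselect(a, i):
    """Return the i-th smallest element of a (1-indexed); O(n) on average."""
    assert 1 <= i <= len(a)
    pivot = random.choice(a)
    less    = [x for x in a if x < pivot]
    equal   = [x for x in a if x == pivot]
    greater = [x for x in a if x > pivot]
    if i <= len(less):                        # answer lies left of the pivot
        return rselect(less, i)
    if i <= len(less) + len(equal):           # the pivot itself is the answer
        return pivot
    return rselect(greater, i - len(less) - len(equal))

print(rselect([10, 4, 5, 8, 6, 11, 26], 3))  # 6 (the 3rd smallest)
```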
- Every "comparison-based" sorting has lower bound of O(n*log(n)) -- merge sort, qsort, heap sort
- When making string assumptions about the sorted data -- Bucket Sort, Counting Sort, Radix Sort
- Graphs, vertices, edges, dense and sparse graphs, matrix and adjacency list representations
- Minimum Cut Problem is to find a cut with fewest number of crossing edges
- Random Contraction Algorithm, what is the probability of success, what could go wrong
- Low success probability per trial -- only about 1/n**2 (more precisely 1/(n choose 2)), so do repeated trials and keep the smallest cut found (sketch below)
- Counting Minimum Cuts -- a graph on n vertices can have at most (n choose 2) distinct minimum cuts
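A sketch of one trial of Random Contraction on an undirected graph given as an edge list, using a small union-find structure to merge contracted vertices (representation and names are mine); repeating many trials and keeping the smallest result boosts the success probability:

```python
import random

def contract_once(n, edges):
    """One trial of random contraction; return the number of crossing edges."""
    parent = list(range(n))                 # union-find over the n vertices
    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]   # path halving
            v = parent[v]
        return v
    remaining = n
    while remaining > 2:
        u, v = random.choice(edges)         # pick a uniformly random edge
        ru, rv = find(u), find(v)
        if ru != rv:                        # contract it: merge the two endpoints
            parent[ru] = rv
            remaining -= 1
    # edges whose endpoints ended up in different super-vertices cross the cut
    return sum(1 for u, v in edges if find(u) != find(v))

# square 0-1-2-3-0 with diagonal 0-2: the minimum cut has 2 crossing edges
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
print(min(contract_once(4, edges) for _ in range(100)))  # 2 (with high probability)
```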
- Generic Graph Search -- find everything "findable"
- Breadth-First Search (BFS) -- explore nodes in "layers", can compute shortest paths, finds connected components of undirected graphs, run-time O(n+m)
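A BFS sketch computing layer-by-layer shortest-path distances from a source (assumed representation: dict mapping each vertex to its list of neighbors):

```python
from collections import deque

def bfs_distances(graph, source):
    """Shortest-path distance (in edges) from source to every reachable vertex; O(n+m)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()                 # vertices come off the queue in layer order
        for v in graph[u]:
            if v not in dist:               # first visit = shortest distance
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

g = {"s": ["a", "b"], "a": ["s", "c"], "b": ["s", "c"], "c": ["a", "b", "d"], "d": ["c"]}
print(bfs_distances(g, "s"))  # {'s': 0, 'a': 1, 'b': 1, 'c': 2, 'd': 3}
```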
- Depth-First Search (DFS) -- explore aggressively, only backtrack when necessary, run-time is O(n+m), computes:
- a topological ordering of a directed acyclic graph
- strongly connected components of directed graphs
- Topological Sort -- to sequence tasks while respecting all precedence constraints, if there is no directed cycle can compute topological ordering in O(n+m) time
- Two solutions for Topological Sort -- straightforward (repeatedly pluck off a sink vertex) and via DFS (sketch below)
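A sketch of the DFS-based solution mentioned above: labels are assigned in reverse order of DFS finishing times (recursive for clarity; a very deep DAG would need an iterative version or a larger recursion limit):

```python
def topological_sort(graph):
    """Return the vertices of a DAG in a topological order; O(n+m).
    graph: dict mapping each vertex to its list of out-neighbors."""
    visited, order = set(), []
    def dfs(u):
        visited.add(u)
        for v in graph.get(u, []):
            if v not in visited:
                dfs(v)
        order.append(u)          # u finishes only after everything reachable from it
    for u in graph:
        if u not in visited:
            dfs(u)
    return order[::-1]           # reverse finishing order = topological order

dag = {"s": ["v", "w"], "v": ["t"], "w": ["t"], "t": []}
print(topological_sort(dag))  # ['s', 'w', 'v', 't']
```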
- Strongly Connected Components -- of a directed graph G are the equivalence classes of the following relation: u <-> v iff there is u -> v and v -> u
- Kosaraju's algorithm -- two passes of DFS, plus some clever additional bookkeeping (sketch below)
- Applications of SCC to the web graph -- the four parts (one giant SCC / CORE, IN, OUT, tubes + tendrils) have roughly the same size; the CORE is very well connected, while outside it the graph is surprisingly poorly connected
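A compact recursive sketch of Kosaraju's two passes (first on the reversed graph to compute a finishing order, then on the original graph in decreasing finishing order); the representation and names are mine, and every vertex is assumed to appear as a key in the graph dict:

```python
def kosaraju_scc(graph):
    """Strongly connected components of a directed graph (dict: vertex -> out-neighbors)."""
    rev = {u: [] for u in graph}                 # build the reversed graph
    for u, outs in graph.items():
        for v in outs:
            rev[v].append(u)
    visited, order = set(), []                   # pass 1: DFS on the reversed graph
    def dfs1(u):
        visited.add(u)
        for v in rev[u]:
            if v not in visited:
                dfs1(v)
        order.append(u)                          # record finishing order
    for u in graph:
        if u not in visited:
            dfs1(u)
    assigned, sccs = set(), []                   # pass 2: DFS on the original graph
    def dfs2(u, comp):
        assigned.add(u)
        comp.append(u)
        for v in graph[u]:
            if v not in assigned:
                dfs2(v, comp)
    for u in reversed(order):                    # process in decreasing finishing order
        if u not in assigned:
            comp = []
            dfs2(u, comp)
            sccs.append(comp)                    # each second-pass DFS finds one SCC
    return sccs

g = {1: [2], 2: [3], 3: [1, 4], 4: [5], 5: [4]}
print(kosaraju_scc(g))  # [[4, 5], [1, 2, 3]] -- the two SCCs {4,5} and {1,2,3}
```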
- Dijkstra's single-source shortest-path algorithm -- handles edges with different costs; BFS computes shortest paths only when all edge costs are equal
- Works only if edge costs are non-negative
- Proof of correctness by induction on the number of iterations
- Naive implementation has runtime O(n*m), but using a heap data structure brings it down to O(m*log(n)) (sketch below)
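A heap-based Dijkstra sketch using Python's heapq, with lazy deletion of stale heap entries instead of the decrease-key bookkeeping described in the lectures (graph assumed to be a dict mapping each vertex to a list of (neighbor, cost) pairs with non-negative costs):

```python
import heapq

def dijkstra(graph, source):
    """Shortest-path distances from source; O(m*log(n)) with a binary heap."""
    dist = {source: 0}
    heap = [(0, source)]                        # (tentative distance, vertex)
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):       # stale entry, skip (lazy deletion)
            continue
        for v, cost in graph[u]:
            nd = d + cost
            if nd < dist.get(v, float("inf")):  # found a shorter path to v
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

g = {"s": [("v", 1), ("w", 4)], "v": [("w", 2), ("t", 6)], "w": [("t", 3)], "t": []}
print(dijkstra(g, "s"))  # {'s': 0, 'v': 1, 'w': 3, 't': 6}
```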
- Heaps -- canonical use: a fast way to do repeated minimum (or maximum, but not both) computations
- Turns Selection Sort O(n**2) into Heap Sort with O(n*log(n))
- Supported operations -- Insert O(log(n)), Extract-Min O(log(n)), Heapify O(n) (batch insertion of n elements), Delete O(log(n))
- Another name is Priority Queue
- Applications
- 2-SUM problem, with alternative solutions (brute force, sorting + binary search, hash table)
- Median maintenance (two-heap sketch below)
- Dijkstra's shortest-path algorithm
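A sketch of median maintenance with two heaps, as referenced above: a max-heap for the smaller half (values negated, since heapq is a min-heap) and a min-heap for the larger half; the median is always at the top of one of them.

```python
import heapq

class MedianMaintenance:
    """Report the median after each insertion in O(log n) per insert."""
    def __init__(self):
        self.low = []    # max-heap of the smaller half (stored as negated values)
        self.high = []   # min-heap of the larger half

    def insert(self, x):
        if self.low and x > -self.low[0]:
            heapq.heappush(self.high, x)
        else:
            heapq.heappush(self.low, -x)
        # rebalance so the two halves differ in size by at most one
        if len(self.low) > len(self.high) + 1:
            heapq.heappush(self.high, -heapq.heappop(self.low))
        elif len(self.high) > len(self.low) + 1:
            heapq.heappush(self.low, -heapq.heappop(self.high))

    def median(self):
        # when the sizes are tied, report the maximum of the smaller half
        if len(self.high) > len(self.low):
            return self.high[0]
        return -self.low[0]

m = MedianMaintenance()
for x in [5, 1, 9, 3, 7]:
    m.insert(x)
    print(m.median(), end=" ")   # 5 1 5 3 5
```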
- Balanced Binary Search Trees -- better than a sorted array, albeit a bit slower, because they support fast insertions and deletions
- Unbalanced search trees can degenerate into a linked list, hurting the performance of operations, which run in O(tree-height)
- Red-black trees, also AVL-trees, splay-trees, B-trees
- Red-black invariants:
- Each node is red or black
- Root node is black
- No two red nodes in a row
- Every root-NULL path must have the same number of black nodes
- Every red-black tree has Height <= 2 * log(n+1)
- Key primitives -- left rotation and right rotation (sketch below)
- Insertions and deletions are not quite simple operations
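A minimal sketch of the left-rotation primitive on a plain BST node (right rotation is symmetric); only the shape changes, the search-tree ordering of keys is preserved:

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def rotate_left(x):
    """Rotate the subtree rooted at x to the left; return the new subtree root."""
    y = x.right          # y moves up to become the new root of this subtree
    x.right = y.left     # y's old left subtree is re-attached under x
    y.left = x
    return y

#      x                 y
#     / \               / \
#    a   y     -->     x   c     (search order a < x < b < y < c is preserved)
#       / \           / \
#      b   c         a   b
```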
- Hash Tables -- purpose: fast O(1) operations
- Supported operations -- Insert, Delete, Lookup
- Applications -- de-duplication, determining if the sum of 2 integers is in the hash table, symbol tables in compilers
- Naive implementation -- array, list
- Birthday paradox -- with only ~23 people, the probability of a shared birthday already exceeds 50%, so collisions appear far sooner than intuition suggests
- Resolving collisions -- chaining, open addressing (linear probing or double hashing)
- What makes a bad hash function; what makes a good one
- Should "spread data out"
- Should be easy to evaluate
- How to choose the number of buckets -- prime number, not too close to power of 2, not too close to power of 10
- The load factor of the hash table = (# of objects) / (# of buckets); for good performance we must keep the load factor under control
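A minimal chained hash table sketch illustrating buckets, chaining, and the load factor (Python's built-in hash is used purely for illustration; a real design would also pick the bucket count as described above):

```python
class ChainedHashTable:
    """Hash table with collision resolution by chaining."""
    def __init__(self, n_buckets=101):              # a prime number of buckets
        self.buckets = [[] for _ in range(n_buckets)]
        self.size = 0

    def _bucket(self, key):
        return self.buckets[hash(key) % len(self.buckets)]

    def insert(self, key, value):
        chain = self._bucket(key)
        for i, (k, _) in enumerate(chain):
            if k == key:                            # overwrite an existing key
                chain[i] = (key, value)
                return
        chain.append((key, value))                  # otherwise append to the chain
        self.size += 1

    def lookup(self, key):
        for k, v in self._bucket(key):
            if k == key:
                return v
        return None

    def load_factor(self):
        return self.size / len(self.buckets)        # objects per bucket on average

t = ChainedHashTable()
t.insert("alice", 1)
t.insert("bob", 2)
print(t.lookup("bob"), round(t.load_factor(), 3))   # 2 0.02
```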
- Hash function with good performance for every data set does not exist; pathological data set
- Solutions
- Use cryptographic hash function
- Use randomization
- Universal Hash Functions -- a set of hash functions H from U to { 0, 1, ..., n-1 } such that for any distinct x, y in U: Pr[ x, y collide ] <= 1/n, when h is chosen uniformly at random from H
- Collision resolution with chaining gives constant-time guarantee -- Carter-Wegman 1979 Theorem
- Performance of collision resolution with open addressing depends on the load factor
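One classical example of a universal family (a standard construction, not necessarily the exact one used in the lectures): fix a prime p larger than the universe size, hash into n buckets, and pick a and b uniformly at random.

```latex
h_{a,b}(x) \;=\; \bigl((a\,x + b) \bmod p\bigr) \bmod n,
\qquad a \in \{1,\dots,p-1\},\;\; b \in \{0,\dots,p-1\},\;\; p \text{ prime},\; p > |U|.
```

For any distinct x, y in U, the collision probability over the random choice of (a, b) is at most 1/n.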
- Bloom Filters -- pros: fast inserts and lookups, space efficient, no false negatives
- Cons -- can't store the object, no deletions, small probability for false positives
- Applications -- spellcheckers, list of forbidden passwords, internet routing
- Under the hood -- array of n bits, k hash functions
- Trade-off between space and error, to minimize the error increase the space
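A minimal Bloom filter sketch; the k hash functions are simulated by salting Python's built-in hash, and a real implementation would size the bit array and choose k to hit a target error rate:

```python
class BloomFilter:
    """Space-efficient set membership with a small false-positive rate."""
    def __init__(self, n_bits=1024, k=4):
        self.bits = [False] * n_bits   # the underlying array of n bits
        self.k = k                     # number of hash functions

    def _positions(self, item):
        # simulate k hash functions by hashing (salt, item) pairs
        return [hash((i, item)) % len(self.bits) for i in range(self.k)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = True

    def might_contain(self, item):
        # False means definitely absent (no false negatives);
        # True means probably present (small chance of a false positive)
        return all(self.bits[pos] for pos in self._positions(item))

bf = BloomFilter()
bf.add("banned-password")
print(bf.might_contain("banned-password"))  # True
print(bf.might_contain("hunter2"))          # False (false positive only with tiny probability)
```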