Notes from Week 9 Lecture 1 on Graphs
- Consists of at least one nodes/vertices and edges/arcs
- Each edge connects two nodes and are unique
- Types of other graphs (not tested)
- Multigraph – many edges between two nodes
- Hypergraph – edges connects >= 2 nodes
- Graph G = <V, E>
- V is a set of nodes
- |V| > 0
- E is a set of edges
- E = {(v, w) : (v in V) and (u in V)} – connects two nodes
- e = (v, w)
- For all e_1, e_2 in E, e_1 != e_2 – unique
- V is a set of nodes
- Every pair of nodes is connected by a path
- A graph is Disconnected if a pair of nodes are not connected by a path
- Disconnected graphs have k disconnected components (ie. parts)
- Degree of a node is the number of adjacent edges (ie. number of incoming edges at a node)
- Degree of a graph is the maximum degree of a node
- Maximum distance between two nodes, following the shortest path (maximum minimum path)
- A Simple Line has no loops/cycles
- A Simple Cycle has only only cycle embedded inside it
- Star
- One central, all edges connect center to edges
- Degree:
n-1(withnnodes) - Diameter:
2(go from start node to center to target node)
- Clique
- Complete graph; all pairs of nodes connected by edges
- Degree:
n-1(by hand-shaking Lemma) - Diameter:
1
- Line
- Degree:
n-1 - Diameter:
2
- Degree:
- Cycle
- Every node connected to its adjacent nodes (forms a wheel)
- Degree:
n-1 - Diameter:
n/2 - 1(oddn) orn/2(evenn) - Bipartite if alternating nodes are in different sets
- Even cycles only; odd cycles are not
- Bipartite
- Nodes divided into two sets
- Some nodes in two sets don't need to be connected with an edge
- A line path can be formed between nodes in the two sets (like a zigzag)
- No edges between nodes in the same set
- Diameter:
n-1(if it forms a zigzag between the two sets) - Graph coloring allows us to see if a graph is bipartite (2 color problem)
- Nodes divided into two sets
Real life graphs can be found in social networks where nodes are people and edges are relationships. Fun questions to ask are whether the graph is connected, what is the diameter (is there a path between 2 friends), and what is the degree (how many friends).
Transportation lines (MRT line) are also graphs where nodes are stations and edges are the train lines.
Internet/Computer networks are also graphs where different topologies represent different graph structures.
Communication satellite networks are also graphs.
- Graph coloring problem (4 color theorem)
- Sliding grid puzzles like 2048
- Different states are nodes and edges are state transitions; not the puzzle tiles themselves
- All permutations of the state form all the nodes
- Degree of this graph would be
4since we can make 4 moves (Von Neumann Neighbourhood)- #nodes =
n! - #edges =
4 * n!
- #nodes =
- Questions: minimum number of moves needed to solve puzzle <= graph diameter
- Very high branching factor
- 2x2x2 Rubik's Cube
- Nodes are each possible state of the cube faces
- Edge is a basic move (90 degree turn, 180 degree turn)
- Goal is to find the minimum number of moves to solve it
- Very high branching factor
- #nodes =
n! * 3^nsince there are 3 basic turn moves you can make withnfaces- If we fix one of the nodes and look at the other
n-1, it becomes(n-1)! * 3^(n-1)
- If we fix one of the nodes and look at the other
- #nodes =
- 3x3x3 Rubik's Cube
- 43 quintillion nodes
- Diameter:
20 - Very high branching factor
- For a
nxnxncube, the diameter isTheta(n^2 / logn)
Graphs are made of nodes and edges that need to be stored
- Nodes can be stored in an array
- Edges are pairs of nodes
n^2-sized binary matrix where0indicates no edge and1indicates edgeA[v][w] == 1iff (v, w) in EA^nrepresents then pathsof the graph- neat way of figuring out connectivity (whether it's connected or not, etc.)
- neat was to figure out diameter
- not always the most efficient
- parallelizes well
A ^ infinityis what Google Pagerank is doing
- Nodes can be stored in an array
- Edges can be stored as linked list per node
a –> e -> f
b -> c -> d -> e
c -> b
d -> b
e -> b -> a
f -> a
Node-Edge pairs will appear twice since (u, v) is (v u) in the respective linked lists.
List
For a graph G = <V, E>, array is size |V| and linked lists are |E|. Total would be O(V + E). For a cycle, it's O(V)
- Fast query: find any neighbour ->
O(1) - Fast query: enumerate all neighbours for
v->O(# nbours) - Slow(er) query: are
vandwneighbours
Matrix
For a graph G = <V, E>, array is of size |V| * |V|. Total would be O(V^2). For a cycle, it's O(V^2)
- Fast query: are
vandwneighbours? - Slow query: find any neighour of
v - Slow query: enumerate all neighbours for
v->O(V)
Remarks
- For a cycle, an adjacency list is better with cost
O(V) - For a clique, either is good since we're taking all the
O(V^2)space for all edges between all nodes - Space usage: is graph dense or sparse?
- Queries: what type of queries do I need?
- Enumerate neighbours of a node?
- Query relationships between nodes?
If a graph is dense with |E| =
Theta(V^2), a matrix is better. For Facebook, to answer if 2 people are friends, a matrix is faster but list takes up lesser space. If question is to list all friends, a list is better since we can look at the linked list (we'd have to iterate through a whole row in a matrix).
Goal is to start at a node and finish at another node. Brute force is to visit all the nodes to find a path. There are two techniques: Breadth-First Search (BFS) and Depth-First Search (DFS)
- Explore level by level
- Frontier: current level
- Initially: {
s} wheresis the start node - Advance frontier
- Don't go backward
- Build levels
- Calculate
level[i]fromlevel[i-1] - Skkip already visited nodes
BFS(nodes, startID):
boolean[] visited = new boolean[nodes.length] // have we visited node or not?
Arrays.fill(visited, false)
int[] parent = new int[nodes.length] // keep track of all parent nodes
Arrays.fill(parent, -1)
Collection<Integer> frontier = new Collection<Integer>
frontier.add(startID) // start search from here
while (!fronttier.isEmpty()):
// a way to store the next level
Collection<Integer> next Frontier = new ...
for (Integer v : frontier):
for (Integer w : nodes[v].neighbours):
if (!visited[w]):
visited[w] = true // seen already
parent[w] = v // store parent
nextFrontier.add(w)
// move on to next level
frontier = nextFrontier
- BFS fails to visit every node in a graph with two disconnected components
- Since there is no edge between two components, it can't be visited
- Runtime of BFS is
O(V + E)– all the edges are explored for each node visited- Every node acts as the start node
O(V) - Evert node is added to
nextFrontierandfrontieronceO(V)- After visited, never re-added
- Every edge is looked at from each side
O(E)
- Every node acts as the start node
- Parent pointers store the shortest path (a spanning tree, or sorts)
- Shortest path graph is a Tree (since it has
n-1edges andnnodes) - Shortest path graph has no cycles
- Shortest path does not mean low-degree or low-diameter (ie. not small)
- Could have high degree and possible high diameter
- Shortest path graph is a Tree (since it has
- The complexity of BFS implemented using an Adjacency Matrix will be
O(|V|²) - When implemented by an Adjacency List, it's
O(|V| + |E|)