@tomas-wood
Last active November 13, 2020 17:11
partition graph for mini-batch gcn

Partitioning Graph for Mini-Batch Approximate Graph Convolutional Networks (GCNs)

So we want to be able to create partitions of the graph on which we can perform graph convolutions without having to load the entire dataset into memory.

Terms:

  • A, the adjacency matrix of the graph (normalized and with the identity matrix added to it, but still written A). With N nodes in the graph, A is N × N.
  • X, a dense matrix of features. If each node's feature vector has dimension F, then for all N nodes we get a dense matrix of size N × F.
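For concreteness, the normalization baked into A above can be sketched in a few lines of numpy (the function name is illustrative; the formula Â = D^{-1/2} (A + I) D^{-1/2} is the renormalization from the GCN paper):

```python
import numpy as np

def normalize_adjacency(A):
    """Symmetrically normalize A with self-loops added:
    A_hat = D^{-1/2} (A + I) D^{-1/2}, the renormalization from the GCN paper."""
    A_tilde = A + np.eye(A.shape[0])                 # add the identity (self-loops)
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))  # D^{-1/2} as a vector
    return A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

# Tiny example: a 3-node path graph 0-1-2; the result is still N x N.
A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
A_hat = normalize_adjacency(A)
print(A_hat.shape)  # (3, 3)
```

Dense numpy is just for illustration; at 100 million nodes this would obviously live in a sparse representation.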

In the GCN paper and code, the entire dataset and the entire adjacency matrix are loaded into memory, and each weight update looks at every node in the graph and its edges before the update is applied. We don't necessarily want to do this, as the number of nodes in the graph and/or the size of the feature space might be very large. Think images associated with the nodes of a huge graph with 100 million nodes. We're not going to be able to fit that whole feature matrix into memory, so what do we do?

We partition the graph into either A) random subgraphs via sampling, or even subgraphs that result from gremlin queries (oh yeah, this is definitely going to be a thing), or B) separate components via eigenvalue analysis, in order to select a subset of nodes to use as a batch when learning or performing inference with our GCN.

The problem is, we want to reduce the size of X, but we don't know how many of the nodes in our subset are actually connected to nodes outside our partition. We could go through, ask for the neighborhood of each node in the partition, and add any neighbors that aren't already inside it. For densely connected graphs, though, this is problematic: in the extreme example of a clique, a single add_neighbors() step could pull every node in the network back into the partition. The other issue is that we would then be approximating the connections of the nodes we just added. We don't have the full topology of those newly added nodes, just as we didn't have the full topology of the nodes we started with. So we have to draw the line at where we are comfortable with the approximation. We might as well start by considering only the subgraph and the edges between nodes inside it. We can add neighbors and compare once we have a few tests set up, so we can examine which approach is more effective in practice.
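Drawing that line in code, i.e. keeping only the subgraph and its internal edges plus the index mapping back to the full graph, might look like the following sketch (the function name and the dict-based mapping are my assumptions, not anything from the paper):

```python
import numpy as np

def induced_subgraph(A, X, nodes):
    """Keep only edges with BOTH endpoints inside `nodes`; edges that leave
    the partition are simply dropped, per the approximation above."""
    nodes = np.asarray(nodes)
    A_sub = A[np.ix_(nodes, nodes)]            # M x M block of the adjacency
    X_sub = X[nodes]                           # M x F feature rows
    local_to_global = {i: int(g) for i, g in enumerate(nodes)}  # map back to G
    return A_sub, X_sub, local_to_global

# Example: select nodes {0, 2} from a 4-node graph.
A = np.arange(16, dtype=float).reshape(4, 4)
X = np.arange(8, dtype=float).reshape(4, 2)
A_sub, X_sub, mapping = induced_subgraph(A, X, [0, 2])
print(A_sub.tolist())  # [[0.0, 2.0], [8.0, 10.0]]
```

The `local_to_global` dict is exactly the index mapping from step 2 below: local index i in A'/X' corresponds to global node `mapping[i]` in G.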

If we instead use connected components of the graph, that solves the issue of making strong, inaccurate assumptions in our approximation, but we are still susceptible to corner cases, and we are assuming the graph has easily recovered components. In many settings, such as learning an embedding, the number of clusters/components in the graph is exactly the problem we are trying to solve. We'd also have to solve eig(A), where A might be a very large and quite poorly conditioned matrix. Another drawback is that we wouldn't be able to use gremlin queries to select which subgraphs to feed into the algorithm, one of the main features we want from a graphdb data layer.

So the algorithm steps are as follows:

  1. Use either random sampling or a gremlin query to select a subgraph, G'(V', E'), of the full network G(V, E), such that |V| = N and |V'| = M where M < N.
  2. Create an index mapping that maps between the indices in the smaller adjacency matrix A' and feature set X'. In this way we can use our indices in our new problem on the subgraph G' to map back onto indices in the full graph G.
  3. Perform graph convolution on the subgraph, neglecting edges between the subgraph and nodes outside it. The first layer of this GCN can be expressed as f(X') = σ( A' X' W ), where A' and X' are M × M and M × F matrices respectively.
  4. Update the weights of the GCN filters (W^0, W^1, ..., W^l) with the standard gradient step Θ ← Θ − α ∂J/∂Θ.
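The four steps above can be sketched end to end with random sampling and a toy loss. Everything here is illustrative: the array sizes, the choice of ReLU for σ, and the dummy loss J = ½‖Z‖² stand in for a real task loss.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy full graph: N = 6 nodes, F = 4 features, hidden dim H = 3.
N, F, H = 6, 4, 3
A = (rng.random((N, N)) < 0.4).astype(float)
A = np.triu(A, 1); A = A + A.T                    # symmetric, no self-loops yet
X = rng.standard_normal((N, F))
W = rng.standard_normal((F, H)) * 0.1             # first-layer filter weights

# Step 1: select a subgraph (random sampling here; a gremlin query later).
batch = rng.choice(N, size=3, replace=False)

# Step 2: index mapping — restrict A and X to the batch; external edges drop out.
A_sub = A[np.ix_(batch, batch)] + np.eye(len(batch))       # add self-loops
d = A_sub.sum(axis=1)
A_hat = A_sub / np.sqrt(np.outer(d, d))                    # D^-1/2 A D^-1/2
X_sub = X[batch]

# Step 3: graph convolution on the subgraph, f(X') = sigma(A' X' W).
Z = np.maximum(A_hat @ X_sub @ W, 0.0)                     # ReLU activation

# Step 4: gradient step on the toy loss J = 0.5 * ||Z||^2.
# dJ/dpre = Z here, since ReLU's zeros already mask the inactive entries.
grad_W = (A_hat @ X_sub).T @ Z
alpha = 0.01
W = W - alpha * grad_W                                     # Theta <- Theta - alpha dJ/dTheta
print(Z.shape)  # (3, 3): M x H output for the mini-batch
```

In a real run, steps 1–4 would loop over many sampled subgraphs, with the index mapping used to write learned node states back to the full graph.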