So we want to be able to split the graph into partitions on which we can perform graph convolutions, without having to load the entire dataset into memory.
Terms:
A: the adjacency matrix of the graph (with the identity matrix added and then normalized, but we still call it A). With N nodes in the graph, A is N x N.
X: the dense feature matrix. Each node has a feature vector of dimension F, so for all N nodes X is a dense N x F matrix.
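A small dense NumPy sketch can make these terms concrete. It assumes the renormalization used in the GCN paper, A_hat = D^{-1/2} (A + I) D^{-1/2}, where D is the degree matrix of A + I, and a single layer of the form ReLU(A_hat X W) with a weight matrix W of size F x H. The names `normalize_adjacency`, `gcn_layer`, `partition_layer`, W, and H are illustrative assumptions, not taken from the GCN code. `partition_layer` is a hypothetical way to run the convolution on one partition: it keeps only the partition's nodes and the edges among them (similar in spirit to Cluster-GCN), so only that slice of A and X is needed. For convenience this toy version slices it out of the full `adj`; a real out-of-core version would load the submatrix per partition instead.

```python
import numpy as np

def normalize_adjacency(adj: np.ndarray) -> np.ndarray:
    """Return D^{-1/2} (A + I) D^{-1/2}, with D the degree matrix of A + I."""
    a_tilde = adj + np.eye(adj.shape[0])      # add self-loops
    deg = a_tilde.sum(axis=1)                 # degrees are >= 1 because of the self-loops
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    # Scale entry (i, j) by d_i^{-1/2} * d_j^{-1/2}
    return a_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(a_hat: np.ndarray, x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """One graph convolution: ReLU(A_hat @ X @ W), giving an N x H output."""
    return np.maximum(a_hat @ x @ w, 0.0)

def partition_layer(adj: np.ndarray, x: np.ndarray, w: np.ndarray,
                    nodes: np.ndarray) -> np.ndarray:
    """Hypothetical partition-level layer: only `nodes` and the edges among them are used."""
    sub_adj = adj[np.ix_(nodes, nodes)]       # |S| x |S| block of A
    a_hat = normalize_adjacency(sub_adj)      # normalize within the partition
    return gcn_layer(a_hat, x[nodes], w)      # only |S| rows of X are touched

# Toy graph: two disconnected components {0, 1, 2} and {3, 4, 5}
N, F, H = 6, 4, 2
adj = np.zeros((N, N))
for i, j in [(0, 1), (1, 2), (3, 4), (4, 5), (3, 5)]:
    adj[i, j] = adj[j, i] = 1.0

rng = np.random.default_rng(0)
x = rng.standard_normal((N, F))
w = rng.standard_normal((F, H))

full = gcn_layer(normalize_adjacency(adj), x, w)           # N x H, whole graph in memory
part = partition_layer(adj, x, w, np.array([0, 1, 2]))     # 3 x H, one partition only
```

Edges that cross partitions are dropped, so the partition output matches the corresponding rows of the full-graph output only when the partition has no edges leaving it, as with the disconnected components in this example. How much accuracy is lost when edges are cut depends on how the partitions are chosen.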
In the GCN paper and code, the entire dataset and the entire adjacency matrix are loaded into memory. The weight updates are done by