This is my report on:
Adjusting CSR format for graph (v1).
While doing research work with Prof. Dip Banerjee, Prof. Kishore Kothapalli.
Compressed Sparse Row (CSR) is an adjacency-list based graph representation that is commonly used for efficient graph computations. However, given N vertices, M edges, and a 32-bit / 4-byte vertex-id, it occupies a space of 4(N + M) bytes. Note however that a 32-bit unsigned integer is limited to just 4 billion ids, and thus massive graphs would need to use a 64-bit / 8-byte vertex-id. This further raises the occupied space to 8(N +M) bytes. Since large memories are difficult to make and tend to be slower than smaller ones (?) it makes sense to try to reduce this space requirement. Hybrid CSR is a graph representation that combines the idea behind adjacency-list and adjacency-matrix (bfs-seema), with its edge-lists being similar to roaring bitmaps (lemire). Unlike CSR, which stores a list of indices of destination vertices for each vertex, hybrid CSR uses smaller indices, each combined with a dense bitset. This allows it to represent dense regions of a graph in a compact form.