Skip to content

Instantly share code, notes, and snippets.

@wolfram77
wolfram77 / notes-cuda-by-example.md
Last active June 25, 2025 20:24
CUDA by Example : NOTES

Highlighted notes of:
CUDA by Example
An Introduction to General Purpose GPU Computing

Authors:
Jason Sanders
Edward Kandrot

“This book is required reading for anyone working with accelerator-based computing systems.”
–From the Foreword by Jack Dongarra, University of Tennessee and Oak Ridge National Laboratory

@wolfram77
wolfram77 / notes-hybrid-multicore-computing.md
Last active June 25, 2025 20:24
Hybrid Multicore Computing : NOTES

Highlighted notes on:
Hybrid Multicore Computing.
While doing research work with Prof. Dip Sankar Banerjee, Prof. Kishore Kothapalli.

In this comprehensive report, Prof. Dip Sankar Banerjee describes about the benefit of utilizing both multicore systems, CPUs with vector instructions, and manycore systems, GPUs with large no. of low speed ALUs. Such hybrid systems are beneficial to several algorithms as an accelerator cant optimize for all parts of an algorithms (some computations are very regular, while some very irregular).

ORG

@wolfram77
wolfram77 / notes-practice-of-streaming-processing-of-dynamic-graphs-concepts-models-and-systems.md
Last active June 25, 2025 20:24
Practice of Streaming Processing of Dynamic Graphs: Concepts, Models, and Systems : NOTES

Highlighted notes on:
Practice of Streaming Processing of Dynamic Graphs: Concepts, Models, and Systems.
While doing research work with Prof. Dip Sankar Banerjee, Prof, Kishore Kothapalli.

This is a huge review paper discussing a lot about several graph streaming frameworks, and graph databases. How can i summarize this! GPU frameworks given are cuSTINGER, EvoGraph, Hornet, faimGraph, GPMA. Gap between databases and frameworks seems to be closing.

ORG

@wolfram77
wolfram77 / notes-hypr-hybrid-page-ranking-on-evolving-graphs.md
Last active June 25, 2025 20:23
HyPR: Hybrid Page Ranking on Evolving Graphs : NOTES

Highlighted notes on:
A Parallel Algorithm Template for Updating Single-Source Shortest Paths in Large-Scale Dynamic Networks.
While doing research work with Prof. Dip Sankar Banerjee, Prof, Kishore Kothapalli.

In Hybrid Pagerank the vertices are divided in 3 groups, V_old, V_border, V_new. Scaling for old, border vertices is N/N_new, and 1/N_new for V_new (i do this too ). Then PR is run only on V_border, V_new.

"V_border which is the set of nodes which have edges in Bi connecting V_old and V_new and is reachable using a breadth first traversal." Does that mean V_border = V_batch(i) ∩ V_old? BFS from where?

"We can assume that the new batch of updates is topologically sorted since the PR scores of the new nodes in Bi is guaranteed to be lower than those in Co."

@wolfram77
wolfram77 / notes-a-parallel-algorithm-template-for-updating-singlesource-shortest-paths-in-largescale-dynamic-networks.md
Last active June 25, 2025 20:23
A Parallel Algorithm Template for Updating Single-Source Shortest Paths in Large-Scale Dynamic Networks : NOTES

Highlighted notes on:
A Parallel Algorithm Template for Updating Single-Source Shortest Paths in Large-Scale Dynamic Networks.
While doing research work with Prof. Dip Sankar Banerjee, Prof, Kishore Kothapalli.

For SSSP the researchers give an update algorithm for handling edge insertion and deletion. They implement for in OpenMP & CUDA and compare with Galois & Gunrock resp. For each vertex there are additional "affected" flags. In a later step "affected" flag is used iteratively update distances. To avoid loops disconnected vertices are set to INF. Edge deletions are slower (needs tree repair).

They have shown graphs for 50M, 100M changes, but i couldnt find what batch size they use. Is it 50/100M? Later they did mention experiment with batch size 15, 30, 50? Is it 50 changes of 50M changes?

ORG

@wolfram77
wolfram77 / notes-deeper-inside-pagerank.md
Last active June 25, 2025 20:23
Deeper Inside PageRank : NOTES

Highlighted notes on:
Deeper Inside PageRank.
While doing research work with Prof. Kishore Kothapalli.

This is a really "deep" review of PageRank! Should be a good story for a PhD student going to be working with PageRank optimizations.

ORG

@wolfram77
wolfram77 / notes-custinger-supporting-dynamic-graph-algorithms-for-gpus.md
Last active June 25, 2025 20:23
cuSTINGER: Supporting Dynamic Graph Algorithms for GPUs : NOTES

Highlighted notes on:
cuSTINGER: Supporting Dynamic Graph Aigorithms for GPUs.
While doing research work with Prof. Kishore Kothapalli.

These people wrote the first data structure for maintaining dynamic graphs with NVIDIA CUDA GPUs. Unlike STINGER which uses a block linked-list and GT-STINGER which uses Array of Structures (AoS), here they instead used Structure of Arrays (SoA) which is not only more suitable to GPU memory access, but also allows smaller allocations modes (memory is a premium in GPU).

They use a custom memory manager which speeds up memory management (instead of using system memory manager). The data structure used is the standard adjacency list (CSR) and pointers are updated when edge list has to grow. It could handle upto 10 million updates per second (large batch sizes), and ran a static graph algorithm (triangle counting) 1-10% slower. This shows that cuSTINGER can also be used with static graph algorithms. Dynamic graph algorithms should run much faster.

[![ORG](https://img.shields

@wolfram77
wolfram77 / notes-a-parallel-packed-memory-array-to-store-dynamic-graphs.md
Last active June 25, 2025 20:23
A Parallel Packed Memory Array to Store Dynamic Graphs : NOTES

Highlighted notes on:
A Parallel Packed Memory Array to Store Dynamic Graphs.
While doing research work with Prof. Kishore Kothapalli.

Here they provide proofs of a concurrently updateable data structure for dynamic graph (read this in FPGA paper).

It is an extension of CSR with free spaces in between so new edges can be added or removed. If there is no free space, edges can be redistributed within a tree-like hierarchy of blocks (leaves). If still no space, the size can be doubled. Looking up an edge still takes O(logN) like in CSR. They compare it to Aspen (dynamic graph streaming) and Ligra (static CSR) /Ligra+ (static compressed CSR) and find it to be close in performance to Ligra+ (Ligra slightly faster). Since PPCSR is not compressed and has free space, it occupies the most space. They use a (popcount based) lock order so that it is deadlock free.

ORG ![](https://ga-beacon.deno.dev/G-5WCD4S1JXR:yfiGW2m-TGi2

@wolfram77
wolfram77 / notes-an-improved-pagerank-algorithm-for-multilayer-networks.md
Last active June 25, 2025 20:23
An Improved PageRank Algorithm for Multilayer Networks : NOTES

Highlighted notes on:
An Improved PageRank Algorithm for Multilayer Networks.
While doing research work with Prof. Kishore Kothapalli.

The authors use a layer-layer weight to provide an additional factor to PR which they then use in the 1st step of SpreadMax algorithm to show that their modified pagerank is better than standard PR in selecting key nodes. I didnt properly understand this.

Applications of networks: Social networks, Air traffic systems, Transportation networks, Citation networks, Biological systems (infection spreading).

Multilayer networks can be thought of as a large graph subdivided into multiple groups (layers). This probably makes it easier to assign common weights to a particular group of nodes, edges.

@wolfram77
wolfram77 / notes-parallel-algorithms-for-multisource-graph-traversal-and-its-applications.md
Last active June 25, 2025 20:23
Parallel algorithms for multi-source graph traversal and its applications : NOTES

Highlighted notes on:
Parallel algorithms for multi-source graph traversal and its applications.
While doing research work with Prof. Kishore Kothapalli.

Seema is working on Multi-source BFS with hybrid-CSR, with applications in APSP, diameter, centrality, reachability.

BFS can be either top-down (from visited nodes, mark next frontier), or bottom-up (from unvisited nodes, mark next frontier). She mentioned that hybrid approach is more efficient. EtaGraph uses unified degree cut (UDC) graph partitioning. Also overlaps data transfer with kernel execution. iCENTRAL uses biconnected components for betwenness centrality on dynamic graphs.

Hybrid CSR uses an additional value array for storing packed "has edge/neighbour" bits. This can give better memory access pattern if many bits are set, and cause many threads to wait if many bits are zero. She mentioned Volta architecture has independent PC, stack per thread (similar to CPU?). Does is not matter then if the threads in a block diverge?