Fast and Efficient End-to-End Graph Processing with Shared Memory Accelerators; Mughrabi (2021)

Graph algorithms often require fine-grained, random access across very large data structures. Previous work on FPGA-based acceleration has required significant preprocessing and restructuring to transform the memory access patterns into a streaming format that is friendlier to off-chip hardware. However, the emergence of cache-coherent shared memory interfaces, such as the Coherent Accelerator Processor Interface (CAPI), allows designers to work more easily with the natural in-memory organization of the data. This thesis introduces AccelGraph, a vertex-centric shared-memory accelerator for graph algorithms. It is optimized for high performance while making effective use of coherent caching on Field Programmable Gate Array (FPGA) hardware. Compared with accessing graph data only through the shared address space and DRAM, the proposed design achieves speedups by selectively caching graph data on the accelerator based on locality and reuse. We also introduce PageRank Quantization, a technique that represents PageRank values as 32-bit quantized fixed-point numbers. This approach improves performance over a 64-bit fixed-point representation while keeping precision within a tolerable error margin. As a result, we retain both the hardware scalability of a fixed-point representation and the cache performance of 32-bit floating point.
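The abstract pairs a vertex-centric formulation of PageRank with 32-bit fixed-point rank storage. Below is a minimal Python sketch of that pairing. It assumes a pull-based update over in-edge lists, a damping factor of 0.85, a fixed iteration count, and a plain unsigned Q-format with `frac_bits` fractional bits. The helper names (`make_fixed`, `build_graph`, `pagerank_float`, `pagerank_fixed`) and the bit splits are hypothetical illustrations. They are not the thesis' quantization scheme or AccelGraph's hardware datapath. The sketch compares quantized ranks against a float64 reference for 32-bit and 64-bit words.

```python
# Hypothetical sketch: vertex-centric (pull-based) PageRank with ranks stored
# as unsigned fixed-point words, compared against a float64 reference.
# Word/fraction bit splits are assumptions, not the thesis' exact scheme.
# Dangling vertices (out-degree 0) are not redistributed in this sketch.

def make_fixed(word_bits: int, frac_bits: int):
    """Return (to_fixed, to_float, max_word) helpers for an unsigned Q-format."""
    scale = 1 << frac_bits
    max_word = (1 << word_bits) - 1

    def to_fixed(x: float) -> int:
        # Round to nearest and saturate instead of wrapping on overflow.
        return min(int(round(x * scale)), max_word)

    def to_float(q: int) -> float:
        return q / scale

    return to_fixed, to_float, max_word


def build_graph(n, edges):
    """Build in-edge lists (sources pointing at v) and out-degrees from (src, dst) pairs."""
    in_edges = [[] for _ in range(n)]
    out_deg = [0] * n
    for src, dst in edges:
        in_edges[dst].append(src)
        out_deg[src] += 1
    return in_edges, out_deg


def pagerank_float(in_edges, out_deg, iters=20, d=0.85):
    """Float64 reference: each vertex pulls rank/out_degree from its in-neighbours."""
    n = len(out_deg)
    rank = [1.0 / n] * n
    for _ in range(iters):
        contrib = [rank[u] / out_deg[u] if out_deg[u] else 0.0 for u in range(n)]
        rank = [(1 - d) / n + d * sum(contrib[u] for u in in_edges[v])
                for v in range(n)]
    return rank


def pagerank_fixed(in_edges, out_deg, word_bits=32, frac_bits=30, iters=20, d=0.85):
    """Same pull-based update, but every rank is held as a fixed-point integer."""
    to_fixed, to_float, max_word = make_fixed(word_bits, frac_bits)
    n = len(out_deg)
    base = to_fixed((1 - d) / n)
    d_q = to_fixed(d)
    rank = [to_fixed(1.0 / n)] * n
    for _ in range(iters):
        # Integer division truncates: one source of quantization error.
        contrib = [rank[u] // out_deg[u] if out_deg[u] else 0 for u in range(n)]
        new_rank = []
        for v in range(n):
            acc = sum(contrib[u] for u in in_edges[v])
            # Fixed-point multiply; in hardware the product needs a
            # 2*word_bits-wide intermediate before the right shift.
            scaled = (d_q * acc) >> frac_bits
            new_rank.append(min(base + scaled, max_word))
        rank = new_rank
    return [to_float(q) for q in rank]


edges = [(0, 1), (0, 2), (1, 2), (2, 0), (3, 0), (3, 2)]
in_edges, out_deg = build_graph(4, edges)
ref = pagerank_float(in_edges, out_deg)
q32 = pagerank_fixed(in_edges, out_deg, word_bits=32, frac_bits=30)
q64 = pagerank_fixed(in_edges, out_deg, word_bits=64, frac_bits=62)
err32 = max(abs(a - b) for a, b in zip(ref, q32))
err64 = max(abs(a - b) for a, b in zip(ref, q64))
print(f"max |error| 32-bit: {err32:.3e}, 64-bit: {err64:.3e}")
```

Saturating rather than wrapping keeps an overflow from turning a large rank into a tiny one. Because PageRank values sum to 1, two integer bits leave ample headroom in this assumed 32-bit layout. The 64-bit run shows the precision that the narrower word gives up. The memory and cache trade-off that motivates the 32-bit choice is a hardware effect this software sketch does not model.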
