I was trying out the Misra-Gries algorithm for finding heavy hitters in a list of numbers, using vector instructions. But let me fill in some context first. I am trying to minimize the memory needed by Louvain/Leiden algorithms for community detection, and hopefully a bit of performance too. Currently the algorithms use a full-size per-thread hashtables, with each thread using |V|
space for storing the associated weights for the hashtable. However, we could instead store a small hashtable, using the Misra-Gries algorithm, in the cache - obviously this might affect the performance (in terms of quality of the returned communities).
It was cool to go through step-by-step and be able to minimize the number of lines of generated machine code (with minimal conditional jumps). When you understand the vector instructions, you can write them in a higher-level logic without resorting to writing the instructions yourself, as these can be quite complicated. We let the compiler do its thing, but guide it with a short for-loops so it can actually do it. Note, setting flags with a -1
is more efficient due to how vector instructions work.
The code, with the generated instructions on x86 architecture is attached below. Compiler flags as -O3
and -mavx
. May the vectors be with you!
Date: 27th October 2024