@attractivechaos
Last active January 13, 2026 15:22
Lessons from Hash Table Merging

Merging two hash maps looks like a straightforward O(N) operation. However, when merging maps with millions of keys, I unexpectedly hit a >10x performance degradation. This post explores why some of the most popular hash table libraries fall into this trap and how to fix it. The source code is available here.

The setup

I generated 3N random numbers with splitmix64, where N ranges from 7 to 31 million. I put the first N numbers in a hash map and the remaining 2N numbers in a second hash map, then merged the second map into the first and recorded the timing on an Apple M4 Pro. I used a decent hash function copied from a blog post by Chris Wellons:

#include <stdint.h>

static inline uint64_t hash64(uint64_t x) {
    x = (x ^ (x>>32)) * 0xd6e8feb86659fd93ULL;
    x = (x ^ (x>>32)) * 0xd6e8feb86659fd93ULL;
    return x ^ (x>>32);
}

I evaluated the following hash table libraries, all based on linear probing.

| Library | Version | Language | Algorithm |
|---|---|---|---|
| Abseil | 20250814.1 | C++ | Swiss Table |
| Boost | 1.90.0 | C++ | Swiss Table |
| unordered_dense | 4.8.1 | C++ | Robin Hood hashing |
| khashl | r40 | C | Basic linear probing |
| Rust standard | 1.92.0 | Rust | Swiss Table |
| hashbrown | 0.16.1 | Rust | Swiss Table |

Merging hash tables may be slow

The plot on the left shows the time for inserting 2N keys into an empty hash map, and the plot on the right shows the time for merging a map with 2N keys into another map with N keys.

[Figure: time to insert 2N keys into an empty map (left) vs time to merge a 2N-key map into an N-key map (right)]

Notably, Abseil and Boost are >20x slower at hash table merging than at hash table creation, while unordered_dense and khashl are not affected (I will come back to them later). To understand the cause, suppose we have hash maps h0 and h1 of the same size and we merge h1 into h0 with:

for (auto const &kv : h1)
    h0[kv.first] += kv.second;

With a typical iterator implementation, we traverse the keys in h1 in the order of their hash codes. Because h0 and h1 use the same hash function (also known as a "hasher"), the home bucket of a key in h1 is identical to its home bucket in h0. With the loop above, we therefore populate the beginning of h0 first. That part of h0 becomes almost fully saturated even though, on average, h0 has not reached its target load factor. This primary clustering causes the poor performance. All Swiss Table-based implementations, including the Rust libraries in the table, suffer from this issue when a fixed hash function is used naively.

Efficient hash table merging

Solution I: salted hash function

The figure above uses my own hasher. Abseil's default hasher does not have this problem: internally, Abseil adds a random 16-bit seed to each hash table instance and mixes this seed into the hash codes. As a result, the same key is placed differently in h0 and h1, and the clustering described above cannot occur.

When you have to use your own hasher for a specialized data type, you won't benefit from Abseil's built-in hash functions, but you can wrap your hasher in a class like this:

#include <cstdint>
#include <random>

class RandHasher {
    size_t seed;
public:
    RandHasher() {
        std::random_device rd;
        seed = std::uniform_int_distribution<size_t>{}(rd);
    }
    size_t operator()(const uint64_t x) const {
        return hash64(x ^ seed); // FIXME: for generic keys, mix hash(x) and seed
    }
};

The plot on the left shows the timing with a salted hasher; hash table merging is now as fast as table creation. Salted hashers were originally introduced to mitigate hash flooding. I don't know whether their designers had table merging in mind, but they work well here anyway.
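Wiring the salted hasher in is just a matter of passing it as the map's hasher template parameter; each map instance then default-constructs its own RandHasher and thus gets its own seed. A minimal sketch, repeating hash64 and RandHasher so it is self-contained, and using std::unordered_map only for brevity (as a chaining table it doesn't suffer from this clustering itself; absl::flat_hash_map and boost::unordered_flat_map accept a hasher the same way):

```cpp
#include <cassert>
#include <cstdint>
#include <random>
#include <unordered_map>

static inline uint64_t hash64(uint64_t x) {
    x = (x ^ (x >> 32)) * 0xd6e8feb86659fd93ULL;
    x = (x ^ (x >> 32)) * 0xd6e8feb86659fd93ULL;
    return x ^ (x >> 32);
}

class RandHasher {
    size_t seed;
public:
    RandHasher() {
        std::random_device rd;
        seed = std::uniform_int_distribution<size_t>{}(rd);
    }
    size_t operator()(uint64_t x) const { return hash64(x ^ seed); }
};

// Each Map instance gets its own seed, so the same key lands in
// different buckets of h0 and h1.
using Map = std::unordered_map<uint64_t, uint64_t, RandHasher>;

static void merge_into(Map &h0, const Map &h1) {
    for (auto const &kv : h1) h0[kv.first] += kv.second;
}
```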

[Figure: merging time with a salted hasher (left) and with preallocation (right)]

Solution II: preallocation

The second strategy is to reserve enough space in h0 to hold the contents of both h0 and h1 before merging. Although h0 is still not populated in random order when h1 is merged in, primary clustering cannot occur because there are plenty of empty buckets. This solution is more cache friendly and, in practice, faster than a salted hasher (plot on the right). The downside is that when h0 and h1 share many keys, preallocation may waste memory.
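A sketch of this strategy, again shown with std::unordered_map for brevity (reserve() exists with the same meaning on absl::flat_hash_map and boost::unordered_flat_map); merge_with_reserve is a hypothetical helper name, not an API from any of these libraries:

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

using Map = std::unordered_map<uint64_t, uint64_t>;

static void merge_with_reserve(Map &h0, const Map &h1) {
    // Upper bound on the merged size; an overestimate if the maps share keys.
    h0.reserve(h0.size() + h1.size());
    for (auto const &kv : h1) h0[kv.first] += kv.second;
}
```

Because the capacity is set up front, h0 never rehashes mid-merge and the empty buckets keep probe sequences short even though keys still arrive in hash order.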

Solution III: stride iteration

unordered_dense is efficient at hash table merging because its iterator traverses keys in their input order, not in hash order. With a random input order from splitmix64, h0 is populated randomly during merging and no primary clustering occurs. Inspired by this observation, I modified the khashl iteration loop to (conceptually):

for (size_t i = 0, k1 = 0; i < h1.n_buckets; ++i) {
    if (h1.bucket[k1].occupied)
        h0.insert(h1.bucket[k1]);
    k1 = (k1 + 7) % h1.n_buckets; // instead of k1 = (k1 + 1) % h1.n_buckets
}

The loop is guaranteed to visit each bucket in h1 exactly once as long as the step size (7 here) and h1.n_buckets share no common factor. As shown in the first plot on the right, this strategy merges hash tables efficiently even with a fixed hasher. With better data locality, it is also faster than a salted hasher. Other iteration orders, such as quadratic iteration, are possible as well.
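The coprimality argument is easy to check directly: stepping by `step` modulo the bucket count visits every bucket exactly once if and only if gcd(step, n_buckets) == 1. A small sketch (visits_all is a hypothetical helper for this check, not part of khashl):

```cpp
#include <cstddef>
#include <vector>

// Step through the buckets of a table with stride `step` and report whether
// every bucket is visited exactly once.
static bool visits_all(size_t n_buckets, size_t step) {
    std::vector<int> visits(n_buckets, 0);
    for (size_t i = 0, k = 0; i < n_buckets; ++i) {
        ++visits[k];
        k = (k + step) % n_buckets;
    }
    for (int v : visits) if (v != 1) return false;
    return true;
}
```

Since khashl capacities are powers of two, any odd stride qualifies: visits_all(1024, 7) holds, while an even stride like 6 shares a factor of 2 with 1024 and misses half the buckets.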

Other contenders

Compared to linear probing, quadratic probing is more robust to non-random input order. It improves merging performance, but not as much as the solutions above. I also tried sharding the bigger hash table into smaller subtables. This helps but does not address the root cause, so I don't recommend it either.

Conclusion

| Strategy | Merging performance | Complexity | Trade-offs |
|---|---|---|---|
| Salted hasher | Good | Medium | Minor overhead per hash |
| Preallocation | Best | Low | Higher memory floor |
| Stride iteration | Better | High | Requires library-level changes |

Merging hash tables can be very slow if you use the wrong library or the wrong hash function. Among the three solutions discussed above, preallocation (Solution II) is probably the easiest to implement and can be the fastest when h0 and h1 have little overlap (plot below). A salted hasher (Solution I) is slower for merging, but the per-hash overhead is not a concern for most other operations; salting also has other benefits and comes by default with Abseil. Stride iteration (Solution III) is a balanced strategy, but it is hard to implement in user space. In my opinion, there is no clear winner.

[Figure: merging time of the three solutions compared]

Appendix

Boost, widely considered the fastest hash table in C++, is more than twice as fast as Abseil (plot above), where Swiss Table originated. Rust with my own fixed hasher is as fast as Boost. Nonetheless, to be more robust against hash flooding, Rust uses SipHash by default, which is several times slower. When performance matters, supply your own hash function.

PS: someone pointed out on Hacker News that Rust hash tables provide the extend() API, which can be used to merge hash tables. This API is similar to Solution II, but it does not always reserve enough space to hold the merged table, so primary clustering may still happen with a fixed hasher.
