Created
January 23, 2018 08:10
-
-
Save avnerbarr/82fb20d25c9b040f2caca8cdcefe7fc2 to your computer and use it in GitHub Desktop.
64bit "murmur" hash in scala. The builtin murmur hash only outputs 32 bits so if hashing millions of items you will get quite a few conflicts. This method will allow hashing millions of items without any conflicts
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
// Simple function to generate 64 bit hashes using the builtin Murmur hash functionaliy which only outputs 32bits | |
// we had cases when hashing 15M+ items where we got around 0.08% conflicts | |
// using this method we were able to hash 15M items with 0 conflicts | |
object ExtendedMurmurHash { | |
val seed = 0xf7ca7fd2 | |
def hash(u: String): Long = { | |
val a = scala.util.hashing.MurmurHash3.stringHash(u, seed) | |
val b = scala.util.hashing.MurmurHash3.stringHash(u.reverse.toString, seed) | |
// shift a 32 bits to the left, leaving 32 lower bits zeroed | |
// mask b with "1111...1" (32 bits) | |
// bitwise "OR" between both leaving a 64 bit hash of the string using Murmur32 on each portion | |
val c: Long = a.toLong << 32 | (b & 0xffffffffL) | |
c | |
} | |
} |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment