Created
May 30, 2016 16:23
-
-
Save imankulov/064b89b3e36cf1cdc29b9b5b7bb65f08 to your computer and use it in GitHub Desktop.
Jump hash with linear probing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
""" | |
Jump hash with linear probing | |
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | |
JumpHash(N) maps consistently integer values onto the range [0, N), which can | |
be used to map users (by their user ids) to servers (by their indexes in | |
a configuration file, for example). | |
There is one important limitation of the hash, which affects its | |
use: buckets (servers) must be numbered sequentially, and it's important | |
to ensure that whenever we add or remove servers, the rest of server have to | |
maintain their positions in the configuration file. | |
To ensure we can do that, the modification of the jump hash algorithm is used. | |
The list of servers can contain gaps (the fewer the better), and if the hash | |
encounters the gap, the key is incremented, and hash is recalculated. | |
More about Jump Hash: https://arxiv.org/pdf/1406.2294.pdf | |
More about linear probing: https://en.wikipedia.org/wiki/Consistent_hashing | |
""" | |
def jump_hash(key, servers, max_idx=None, max_attempts=1000): | |
""" | |
Modification of classic Jump Hash algorithm allowing to have empty records | |
in the list of server. | |
:param key: Integer value to map (user_id) | |
:param server: servers configuration (either list with optional None values | |
in places where gaps should be, or dict) | |
:param max_idx: Maximum index of the server. Optional argument added for | |
the sake of optimization | |
:param max_attempts: A "just in case parameter" to ensure we never fall | |
into infinite loop | |
More about servers: | |
The list can look like this (mind None in place of "baz", which is added | |
to ensure we maintain the correct order of "spam" and "egg"):: | |
["foo", "bar", None, "spam", "egg"] | |
Equally, it can be a dict, like this:: | |
{1: "foo", 2: "bar", 4: "spam", 5: "egg"} | |
""" | |
# ensure that key is integer | |
if not isinstance(key, (int, long)): | |
raise ValueError('Key %s is not integer. Unable to hash' % key) | |
# convert list of servers to dict by enumerating every item | |
if isinstance(servers, (list, tuple)): | |
servers = {i[0]: i[1] for i in enumerate(servers)} | |
# ensure every key in the dict is integer value, and also exclude | |
# None values (excluding Nones is crucially important) | |
servers = {int(k): v for k, v in servers.items() if v is not None} | |
if max_idx is None: | |
max_idx = max(servers.keys()) | |
# With no gaps in the list it gives us exactly one attempt to pick the | |
# correct value | |
for i in xrange(max_attempts): | |
val = jump_once(key, max_idx + 1) | |
try: | |
return servers[val] | |
except KeyError: | |
key += 1 | |
raise ValueError("Unable to find server for %s within %s attempts" % (key, max_attempts)) | |
def jump_once(key, num_buckets): | |
"""Generate a number in the range [0, num_buckets). | |
Copied as is from | |
https://github.com/renstrom/python-jump-consistent-hash/blob/master/jump/__init__.py#L15-L38 | |
Args: | |
key (int): The key to hash. | |
num_buckets (int): Number of buckets to use. | |
Returns: | |
The bucket number `key` computes to. | |
Raises: | |
ValueError: If `num_buckets` is not a positive number. | |
""" | |
b, j = -1, 0 | |
if num_buckets < 1: | |
raise ValueError('num_buckets must be a positive number') | |
while j < num_buckets: | |
b = int(j) | |
key = ((key * long(2862933555777941757)) + 1) & 0xffffffffffffffff | |
j = float(b + 1) * (float(1 << 31) / float((key >> 33) + 1)) | |
return int(b) |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment