Skip to content

Instantly share code, notes, and snippets.

@imankulov
Created May 30, 2016 16:23
Show Gist options
  • Save imankulov/064b89b3e36cf1cdc29b9b5b7bb65f08 to your computer and use it in GitHub Desktop.
Save imankulov/064b89b3e36cf1cdc29b9b5b7bb65f08 to your computer and use it in GitHub Desktop.
Jump hash with linear probing
"""
Jump hash with linear probing
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
JumpHash(N) maps consistently integer values onto the range [0, N), which can
be used to map users (by their user ids) to servers (by their indexes in
a configuration file, for example).
There is one important limitation of the hash, which affects its
use: buckets (servers) must be numbered sequentially, and it's important
to ensure that whenever we add or remove servers, the rest of server have to
maintain their positions in the configuration file.
To ensure we can do that, the modification of the jump hash algorithm is used.
The list of servers can contain gaps (the fewer the better), and if the hash
encounters the gap, the key is incremented, and hash is recalculated.
More about Jump Hash: https://arxiv.org/pdf/1406.2294.pdf
More about linear probing: https://en.wikipedia.org/wiki/Consistent_hashing
"""
def jump_hash(key, servers, max_idx=None, max_attempts=1000):
"""
Modification of classic Jump Hash algorithm allowing to have empty records
in the list of server.
:param key: Integer value to map (user_id)
:param server: servers configuration (either list with optional None values
in places where gaps should be, or dict)
:param max_idx: Maximum index of the server. Optional argument added for
the sake of optimization
:param max_attempts: A "just in case parameter" to ensure we never fall
into infinite loop
More about servers:
The list can look like this (mind None in place of "baz", which is added
to ensure we maintain the correct order of "spam" and "egg")::
["foo", "bar", None, "spam", "egg"]
Equally, it can be a dict, like this::
{1: "foo", 2: "bar", 4: "spam", 5: "egg"}
"""
# ensure that key is integer
if not isinstance(key, (int, long)):
raise ValueError('Key %s is not integer. Unable to hash' % key)
# convert list of servers to dict by enumerating every item
if isinstance(servers, (list, tuple)):
servers = {i[0]: i[1] for i in enumerate(servers)}
# ensure every key in the dict is integer value, and also exclude
# None values (excluding Nones is crucially important)
servers = {int(k): v for k, v in servers.items() if v is not None}
if max_idx is None:
max_idx = max(servers.keys())
# With no gaps in the list it gives us exactly one attempt to pick the
# correct value
for i in xrange(max_attempts):
val = jump_once(key, max_idx + 1)
try:
return servers[val]
except KeyError:
key += 1
raise ValueError("Unable to find server for %s within %s attempts" % (key, max_attempts))
def jump_once(key, num_buckets):
"""Generate a number in the range [0, num_buckets).
Copied as is from
https://github.com/renstrom/python-jump-consistent-hash/blob/master/jump/__init__.py#L15-L38
Args:
key (int): The key to hash.
num_buckets (int): Number of buckets to use.
Returns:
The bucket number `key` computes to.
Raises:
ValueError: If `num_buckets` is not a positive number.
"""
b, j = -1, 0
if num_buckets < 1:
raise ValueError('num_buckets must be a positive number')
while j < num_buckets:
b = int(j)
key = ((key * long(2862933555777941757)) + 1) & 0xffffffffffffffff
j = float(b + 1) * (float(1 << 31) / float((key >> 33) + 1))
return int(b)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment