@zackmdavis
Created February 27, 2014 02:39
Before we take a closer look at consistent hashing in Swift, let’s
examine a simpler way in which one might determine where to store
objects. This will illustrate some of the fundamental problems that
need to be solved in order to efficiently add capacity to, or remove
capacity from, a cluster.
A _hash function_ takes in some data and outputs a number that
seemingly has nothing to do with the input but is actually completely
determined by it. This is useful for our purposes, because we want a
predictable way to assign objects to devices, but we don't want our
method to be sensitive to any patterns in what kinds of objects we
might want to store: if we decided to place objects based on, say, their
alphabetical order by name, we might not get an even distribution of
data. Suppose we were to determine where to place an object by hashing
its fully qualified name and then taking the remainder after dividing
by the number of drives available for storage. Each possible remainder
would map to a particular drive. For example, suppose we want to store
an object with the path “/account/container/object” and that we have 4
drives available for storage.
The drives:
[options="header"]
|===========================
|Drive ID |Drive Name
|0 |Drive A
|1 |Drive B
|2 |Drive C
|3 |Drive D
|===========================
We start by using the MD5 function to hash the fully qualified name:
....
$ md5 -s /account/container/object
MD5 ("/account/container/object") = f9db0f833f1545be2e40f387d6c271de
....
The hexadecimal (base-16) _digest_ "f9db0f833f1545be2e40f387d6c271de"
corresponds to the number 332115198597019796159838990710599741918 in
the more familiar decimal (base-10) notation. The remainder when
dividing this number by 4 is 2 (because 16 is a multiple of 4, only the
final hexadecimal digit matters, and e is 14), so the object would be
stored on Drive C.
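The same placement rule can be sketched in a few lines of Python, which makes it easy to check the arithmetic for the quoted digest or to try other object names. This is only an illustration of the hash-and-take-the-remainder scheme described here, not Swift's actual placement code; the names `DRIVES`, `drive_for_digest`, and `drive_for_name` are made up for this example.

```python
import hashlib

# Drive table from the example, indexed by drive ID (illustrative only).
DRIVES = ["Drive A", "Drive B", "Drive C", "Drive D"]


def drive_for_digest(hex_digest, drives=DRIVES):
    """Interpret a hexadecimal digest as an integer and take the
    remainder after dividing by the number of drives."""
    return drives[int(hex_digest, 16) % len(drives)]


def drive_for_name(name, drives=DRIVES):
    """Hash an object's fully qualified name with MD5, then pick a drive."""
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()
    return drive_for_digest(digest, drives)


# The digest exactly as quoted in the md5 output in the text.
quoted_digest = "f9db0f833f1545be2e40f387d6c271de"
print(int(quoted_digest, 16))            # the digest as a decimal integer
print(drive_for_digest(quoted_digest))   # drive selected for that digest

# Hashing a name directly; the same name always maps to the same drive.
print(drive_for_name("/account/container/object"))
```

Note that the result depends on `len(drives)`, so the mapping is only stable for as long as the number of drives stays the same.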
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment