## Gist aristus/f0c311df98d92e367df0, created June 9, 2015 23:26
import basin
import string
from hashlib import sha1
## Fixed-length compact ids for compound keys. Given one or more strings, this
## joins them with delim (default '|'), SHA-1 hashes the result, converts the hash
## to the given base (default 62), truncates it to the given length (default 12),
## and left-pads with zeros.
##
## *** WARNING MATH AHEAD ***
## Here's a handy way to approximate the birthday bound for a given id space and
## probability of collision. Start with the probability of *no* collision you are
## willing to accept; for a one-in-a-million chance of collision that is
##     prob = 0.999999
## Then calculate the size of the space, in this case 12 base-62 digits:
##     space = 62 ** 12
## Multiply the natural log of prob by -2 and by space, then take the square root.
## The result approximates how many items can be shoved into the space before
## there is a one-in-a-million chance of collision:
##     int(math.sqrt(-2 * math.log(prob) * space))
##     --> 80327683
## With the default settings, even with 80 million items the odds of a collision
## are still about 10**6 to 1 against.
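##
## A minimal sketch of the estimate above as two runnable helpers. The names
## birthday_capacity() and collision_odds() are hypothetical, not part of the
## original gist. collision_odds() is the same approximation solved for the
## collision probability, p ~= 1 - exp(-n**2 / (2 * space)). Both assume the ids
## are spread uniformly over the space.
import math


def birthday_capacity(prob_no_collision, space):
    ## Items that fit in `space` before P(no collision) drops to prob_no_collision.
    return int(math.sqrt(-2 * math.log(prob_no_collision) * space))


def collision_odds(n, space):
    ## Approximate probability that n uniformly random ids in `space` collide.
    ## expm1 keeps precision when the result is tiny (around 1e-6 here).
    return -math.expm1(-float(n) * n / (2 * space))


## birthday_capacity(0.999999, 62 ** 12) is roughly 80 million, and
## collision_odds(80 * 10**6, 62 ** 12) is roughly 1e-6.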
##
## Why not base64? Or base92? Or base256, or...?
## One of the more annoying things about base64 and above is that their output does
## not round-trip well through systems like URLs, JSON, etc. Every protocol and
## format has its own (sometimes two!) escaping regimes, which conflict and result
## in sadness. Base62 (digits plus ASCII letters) is a compromise between compactness
## and resistance to corruption.
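
## A quick way to see the URL half of the round-trip problem. This is a minimal
## sketch using only the standard library; survives_url_quoting() is a hypothetical
## helper, not from the original gist. Percent-encoding leaves base62 text
## untouched, while base64's '+', '/' and '=' all get escaped. For example,
## base64.b64encode(b'\xfb\xff') is b'+/8=', which quotes to '%2B%2F8%3D'.
from urllib.parse import quote


def survives_url_quoting(text):
    ## safe='' makes quote() escape '/' as well (its default is safe='/').
    return quote(text, safe='') == text
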
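## 0-9, a-z, A-Z: the 62 symbols used for encoding, in digit order.
ALPHA = string.digits + string.ascii_letters

def generate(*tokens, **kwargs):
    length = kwargs.get('length', 12)
    base = kwargs.get('base', 62)
    delim = kwargs.get('delim', '|')
    ## sha1() takes bytes on Python 3, so encode the joined key before hashing.
    digest = sha1(delim.join(tokens).encode('utf-8')).hexdigest()
    ## Encode the 160-bit digest in `base`, keep the first `length` characters,
    ## and left-pad with zeros if the encoding came out shorter than that.
    return basin.encode(ALPHA[0:base], int(digest, 16))[0:length].zfill(length)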
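

## A stdlib-only sketch of the same pipeline, for readers without basin. The names
## generate_uniform(), _encode() and _ALPHABET are hypothetical, not from the
## original gist. The sketch deliberately differs from generate() in one way: it
## keeps the *low* `length` digits (value % base ** length) instead of the leading
## ones. If basin.encode writes the most significant digit first (an assumption),
## the leading character of an encoded 160-bit SHA-1 value is never '0' and is
## skewed toward the front of the alphabet. That makes the usable space smaller
## than base ** length. Taking the remainder keeps ids close to uniform over
## base ** length, which is what the birthday estimate assumes. The ids it
## produces will NOT match generate()'s.
import string
from hashlib import sha1

_ALPHABET = string.digits + string.ascii_letters


def _encode(n, alphabet):
    ## Plain positional encoding of a non-negative int, most significant digit first.
    base = len(alphabet)
    digits = []
    while True:
        n, r = divmod(n, base)
        digits.append(alphabet[r])
        if n == 0:
            break
    return ''.join(reversed(digits))


def generate_uniform(*tokens, length=12, base=62, delim='|'):
    if not 2 <= base <= len(_ALPHABET):
        raise ValueError('base must be between 2 and %d' % len(_ALPHABET))
    digest = sha1(delim.join(tokens).encode('utf-8')).digest()
    ## Reduce the digest to [0, base ** length), then pad to a fixed width.
    ## zfill pads with '0', which is _ALPHABET[0], so padding keeps the value.
    value = int.from_bytes(digest, 'big') % (base ** length)
    return _encode(value, _ALPHABET[:base]).zfill(length)


## Usage: generate_uniform('user', '42') returns a 12-character base-62 id;
## generate_uniform('user', '42', length=8, base=16) returns 8 hex characters.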