Last active
August 29, 2015 14:05
-
-
Save r-barnes/a450782655edb8271c5d to your computer and use it in GitHub Desktop.
Richard's Unambiguous URL Shortener
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
#!/usr/bin/ruby | |
require 'digest' | |
#This function takes an input string, a desired hash length, and an attempt | |
#number. It returns a hash of the specified length encoded in an alphabet which | |
#is unambiguous in both upper and lower case. The attempt number should | |
#initially be 0. If the hash is found to conflict with a previous result, then | |
#the attempt number should be incremented and the function called again with the | |
#same input string and hashlen. If you have incremened attempt number an | |
#unreasonably large number of times (you must figure out what this means), then | |
#you should increase your hash length. | |
#Author: Richard Barnes ([email protected]) | |
def ShortenURL(inputstr, hashlen, attempt_number) | |
#This alphabet has been chosen to be unambiguous in both upper- and lower- | |
#case across a variety of fonts. To do this, the following characters have | |
#been removed: L l I i 0 O o | |
alphabet = "ABCDEFGHJKMNPQRSTUVWXYZ23456789" | |
#Convert input string into a hexadecimal hash. Repeat as necessary to generate | |
#a unique short url. The idea here is that a good hash function distributes | |
#its inputs uniformly and unpredictably across its hashspace. If this is true, | |
#then the least-significant digits of the output hash are also distributed | |
#uniformly and unpredictably across the smaller hash space they represent. | |
for i in 0..attempt_number | |
inputstr = Digest::MD5.hexdigest( inputstr ) | |
end | |
#Convert hexadecimal hash into a (very long) integer | |
inputstr = inputstr.hex | |
#Use modulus to wrap very long integer into the hashspace addressable by the | |
#given alphabet and hash length | |
inputstr = inputstr % (alphabet.length**hashlen) | |
#Convert the resuling number into an array of integers representing the number | |
#in base(hashlen) | |
s = [] | |
while true do | |
inputstr, r = inputstr.divmod(alphabet.length) | |
s<<r | |
if inputstr == 0 | |
break | |
end | |
end | |
#Reverse s so that the digits are in the appropriate order | |
s.reverse! | |
#Convert numeric digits into the characters specified in alphabet | |
output = "" | |
for i in s | |
output += alphabet[i] | |
end | |
#Ensure that output is equal to hashlen, in case we happen to spit out a small | |
#number | |
output = output.rjust(hashlen, alphabet[0]) | |
#Remember to give back to the world | |
return output | |
end | |
#This function demonstrates how your code should call the URL shortener | |
#function. You will have to adapt this function to your particular use-case. | |
def GenHash(string, database_of_hashes) | |
#Check if the URL has already been hashed. If so, return that hash without any | |
#further look-ups. | |
if database_of_hashes.include? string | |
return database_of_hashes[string] | |
end | |
#If the URL has not already been hashed, try to hash it. Try again if the hash | |
#has already been used for a different URL (if there is a hash collision). | |
#Stop trying after 10 attempts to avoid and infinite loop in a crowded hash | |
#space. | |
i = 0 | |
begin | |
hash = ShortenURL(string, 3, i) | |
i += 1 | |
end while database_of_hashes.has_value?(hash) and i<10 | |
if i==10 #Too many attempts: give up | |
return false | |
else #Success. Make a note that this URL hashes to this hash | |
database_of_hashes[string] = hash | |
return hash | |
end | |
end | |
#This finds conflicts for use in examples | |
def FindConflicts() | |
a = Hash.new | |
for i in 0..5000000 | |
hash=ShortenURL(i.to_s(26), 3, 0) | |
if a.include? hash | |
puts "Conflict" | |
puts i.to_s(26) | |
puts a[hash] | |
end | |
a[hash]=i.to_s(26) | |
end | |
end | |
#This runs some example cases to help you understand how to use the code | |
def RunExample() | |
database_of_hashes = Hash.new | |
puts "For 'hello', the example GenHash function gives " + GenHash("hello", database_of_hashes) | |
puts "Using 'hello' a second time, the example GenHash function still gives " + GenHash("hello", database_of_hashes) | |
puts "Notice that the same hash is generated each time!" | |
puts " " | |
puts "Using the ShortenURL function for 1l0l gives " + ShortenURL("1l0l", 3, 0) | |
puts "Using the ShortenURL function for agj gives " + ShortenURL("agj", 3, 0) | |
puts "Notice how these inputs result in a hash collision!" | |
puts " " | |
puts "Therefore, when you call ShortenURL, you must handle collisions." | |
puts "The example GenHash function handles this." | |
puts "Passing '1l0l' to GenHash gives " + GenHash("1l0l", database_of_hashes) | |
puts "passing 'agj' to GenHash gives " + GenHash("agj", database_of_hashes) | |
end | |
RunExample() |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment