Created
August 13, 2014 18:06
-
-
Save lseppala/66f2c4e77e33cd0c6da7 to your computer and use it in GitHub Desktop.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| -- You asked why the Haskell implementation is so much slower than the Ruby | |
| -- implementation. Before we do any comparision of X language to Y langage, | |
| -- we *must* do a complexity analysis ("Big-O"). We can't make any | |
| -- comparison of languages if the underlying algorithm isn't the same. | |
| -- | |
| -- Let's go through your code and analyze each function's contributing | |
| -- complexity. I'm putting your code in order from outer-most function to | |
| -- inner-most. You could do it vise-versa, but I think this helps | |
| -- illustrate this example best. | |
| -- The outermost function `findWinner` takes a list and recursively | |
| -- operates on it by recursively removing one element each iteration until | |
| -- there's only only one element in the list. Since it would take | |
| -- n operations to get to the last remaining element of a list of | |
| -- n elements, this function contributes a complexity of O(n). Simple | |
| -- enough. | |
| findWinner list num = if length list > 1 | |
| then findWinner (removeLooser list num) num | |
| else head list | |
| -- `removeLooser` performs m recurive operations on the list, where m is | |
| -- the value of `num`. When it's called by `findWinner`, there are n·m | |
| -- total operations, or a contributing complexity of O(n²). | |
| removeLooser list num = if num > 1 | |
| then removeLooser (movePotato list) (num - 1) | |
| else init (movePotato list) | |
| -- Alright, here's where we get to the meat and potatoes (har har). | |
| -- | |
| -- We want this function to have a complexity of O(1), because that's what | |
| -- a queue provides: O(1) time complexity to enqueue and dequeue elements | |
| -- on a collection. But since Haskell doesn't have a queue data structure | |
| -- in the standard library, you wrote your own version of a queue using the | |
| -- standard list. Not a bad idea, but we need to analyze whether it's | |
| -- actually giving us the performance we expect from a queue. | |
| -- | |
| -- First off, unlike a Ruby list, a Haskell list is a singly-linked | |
| -- list[1]. That means, in order to get to the i-th element, you *must* | |
| -- start at the beginning every time and iterate over i elements. When you | |
| -- access the last element of a list n elements long, it takes | |
| -- n operations. The function `last` obviously does this, so we can say | |
| -- that `movePotato` is O(n) *at least*. But what about the functions | |
| -- `init` and `(:)` (list prepend)? Since it's a singly linked list, it's | |
| -- a single operation to prepend to the list, so `(:)`` is a O(1) function. | |
| -- But `init` must iterate through all n elements of a list in order to | |
| -- remove the last element (see the source definition if you'd like to see | |
| -- this [2]). So we know that `movePotato` completes n·n+1 operations and | |
| -- thus has a complexity of O(2n+1), simplified as O(n) as n → ∞. | |
| movePotato list = (last list):(init list) | |
| -- Therefore, the total time complexity of `findWinner` is O(n³). This is | |
| -- *much* worse than the Ruby implementation, which has has a time | |
| -- complexity of O(n²) since it uses a doubly-linked list as a queue with | |
| -- O(1) access to both ends. | |
| ------------------------------------------------------------------------------- | |
| -- But we can fix this! All we really need to do is to implement a queue | |
| -- data structure that gives us an O(1) time complexity. | |
| -- | |
| -- One simple way to implement this is by using two lists. This technique | |
| -- was described in a famous paper, "Purely Functional Datastrutures" [3]. | |
| -- The first list contains everything elements that are next to be | |
| -- dequeued, and the second list grows when elements are enqueued | |
| -- | |
| -- Let's write the data constructor for this queue. A Queue will hold | |
| -- elements of type a, so it looks like this: | |
| data Queue a = Queue [a] [a] deriving Show | |
| -- We can start by creating a function ot let us build an new, empty Queue, | |
| -- which will have two empty lists | |
| newQueue :: Queue a | |
| newQueue = Queue [] [] | |
| -- Though it'll be easiest and more efficient if we can just make a queue from | |
| -- a list, since we're using a range of integers in a list | |
| queueFromList :: [a] -> Queue a | |
| queueFromList xs = Queue xs [] | |
| -- This will allow us to write `queueFromList [1..1000]` or whatever range we | |
| -- need. Fun fact: [x..y] is just sugar for `enumFromTo x y`. | |
| -- Now we need a way to add new elements. We're going to enqueue new | |
| -- elements by prepending them to the second list. This means that the | |
| -- first element of the second list is the last item of the queue. It seems | |
| -- a bit backwards, but gives us that O(1) complexity since it's just | |
| -- a list prepend. | |
| enqueue :: Queue a -> a -> Queue a | |
| enqueue (Queue xs ys) y = Queue xs (y:ys) | |
| -- Now dequeuing. We want to return both the element we dequeue and the | |
| -- remaining queue. The element to dequeue is the first element of the | |
| -- first list. | |
| dequeue :: Queue a -> (Queue a, a) | |
| dequeue (Queue (x:xs) ys) = (Queue xs ys, x) | |
| -- But we can't dequeue if the queue is empty. For now, let's just return | |
| -- an error. I don't like returning errors that aren't explicit, i.e., in | |
| -- the type system. But that's a lesson for another day. | |
| dequeue (Queue [] []) = error "Can't dequeue from empty queue" | |
| -- But what if the first list is empty but the second has elements because | |
| -- we've enqueue a few items? We want to start pulling elements from the | |
| -- second list, but they're in reverse order... | |
| dequeue (Queue [] ys) = dequeue $ Queue (reverse ys) [] | |
| -- We create a new Queue with the elements of the second list reversed and | |
| -- put into the first position, and then recursively called dequeue on this | |
| -- new Queue. This will match the first function definition and give us | |
| -- what we intend: the oldest item dequeued and a new, 'refreshed' queue. | |
| -- | |
| -- Here's a tricky question: what's the time complexity of `dequeue`? The | |
| -- first definition just performs `head` on a list, so it's an O(1) | |
| -- function. The second returns an error, also O(1). But the third uses | |
| -- `reverse`, which we know is a O(n) function because it copies each | |
| -- element to a new list[4] in reverse order. | |
| -- | |
| -- So is `dequeue` O(n), since the worst-case definition is O(n)? Not | |
| -- quite. When evaluating complexity, you want to consider whether it's | |
| -- possible that an expensive operation only happens every so often. For | |
| -- example, is it possible to hit the last defintion of `dequeue` twice | |
| -- in a row? It's invoked if and only if the first list is empty and | |
| -- the second is non-empty, and it returns a queue where the second list is | |
| -- empty. Therefore, it's impossible to hit the expensive case twice. You | |
| -- have to dequeue all of the n elements that have been reversed and then | |
| -- enqueue some elements before you can reverse again. The expensive | |
| -- operation is absorbed across all of the cheaper operations. This process | |
| -- is called "amortization"[5] and are used in wide variety of data | |
| -- structures, especially hash maps and self-balancing trees. So, we can | |
| -- say that `dequeue` is an amortized O(1) function. | |
| -- Finally, let's reimplement your functions and and see how the function | |
| -- performs now! I also took the liberty to use some techniques to make | |
| -- your code clearn. I'll annotate the technique I used and give you a link | |
| -- where you can find it. | |
| -- Pattern matching. [6, 7]. We have to make sure to catch both cases where | |
| -- the last remaining element is in the first or the second list | |
| findWinner' :: Int -> Queue a -> a | |
| findWinner' _ (Queue [x] []) = x | |
| findWinner' _ (Queue [] [x]) = x | |
| findWinner' num queue = findWinner' num $ removeLoser' num queue | |
| -- Guards. [6, 7] | |
| removeLoser' :: Int -> Queue a -> Queue a | |
| removeLoser' num queue | |
| | num == 0 = fst $ dequeue queue | |
| | num > 0 = removeLoser' (num - 1) $ movePotato' queue | |
| | otherwise = error "Can't remove the i-th item when i is less than 1" | |
| --`fst` gets the first part of the tuple returned by `dequeue` | |
| -- And again, I hate non-explicit errors | |
| -- Alright, I'm going to do something fancy here and I'll try to explain. | |
| -- Don't fret if it takes some mental chewing. | |
| movePotato' :: Queue a -> Queue a | |
| movePotato' = uncurry enqueue . dequeue | |
| -- `uncurry` takes a function a -> b -> c and changes it to (a, b) -> c | |
| -- `.` is the composition function. It takes two functions as arguments, | |
| -- b -> c and a -> b and returns a function a -> c. `g . f` is the same as | |
| -- applying function `f ` first and then applying `g` to that result. | |
| -- | |
| -- `uncurry enqueue` returns a function (Queue a, a) -> Queue a, and | |
| -- dequeue returns Queue a -> (Queue a, a). Therefore, we can compose them | |
| -- to create a function of type Queue a -> Queue a, which removes the | |
| -- first item in the queue and puts it on the end of the queue. | |
| -- | |
| -- Haskell is awesome! | |
| -- | |
| -- | |
| -- Alright, and for the final test... | |
| -- | |
| -- Enter ':set +s' in GHCI to get timings of functions. It's not super | |
| -- accurate, but it's a good measure. Load this file and run the following | |
| -- two funcitons to get timings. The timings I got are in comments | |
| oldFindWinnerTest = findWinner [1..10000] 7 | |
| -- 5687 | |
| -- (39.70 secs, 22442495720 bytes) | |
| newFindWinnerTest = findWinner' 7 $ queueFromList [1..10000] | |
| -- 4314 | |
| -- (0.14 secs, 51468136 bytes) | |
| -- and if you want to get really serious... | |
| hellYeahHaskell = findWinner' 7 $ queueFromList [1..100000] | |
| -- Fuck yeah. 1.56 secs. Ruby can't touch that shit. | |
| -- Timing things usually have a large margin of error, and your milage | |
| -- will vary (I was running Google hangouts while running this, actually. | |
| -- My CPU was pretty busy). But it's pretty freakin' clear that the new | |
| -- method works *a lot* better. | |
| -- Finally, I hate to break it to you, but your original implementation has | |
| -- two bugs. First, your `movePotato` method goes backwards in that it | |
| -- removes from the *end* of the queue and puts it on the *beginning*, | |
| -- violating a FIFO (first-in, first-out). You'll notice that we'll get | |
| -- different answers with our functions. You may have chosen to do that | |
| -- because that was the simplest way you could figure out how to do | |
| -- a queue. The way to fix it is pretty simple: | |
| fixedMovePotato list = tail list ++ [head list] | |
| -- This is still a O(n) function, though it happens to be not as bad as | |
| -- your original, which takes a double hit for using both `last` and | |
| -- `init`. | |
| -- | |
| -- The second bug is an off-by one error. You should count down to 0 rather | |
| -- than 1 in 'removeLooser'. Using the test example from the class with | |
| -- 6 people and num = 5 shows you don't hit the expected answer | |
| -- I hope that was helpful! I spent a lot more time on this than | |
| -- I intended, but it was instructive for me too :) Feel free to ask me any | |
| -- other questions you have about haskell | |
| --[1]: http://en.wikipedia.org/wiki/Singly_linked_list#Singly_linked_lists | |
| --[2]: http://hackage.haskell.org/package/base-4.7.0.1/docs/src/GHC-List.html#init | |
| --[3]: http://www.cs.cmu.edu/~rwh/theses/okasaki.pdf | |
| --[4]: http://hackage.haskell.org/package/base-4.7.0.1/docs/src/GHC-List.html#reverse | |
| --[5]: http://en.wikipedia.org/wiki/Amortized_analysis | |
| --[6]: http://learnyouahaskell.com/syntax-in-functions | |
| --[7]: https://www.haskell.org/tutorial/patterns.html |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment