Skip to content

Instantly share code, notes, and snippets.

@lseppala
Created August 13, 2014 18:06
Show Gist options
  • Select an option

  • Save lseppala/66f2c4e77e33cd0c6da7 to your computer and use it in GitHub Desktop.

Select an option

Save lseppala/66f2c4e77e33cd0c6da7 to your computer and use it in GitHub Desktop.
-- You asked why the Haskell implementation is so much slower than the Ruby
-- implementation. Before we do any comparision of X language to Y langage,
-- we *must* do a complexity analysis ("Big-O"). We can't make any
-- comparison of languages if the underlying algorithm isn't the same.
--
-- Let's go through your code and analyze each function's contributing
-- complexity. I'm putting your code in order from outer-most function to
-- inner-most. You could do it vise-versa, but I think this helps
-- illustrate this example best.
-- The outermost function `findWinner` takes a list and recursively
-- operates on it by recursively removing one element each iteration until
-- there's only only one element in the list. Since it would take
-- n operations to get to the last remaining element of a list of
-- n elements, this function contributes a complexity of O(n). Simple
-- enough.
findWinner list num = if length list > 1
then findWinner (removeLooser list num) num
else head list
-- `removeLooser` performs m recurive operations on the list, where m is
-- the value of `num`. When it's called by `findWinner`, there are n·m
-- total operations, or a contributing complexity of O(n²).
removeLooser list num = if num > 1
then removeLooser (movePotato list) (num - 1)
else init (movePotato list)
-- Alright, here's where we get to the meat and potatoes (har har).
--
-- We want this function to have a complexity of O(1), because that's what
-- a queue provides: O(1) time complexity to enqueue and dequeue elements
-- on a collection. But since Haskell doesn't have a queue data structure
-- in the standard library, you wrote your own version of a queue using the
-- standard list. Not a bad idea, but we need to analyze whether it's
-- actually giving us the performance we expect from a queue.
--
-- First off, unlike a Ruby list, a Haskell list is a singly-linked
-- list[1]. That means, in order to get to the i-th element, you *must*
-- start at the beginning every time and iterate over i elements. When you
-- access the last element of a list n elements long, it takes
-- n operations. The function `last` obviously does this, so we can say
-- that `movePotato` is O(n) *at least*. But what about the functions
-- `init` and `(:)` (list prepend)? Since it's a singly linked list, it's
-- a single operation to prepend to the list, so `(:)`` is a O(1) function.
-- But `init` must iterate through all n elements of a list in order to
-- remove the last element (see the source definition if you'd like to see
-- this [2]). So we know that `movePotato` completes n·n+1 operations and
-- thus has a complexity of O(2n+1), simplified as O(n) as n → ∞.
movePotato list = (last list):(init list)
-- Therefore, the total time complexity of `findWinner` is O(n³). This is
-- *much* worse than the Ruby implementation, which has has a time
-- complexity of O(n²) since it uses a doubly-linked list as a queue with
-- O(1) access to both ends.
-------------------------------------------------------------------------------
-- But we can fix this! All we really need to do is to implement a queue
-- data structure that gives us an O(1) time complexity.
--
-- One simple way to implement this is by using two lists. This technique
-- was described in a famous paper, "Purely Functional Datastrutures" [3].
-- The first list contains everything elements that are next to be
-- dequeued, and the second list grows when elements are enqueued
--
-- Let's write the data constructor for this queue. A Queue will hold
-- elements of type a, so it looks like this:
data Queue a = Queue [a] [a] deriving Show
-- We can start by creating a function ot let us build an new, empty Queue,
-- which will have two empty lists
newQueue :: Queue a
newQueue = Queue [] []
-- Though it'll be easiest and more efficient if we can just make a queue from
-- a list, since we're using a range of integers in a list
queueFromList :: [a] -> Queue a
queueFromList xs = Queue xs []
-- This will allow us to write `queueFromList [1..1000]` or whatever range we
-- need. Fun fact: [x..y] is just sugar for `enumFromTo x y`.
-- Now we need a way to add new elements. We're going to enqueue new
-- elements by prepending them to the second list. This means that the
-- first element of the second list is the last item of the queue. It seems
-- a bit backwards, but gives us that O(1) complexity since it's just
-- a list prepend.
enqueue :: Queue a -> a -> Queue a
enqueue (Queue xs ys) y = Queue xs (y:ys)
-- Now dequeuing. We want to return both the element we dequeue and the
-- remaining queue. The element to dequeue is the first element of the
-- first list.
dequeue :: Queue a -> (Queue a, a)
dequeue (Queue (x:xs) ys) = (Queue xs ys, x)
-- But we can't dequeue if the queue is empty. For now, let's just return
-- an error. I don't like returning errors that aren't explicit, i.e., in
-- the type system. But that's a lesson for another day.
dequeue (Queue [] []) = error "Can't dequeue from empty queue"
-- But what if the first list is empty but the second has elements because
-- we've enqueue a few items? We want to start pulling elements from the
-- second list, but they're in reverse order...
dequeue (Queue [] ys) = dequeue $ Queue (reverse ys) []
-- We create a new Queue with the elements of the second list reversed and
-- put into the first position, and then recursively called dequeue on this
-- new Queue. This will match the first function definition and give us
-- what we intend: the oldest item dequeued and a new, 'refreshed' queue.
--
-- Here's a tricky question: what's the time complexity of `dequeue`? The
-- first definition just performs `head` on a list, so it's an O(1)
-- function. The second returns an error, also O(1). But the third uses
-- `reverse`, which we know is a O(n) function because it copies each
-- element to a new list[4] in reverse order.
--
-- So is `dequeue` O(n), since the worst-case definition is O(n)? Not
-- quite. When evaluating complexity, you want to consider whether it's
-- possible that an expensive operation only happens every so often. For
-- example, is it possible to hit the last defintion of `dequeue` twice
-- in a row? It's invoked if and only if the first list is empty and
-- the second is non-empty, and it returns a queue where the second list is
-- empty. Therefore, it's impossible to hit the expensive case twice. You
-- have to dequeue all of the n elements that have been reversed and then
-- enqueue some elements before you can reverse again. The expensive
-- operation is absorbed across all of the cheaper operations. This process
-- is called "amortization"[5] and are used in wide variety of data
-- structures, especially hash maps and self-balancing trees. So, we can
-- say that `dequeue` is an amortized O(1) function.
-- Finally, let's reimplement your functions and and see how the function
-- performs now! I also took the liberty to use some techniques to make
-- your code clearn. I'll annotate the technique I used and give you a link
-- where you can find it.
-- Pattern matching. [6, 7]. We have to make sure to catch both cases where
-- the last remaining element is in the first or the second list
findWinner' :: Int -> Queue a -> a
findWinner' _ (Queue [x] []) = x
findWinner' _ (Queue [] [x]) = x
findWinner' num queue = findWinner' num $ removeLoser' num queue
-- Guards. [6, 7]
removeLoser' :: Int -> Queue a -> Queue a
removeLoser' num queue
| num == 0 = fst $ dequeue queue
| num > 0 = removeLoser' (num - 1) $ movePotato' queue
| otherwise = error "Can't remove the i-th item when i is less than 1"
--`fst` gets the first part of the tuple returned by `dequeue`
-- And again, I hate non-explicit errors
-- Alright, I'm going to do something fancy here and I'll try to explain.
-- Don't fret if it takes some mental chewing.
movePotato' :: Queue a -> Queue a
movePotato' = uncurry enqueue . dequeue
-- `uncurry` takes a function a -> b -> c and changes it to (a, b) -> c
-- `.` is the composition function. It takes two functions as arguments,
-- b -> c and a -> b and returns a function a -> c. `g . f` is the same as
-- applying function `f ` first and then applying `g` to that result.
--
-- `uncurry enqueue` returns a function (Queue a, a) -> Queue a, and
-- dequeue returns Queue a -> (Queue a, a). Therefore, we can compose them
-- to create a function of type Queue a -> Queue a, which removes the
-- first item in the queue and puts it on the end of the queue.
--
-- Haskell is awesome!
--
--
-- Alright, and for the final test...
--
-- Enter ':set +s' in GHCI to get timings of functions. It's not super
-- accurate, but it's a good measure. Load this file and run the following
-- two funcitons to get timings. The timings I got are in comments
oldFindWinnerTest = findWinner [1..10000] 7
-- 5687
-- (39.70 secs, 22442495720 bytes)
newFindWinnerTest = findWinner' 7 $ queueFromList [1..10000]
-- 4314
-- (0.14 secs, 51468136 bytes)
-- and if you want to get really serious...
hellYeahHaskell = findWinner' 7 $ queueFromList [1..100000]
-- Fuck yeah. 1.56 secs. Ruby can't touch that shit.
-- Timing things usually have a large margin of error, and your milage
-- will vary (I was running Google hangouts while running this, actually.
-- My CPU was pretty busy). But it's pretty freakin' clear that the new
-- method works *a lot* better.
-- Finally, I hate to break it to you, but your original implementation has
-- two bugs. First, your `movePotato` method goes backwards in that it
-- removes from the *end* of the queue and puts it on the *beginning*,
-- violating a FIFO (first-in, first-out). You'll notice that we'll get
-- different answers with our functions. You may have chosen to do that
-- because that was the simplest way you could figure out how to do
-- a queue. The way to fix it is pretty simple:
fixedMovePotato list = tail list ++ [head list]
-- This is still a O(n) function, though it happens to be not as bad as
-- your original, which takes a double hit for using both `last` and
-- `init`.
--
-- The second bug is an off-by one error. You should count down to 0 rather
-- than 1 in 'removeLooser'. Using the test example from the class with
-- 6 people and num = 5 shows you don't hit the expected answer
-- I hope that was helpful! I spent a lot more time on this than
-- I intended, but it was instructive for me too :) Feel free to ask me any
-- other questions you have about haskell
--[1]: http://en.wikipedia.org/wiki/Singly_linked_list#Singly_linked_lists
--[2]: http://hackage.haskell.org/package/base-4.7.0.1/docs/src/GHC-List.html#init
--[3]: http://www.cs.cmu.edu/~rwh/theses/okasaki.pdf
--[4]: http://hackage.haskell.org/package/base-4.7.0.1/docs/src/GHC-List.html#reverse
--[5]: http://en.wikipedia.org/wiki/Amortized_analysis
--[6]: http://learnyouahaskell.com/syntax-in-functions
--[7]: https://www.haskell.org/tutorial/patterns.html
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment