lseppala · August 13, 2014 18:06
diff --git a/potato.hs b/potato.hs
 -- You asked why the Haskell implementation is so much slower than the Ruby
 -- implementation. Before we do any comparision of X language to Y langage,
 -- we *must* do a complexity analysis ("Big-O"). We can't make any
 -- comparison of languages if the underlying algorithm isn't the same.
 --
 -- Let's go through your code and analyze each function's contributing
 -- complexity. I'm putting your code in order from outer-most function to
 -- inner-most. You could do it vise-versa, but I think this helps
 -- illustrate this example best.

 -- The outermost function `findWinner` takes a list and recursively
 -- operates on it by recursively removing one element each iteration until
 -- there's only only one element in the list. Since it would take
 -- n operations to get to the last remaining element of a list of
 -- n elements, this function contributes a complexity of O(n). Simple
 -- enough.
 findWinner list num = if length list > 1
                        then findWinner (removeLooser list num) num
                        else head list

 -- `removeLooser` performs m recurive operations on the list, where m is
 -- the value of `num`. When it's called by `findWinner`, there are n·m
 -- total operations, or a contributing complexity of O(n²).
 removeLooser list num = if num > 1
                          then removeLooser (movePotato list) (num - 1)
                          else init (movePotato list)

 -- Alright, here's where we get to the meat and potatoes (har har).
 --
 -- We want this function to have a complexity of O(1), because that's what
 -- a queue provides: O(1) time complexity to enqueue and dequeue elements
 -- on a collection. But since Haskell doesn't have a queue data structure
 -- in the standard library, you wrote your own version of a queue using the
 -- standard list. Not a bad idea, but we need to analyze whether it's
 -- actually giving us the performance we expect from a queue.
 --
 -- First off, unlike a Ruby list, a Haskell list is a singly-linked
 -- list[1]. That means, in order to get to the i-th element, you *must*
 -- start at the beginning every time and iterate over i elements. When you
 -- access the last element of a list n elements long, it takes
 -- n operations. The function `last` obviously does this, so we can say
 -- that `movePotato` is O(n) *at least*. But what about the functions
 -- `init` and `(:)` (list prepend)? Since it's a singly linked list, it's
 -- a single operation to prepend to the list, so `(:)`` is a O(1) function.
 -- But `init` must iterate through all n elements of a list in order to
 -- remove the last element (see the source definition if you'd like to see
 -- this [2]). So we know that `movePotato` completes n·n+1 operations and
 -- thus has a complexity of O(2n+1), simplified as O(n) as n → ∞.

 movePotato list = (last list):(init list)

 -- Therefore, the total time complexity of `findWinner` is O(n³). This is
 -- *much* worse than the Ruby implementation, which has has a time
 -- complexity of O(n²) since it uses a doubly-linked list as a queue with
 -- O(1) access to both ends.

 -------------------------------------------------------------------------------

 -- But we can fix this! All we really need to do is to implement a queue
 -- data structure that gives us an O(1) time complexity.
 --
 -- One simple way to implement this is by using two lists. This technique
 -- was described in a famous paper, "Purely Functional Datastrutures" [3].
 -- The first list contains everything elements that are next to be
 -- dequeued, and the second list grows when elements are enqueued
 --
 -- Let's write the data constructor for this queue. A Queue will hold
 -- elements of type a, so it looks like this:
 data Queue a = Queue [a] [a] deriving Show

 -- We can start by creating a function ot let us build an new, empty Queue,
 -- which will have two empty lists
 newQueue :: Queue a
 newQueue = Queue [] []

 -- Though it'll be easiest and more efficient if we can just make a queue from
 -- a list, since we're using a range of integers in a list
 queueFromList :: [a] -> Queue a
 queueFromList xs = Queue xs []

 -- This will allow us to write `queueFromList [1..1000]` or whatever range we
 -- need. Fun fact: [x..y] is just sugar for `enumFromTo x y`.

 -- Now we need a way to add new elements. We're going to enqueue new
 -- elements by prepending them to the second list. This means that the
 -- first element of the second list is the last item of the queue. It seems
 -- a bit backwards, but gives us that O(1) complexity since it's just
 -- a list prepend.
 enqueue :: Queue a -> a -> Queue a
 enqueue (Queue xs ys) y = Queue xs (y:ys)


 -- Now dequeuing. We want to return both the element we dequeue and the
 -- remaining queue. The element to dequeue is the first element of the
 -- first list.
 dequeue :: Queue a -> (Queue a, a)
 dequeue (Queue (x:xs) ys) = (Queue xs ys, x)

 -- But we can't dequeue if the queue is empty. For now, let's just return
 -- an error. I don't like returning errors that aren't explicit, i.e., in
 -- the type system. But that's a lesson for another day.
 dequeue (Queue [] []) = error "Can't dequeue from empty queue"

 -- But what if the first list is empty but the second has elements because
 -- we've enqueue a few items? We want to start pulling elements from the
 -- second list, but they're in reverse order...
 dequeue (Queue [] ys) = dequeue $ Queue (reverse ys) []

 -- We create a new Queue with the elements of the second list reversed and
 -- put into the first position, and then recursively called dequeue on this
 -- new Queue. This will match the first function definition and give us
 -- what we intend: the oldest item dequeued and a new, 'refreshed' queue.
 --
 -- Here's a tricky question: what's the time complexity of `dequeue`? The
 -- first definition just performs `head` on a list, so it's an O(1)
 -- function. The second returns an error, also O(1). But the third uses
 -- `reverse`, which we know is a O(n) function because it copies each
 -- element to a new list[4] in reverse order.
 --
 -- So is `dequeue` O(n), since the worst-case definition is O(n)? Not
 -- quite. When evaluating complexity, you want to consider whether it's
 -- possible that an expensive operation only happens every so often. For
 -- example, is it possible to hit the last defintion of `dequeue` twice
 -- in a row? It's invoked if and only if the first list is empty and
 -- the second is non-empty, and it returns a queue where the second list is
 -- empty. Therefore, it's impossible to hit the expensive case twice. You
 -- have to dequeue all of the n elements that have been reversed and then
 -- enqueue some elements before you can reverse again. The expensive
 -- operation is absorbed across all of the cheaper operations. This process
 -- is called "amortization"[5] and are used in wide variety of data
 -- structures, especially hash maps and self-balancing trees. So, we can
 -- say that `dequeue` is an amortized O(1) function.


 -- Finally, let's reimplement your functions and and see how the function
 -- performs now! I also took the liberty to use some techniques to make
 -- your code clearn. I'll annotate the technique I used and give you a link
 -- where you can find it.

 -- Pattern matching. [6, 7]. We have to make sure to catch both cases where
 -- the last remaining element is in the first or the second list
 findWinner' :: Int -> Queue a -> a
 findWinner' _ (Queue [x] []) = x
 findWinner' _ (Queue [] [x]) = x
 findWinner' num queue = findWinner' num $ removeLoser' num queue

 -- Guards. [6, 7]
 removeLoser' :: Int -> Queue a -> Queue a
 removeLoser' num queue 
    | num == 0 = fst $ dequeue queue
    | num > 0 = removeLoser' (num - 1) $ movePotato' queue
    | otherwise = error "Can't remove the i-th item when i is less than 1"

 --`fst` gets the first part of the tuple returned by `dequeue`
 -- And again, I hate non-explicit errors

 -- Alright, I'm going to do something fancy here and I'll try to explain.
 -- Don't fret if it takes some mental chewing.
 movePotato' :: Queue a -> Queue a
 movePotato' = uncurry enqueue . dequeue

 -- `uncurry` takes a function a -> b -> c and changes it to (a, b) -> c
 -- `.` is the composition function. It takes two functions as arguments,
 --  b -> c and a -> b and returns a function a -> c. `g . f` is the same as
 --  applying function `f ` first and then applying `g` to that result.
 --
 --  `uncurry enqueue` returns a function (Queue a, a) -> Queue a, and
 --  dequeue returns Queue a -> (Queue a, a). Therefore, we can compose them
 --  to create a function of type Queue a -> Queue a, which removes the
 --  first item in the queue and puts it on the end of the queue.
 --
 --  Haskell is awesome!
 --
 --
 --  Alright, and for the final test...
 --
 --  Enter ':set +s' in GHCI to get timings of functions. It's not super
 --  accurate, but it's a good measure. Load this file and run the following
 --  two funcitons to get timings. The timings I got are in comments

 oldFindWinnerTest = findWinner [1..10000] 7

 -- 5687
 -- (39.70 secs, 22442495720 bytes)

 newFindWinnerTest = findWinner' 7 $ queueFromList [1..10000]

 -- 4314
 -- (0.14 secs, 51468136 bytes)

 -- and if you want to get really serious...
 hellYeahHaskell = findWinner' 7 $ queueFromList [1..100000]
 -- Fuck yeah. 1.56 secs. Ruby can't touch that shit.

 -- Timing things usually have a large margin of error, and your milage
 -- will vary (I was running Google hangouts while running this, actually.
 -- My CPU was pretty busy). But it's pretty freakin' clear that the new
 -- method works *a lot* better.


 -- Finally, I hate to break it to you, but your original implementation has
 -- two bugs. First, your `movePotato` method goes backwards in that it
 -- removes from the *end* of the queue and puts it on the *beginning*,
 -- violating a FIFO (first-in, first-out). You'll notice that we'll get
 -- different answers with our functions. You may have chosen to do that
 -- because that was the simplest way you could figure out how to do
 -- a queue. The way to fix it is pretty simple:

 fixedMovePotato list = tail list ++ [head list]

 -- This is still a O(n) function, though it happens to be not as bad as
 -- your original, which takes a double hit for using both `last` and
 -- `init`.
 --
 -- The second bug is an off-by one error. You should count down to 0 rather
 -- than 1 in 'removeLooser'. Using the test example from the class with
 -- 6 people and num = 5 shows you don't hit the expected answer


 -- I hope that was helpful! I spent a lot more time on this than
 -- I intended, but it was instructive for me too :) Feel free to ask me any
 -- other questions you have about haskell


 --[1]: http://en.wikipedia.org/wiki/Singly_linked_list#Singly_linked_lists
 --[2]: http://hackage.haskell.org/package/base-4.7.0.1/docs/src/GHC-List.html#init
 --[3]: http://www.cs.cmu.edu/~rwh/theses/okasaki.pdf
 --[4]: http://hackage.haskell.org/package/base-4.7.0.1/docs/src/GHC-List.html#reverse
 --[5]: http://en.wikipedia.org/wiki/Amortized_analysis
 --[6]: http://learnyouahaskell.com/syntax-in-functions
 --[7]: https://www.haskell.org/tutorial/patterns.html
	-- You asked why the Haskell implementation is so much slower than the Ruby
	-- implementation. Before we do any comparision of X language to Y langage,
	-- we must do a complexity analysis ("Big-O"). We can't make any
	-- comparison of languages if the underlying algorithm isn't the same.
	--
	-- Let's go through your code and analyze each function's contributing
	-- complexity. I'm putting your code in order from outer-most function to
	-- inner-most. You could do it vise-versa, but I think this helps
	-- illustrate this example best.

	-- The outermost function `findWinner` takes a list and recursively
	-- operates on it by recursively removing one element each iteration until
	-- there's only only one element in the list. Since it would take
	-- n operations to get to the last remaining element of a list of
	-- n elements, this function contributes a complexity of O(n). Simple
	-- enough.
	findWinner list num = if length list > 1
	then findWinner (removeLooser list num) num
	else head list

	-- `removeLooser` performs m recurive operations on the list, where m is
	-- the value of `num`. When it's called by `findWinner`, there are n·m
	-- total operations, or a contributing complexity of O(n²).
	removeLooser list num = if num > 1
	then removeLooser (movePotato list) (num - 1)
	else init (movePotato list)

	-- Alright, here's where we get to the meat and potatoes (har har).
	--
	-- We want this function to have a complexity of O(1), because that's what
	-- a queue provides: O(1) time complexity to enqueue and dequeue elements
	-- on a collection. But since Haskell doesn't have a queue data structure
	-- in the standard library, you wrote your own version of a queue using the
	-- standard list. Not a bad idea, but we need to analyze whether it's
	-- actually giving us the performance we expect from a queue.
	--
	-- First off, unlike a Ruby list, a Haskell list is a singly-linked
	-- list[1]. That means, in order to get to the i-th element, you must
	-- start at the beginning every time and iterate over i elements. When you
	-- access the last element of a list n elements long, it takes
	-- n operations. The function `last` obviously does this, so we can say
	-- that `movePotato` is O(n) at least. But what about the functions
	-- `init` and `(:)` (list prepend)? Since it's a singly linked list, it's
	-- a single operation to prepend to the list, so `(:)`` is a O(1) function.
	-- But `init` must iterate through all n elements of a list in order to
	-- remove the last element (see the source definition if you'd like to see
	-- this [2]). So we know that `movePotato` completes n·n+1 operations and
	-- thus has a complexity of O(2n+1), simplified as O(n) as n → ∞.

	movePotato list = (last list):(init list)

	-- Therefore, the total time complexity of `findWinner` is O(n³). This is
	-- much worse than the Ruby implementation, which has has a time
	-- complexity of O(n²) since it uses a doubly-linked list as a queue with
	-- O(1) access to both ends.

	-------------------------------------------------------------------------------

	-- But we can fix this! All we really need to do is to implement a queue
	-- data structure that gives us an O(1) time complexity.
	--
	-- One simple way to implement this is by using two lists. This technique
	-- was described in a famous paper, "Purely Functional Datastrutures" [3].
	-- The first list contains everything elements that are next to be
	-- dequeued, and the second list grows when elements are enqueued
	--
	-- Let's write the data constructor for this queue. A Queue will hold
	-- elements of type a, so it looks like this:
	data Queue a = Queue [a] [a] deriving Show

	-- We can start by creating a function ot let us build an new, empty Queue,
	-- which will have two empty lists
	newQueue :: Queue a
	newQueue = Queue [] []

	-- Though it'll be easiest and more efficient if we can just make a queue from
	-- a list, since we're using a range of integers in a list
	queueFromList :: [a] -> Queue a
	queueFromList xs = Queue xs []

	-- This will allow us to write `queueFromList [1..1000]` or whatever range we
	-- need. Fun fact: [x..y] is just sugar for `enumFromTo x y`.

	-- Now we need a way to add new elements. We're going to enqueue new
	-- elements by prepending them to the second list. This means that the
	-- first element of the second list is the last item of the queue. It seems
	-- a bit backwards, but gives us that O(1) complexity since it's just
	-- a list prepend.
	enqueue :: Queue a -> a -> Queue a
	enqueue (Queue xs ys) y = Queue xs (y:ys)


	-- Now dequeuing. We want to return both the element we dequeue and the
	-- remaining queue. The element to dequeue is the first element of the
	-- first list.
	dequeue :: Queue a -> (Queue a, a)
	dequeue (Queue (x:xs) ys) = (Queue xs ys, x)

	-- But we can't dequeue if the queue is empty. For now, let's just return
	-- an error. I don't like returning errors that aren't explicit, i.e., in
	-- the type system. But that's a lesson for another day.
	dequeue (Queue [] []) = error "Can't dequeue from empty queue"

	-- But what if the first list is empty but the second has elements because
	-- we've enqueue a few items? We want to start pulling elements from the
	-- second list, but they're in reverse order...
	dequeue (Queue [] ys) = dequeue $ Queue (reverse ys) []

	-- We create a new Queue with the elements of the second list reversed and
	-- put into the first position, and then recursively called dequeue on this
	-- new Queue. This will match the first function definition and give us
	-- what we intend: the oldest item dequeued and a new, 'refreshed' queue.
	--
	-- Here's a tricky question: what's the time complexity of `dequeue`? The
	-- first definition just performs `head` on a list, so it's an O(1)
	-- function. The second returns an error, also O(1). But the third uses
	-- `reverse`, which we know is a O(n) function because it copies each
	-- element to a new list[4] in reverse order.
	--
	-- So is `dequeue` O(n), since the worst-case definition is O(n)? Not
	-- quite. When evaluating complexity, you want to consider whether it's
	-- possible that an expensive operation only happens every so often. For
	-- example, is it possible to hit the last defintion of `dequeue` twice
	-- in a row? It's invoked if and only if the first list is empty and
	-- the second is non-empty, and it returns a queue where the second list is
	-- empty. Therefore, it's impossible to hit the expensive case twice. You
	-- have to dequeue all of the n elements that have been reversed and then
	-- enqueue some elements before you can reverse again. The expensive
	-- operation is absorbed across all of the cheaper operations. This process
	-- is called "amortization"[5] and are used in wide variety of data
	-- structures, especially hash maps and self-balancing trees. So, we can
	-- say that `dequeue` is an amortized O(1) function.


	-- Finally, let's reimplement your functions and and see how the function
	-- performs now! I also took the liberty to use some techniques to make
	-- your code clearn. I'll annotate the technique I used and give you a link
	-- where you can find it.

	-- Pattern matching. [6, 7]. We have to make sure to catch both cases where
	-- the last remaining element is in the first or the second list
	findWinner' :: Int -> Queue a -> a
	findWinner' _ (Queue [x] []) = x
	findWinner' _ (Queue [] [x]) = x
	findWinner' num queue = findWinner' num $ removeLoser' num queue

	-- Guards. [6, 7]
	removeLoser' :: Int -> Queue a -> Queue a
	removeLoser' num queue
	\| num == 0 = fst $ dequeue queue
	\| num > 0 = removeLoser' (num - 1) $ movePotato' queue
	\| otherwise = error "Can't remove the i-th item when i is less than 1"

	--`fst` gets the first part of the tuple returned by `dequeue`
	-- And again, I hate non-explicit errors

	-- Alright, I'm going to do something fancy here and I'll try to explain.
	-- Don't fret if it takes some mental chewing.
	movePotato' :: Queue a -> Queue a
	movePotato' = uncurry enqueue . dequeue

	-- `uncurry` takes a function a -> b -> c and changes it to (a, b) -> c
	-- `.` is the composition function. It takes two functions as arguments,
	-- b -> c and a -> b and returns a function a -> c. `g . f` is the same as
	-- applying function `f ` first and then applying `g` to that result.
	--
	-- `uncurry enqueue` returns a function (Queue a, a) -> Queue a, and
	-- dequeue returns Queue a -> (Queue a, a). Therefore, we can compose them
	-- to create a function of type Queue a -> Queue a, which removes the
	-- first item in the queue and puts it on the end of the queue.
	--
	-- Haskell is awesome!
	--
	--
	-- Alright, and for the final test...
	--
	-- Enter ':set +s' in GHCI to get timings of functions. It's not super
	-- accurate, but it's a good measure. Load this file and run the following
	-- two funcitons to get timings. The timings I got are in comments

	oldFindWinnerTest = findWinner [1..10000] 7

	-- 5687
	-- (39.70 secs, 22442495720 bytes)

	newFindWinnerTest = findWinner' 7 $ queueFromList [1..10000]

	-- 4314
	-- (0.14 secs, 51468136 bytes)

	-- and if you want to get really serious...
	hellYeahHaskell = findWinner' 7 $ queueFromList [1..100000]
	-- Fuck yeah. 1.56 secs. Ruby can't touch that shit.

	-- Timing things usually have a large margin of error, and your milage
	-- will vary (I was running Google hangouts while running this, actually.
	-- My CPU was pretty busy). But it's pretty freakin' clear that the new
	-- method works a lot better.


	-- Finally, I hate to break it to you, but your original implementation has
	-- two bugs. First, your `movePotato` method goes backwards in that it
	-- removes from the end of the queue and puts it on the beginning,
	-- violating a FIFO (first-in, first-out). You'll notice that we'll get
	-- different answers with our functions. You may have chosen to do that
	-- because that was the simplest way you could figure out how to do
	-- a queue. The way to fix it is pretty simple:

	fixedMovePotato list = tail list ++ [head list]

	-- This is still a O(n) function, though it happens to be not as bad as
	-- your original, which takes a double hit for using both `last` and
	-- `init`.
	--
	-- The second bug is an off-by one error. You should count down to 0 rather
	-- than 1 in 'removeLooser'. Using the test example from the class with
	-- 6 people and num = 5 shows you don't hit the expected answer


	-- I hope that was helpful! I spent a lot more time on this than
	-- I intended, but it was instructive for me too :) Feel free to ask me any
	-- other questions you have about haskell


	--[1]: http://en.wikipedia.org/wiki/Singly_linked_list#Singly_linked_lists
	--[2]: http://hackage.haskell.org/package/base-4.7.0.1/docs/src/GHC-List.html#init
	--[3]: http://www.cs.cmu.edu/~rwh/theses/okasaki.pdf
	--[4]: http://hackage.haskell.org/package/base-4.7.0.1/docs/src/GHC-List.html#reverse
	--[5]: http://en.wikipedia.org/wiki/Amortized_analysis
	--[6]: http://learnyouahaskell.com/syntax-in-functions
	--[7]: https://www.haskell.org/tutorial/patterns.html
No results found