gclaramunt · May 15, 2013 14:50
diff --git a/gistfile1.txt b/gistfile1.txt
 I've been pondering this myself the last few days, and I'd like to go ahead and amplify this. Don't worry about leaping to the "Haskelly" solution at first. Start by just bashing something out that essentially works. Go nuts. Do it all in IO. Write partial functions. Copy and paste without fear. Use undefined to fill in gaps in the functions. Realize halfway through you're sort of doing it wrong, but don't stop to correct it, just keep going straight for something that works for some task. Not even the whole task, just something.
 (Do avoid mutation, though, that will wreck up the rest of this process. Stay out of IORef, and for the moment I'd even suggest avoiding Control.Monad.State, although that's much less bad.)
 You will quite likely look at the resulting code, and wonder why you bothered doing it in Haskell. It looks like crap, was harder to write than whatever language you are comfortable in (for the compiler was probably still pretty cranky at you even so), and looks nothing like the beautiful code you see in blog posts. Don't sweat it right now. Make it compile and make it pass some basic test case that you can easily replicate, ideally in GHCi, as some sort of unit test if not.
 The key is to not stop here. First, even if this is a tiny little project of your own doing that will go nowhere, commit it to some source control. You need to be able roll back for what I'm about to say, and you're going to need frequent checkpoints. If nothing else, you can use git init. git add $FILE, git commit, and git checkout $FILE to roll back will be enough to get going, and for all I care call every commit "Checkpoint.", since you'll only ever need to roll back to the previous commit, most likely.
 Next step: Run hlint over your code. Do almost everything or everything it says. It's simple, but it's a great step, and already you'll start learning some things.
 Next, iterate roughly through the following things:
 Micro-scale refactorings. Find anything you can that is the least bit repeated, even just twice, and figure out how to factor it out. Haskell's just about the best language there is at this sort of thing. If you've got a five parameter function that you often call with three of the same parameters, factor the similarities out. If you've got a function that you're continuously calling "flip" on, reverse the parameter order and remove the flip. Save after every thing you do, and see if it still compiles and passes the test. Commit frequently. Look at all your lambda functions; is there a way to write them without the lambda just using currying? (Sometimes they really are the best solution, but often they can be turned into usage of curried functions.) Did you write an explicit recursion? Can it be turned into a fold or something? You'll find these also feed on each other, which is why I suggest breaking the "Rule of Three" and refactoring even the second time you see some similarity; you'll often discover that what you thought you did only twice you actually did even more often, it's just that there were parameters in the way that prevented it from being obvious.
 Pull out pure functionality: Anywhere you do IO, then do "something" to the result, pull the something into its own function. If you get a value from IO, then do something, then get another value from IO, then do something with that, then get another value from IO, and produce a final result, reorder it so it does all three IO operations in a row and passes it to a pure function.
 Look carefully at your data types. Do you have some data type that contains a value that could easily be computed from the other values in the data type? Eliminate that field, replace it with a function. Do you have three data types with the same set of three fields in them? Pull that out into a new data declaration.
 Now the mystical bit: Listen to your code. It's trying to tell you things. Do you have partial functions? Figure out how to unpartial them. Does it require you to rewrite half your code? As a learning exercise, go ahead and do it. It sucks at first, but it's the best way to learn how to do it right next time. You'll almost always find the result is surprisingly better. Use more Maybe. Use Data.Maybe's shortcuts and the monad/applicative interface. Do you have incomplete case statements? Look carefully at the missing clauses, and think about what it means if you hit them. Perhaps the thing you are switching on is actually two bits of data you've jammed together accidentally. Look for bits of code where you have to usually call f x y, but sometimes have to call f' x y z in one case clause; is there a way you can unify those functions into one case overall?
 Also, learn from your errors. Did you cross two Int types accidentally? Newtype yourself up some new types so you can't. Make your errors impossible, if you can. (This one does take some time to learn, I think.)
 Run hlint again. It's easy. By the time you've passed through all of the above you've probably got some things you can write in point-free style now, and when you do that yourself it makes a lot more sense than when you just read about it in a blog post.
 Look at some of the functions you've factored out, and look at your most generic ones (type signatures with just as and bs). Odds are good you've reimplemented something that already exists. If you've got some function with really generic signatures, pop them into Hoogle and see what pops out. Data.Maybe, Control.Monad, and Prelude are full of things you probably don't get until you need them. I stared at Control.Monad.>=> for a while without getting it until I finally ended up reimplementing it myself one day. Now I get it.
 Once that stabilizes, you can move into macro-scale refactorings. Why? Because already you've probably watched your code collapse quite significantly. Now you're trying to macro-scale refactor something that's already simpler than it was before. You've brushed away enough of the accident that the essence is that much clearer.
 Do a huge number of your functions take the same set of parameters? First, pull that out into a new data type, and then consider a reader monad. Do a huge number of your functions take an X and return a modified X? Consider state. For now, try to avoid "stacks" of monads. They're not bad, but you should use them because you find that it makes your code cleaner at the end of this process, not because you start with one.
 Having done that, take a pass back through the bulleted list.
 Finally, depending on your task, consider the use of some of the libraries you've read about on /r/haskell. Are continuously recomputing some values from old values? Consider lens. (And don't forget lenses are themselves first class values; a function can easily take "a thing to do" and "a lens to do it on". This is where it shines and this can once again collapse code in surprising ways!) Do you have a pipeline? Consider pipes. Etc. (This depends too much on exactly what you are doing to have general advice.)
 At the end of this process, step back and compare your final code to your first working model. Odds are your final code is really starting to resemble the Haskell code you've seen online! Odds are you've got something where you can clearly say "Here is the clear specification of the problem that can almost be read out loud as the solution, and here are the functions supporting the specification, and there's no confusion which is which." Odds are you've got something you can post to a blog yourself, and drive the next newbie batty because they don't know how to write code like that.
 It's just too much pressure to think you need to leap to pristine code without any intermediate steps. Don't be paralyzed by that. Maybe the experts can leap straight to the pristine answer, but then, they had to hone their skills somehow too. It would be interesting to hear if some of the experts still essentially work this way, or what kind of intermediate steps they pass through themselves. Haskell is the easiest language I know to do safe microrefactorings by hand in. Take full advantage of that. It gets easier as you go along.
 (That got a bit larger than I expected, but it seems like it covers some useful middle ground that is rarely talked about.)
	I've been pondering this myself the last few days, and I'd like to go ahead and amplify this. Don't worry about leaping to the "Haskelly" solution at first. Start by just bashing something out that essentially works. Go nuts. Do it all in IO. Write partial functions. Copy and paste without fear. Use undefined to fill in gaps in the functions. Realize halfway through you're sort of doing it wrong, but don't stop to correct it, just keep going straight for something that works for some task. Not even the whole task, just something.
	(Do avoid mutation, though, that will wreck up the rest of this process. Stay out of IORef, and for the moment I'd even suggest avoiding Control.Monad.State, although that's much less bad.)
	You will quite likely look at the resulting code, and wonder why you bothered doing it in Haskell. It looks like crap, was harder to write than whatever language you are comfortable in (for the compiler was probably still pretty cranky at you even so), and looks nothing like the beautiful code you see in blog posts. Don't sweat it right now. Make it compile and make it pass some basic test case that you can easily replicate, ideally in GHCi, as some sort of unit test if not.
	The key is to not stop here. First, even if this is a tiny little project of your own doing that will go nowhere, commit it to some source control. You need to be able roll back for what I'm about to say, and you're going to need frequent checkpoints. If nothing else, you can use git init. git add $FILE, git commit, and git checkout $FILE to roll back will be enough to get going, and for all I care call every commit "Checkpoint.", since you'll only ever need to roll back to the previous commit, most likely.
	Next step: Run hlint over your code. Do almost everything or everything it says. It's simple, but it's a great step, and already you'll start learning some things.
	Next, iterate roughly through the following things:
	Micro-scale refactorings. Find anything you can that is the least bit repeated, even just twice, and figure out how to factor it out. Haskell's just about the best language there is at this sort of thing. If you've got a five parameter function that you often call with three of the same parameters, factor the similarities out. If you've got a function that you're continuously calling "flip" on, reverse the parameter order and remove the flip. Save after every thing you do, and see if it still compiles and passes the test. Commit frequently. Look at all your lambda functions; is there a way to write them without the lambda just using currying? (Sometimes they really are the best solution, but often they can be turned into usage of curried functions.) Did you write an explicit recursion? Can it be turned into a fold or something? You'll find these also feed on each other, which is why I suggest breaking the "Rule of Three" and refactoring even the second time you see some similarity; you'll often discover that what you thought you did only twice you actually did even more often, it's just that there were parameters in the way that prevented it from being obvious.
	Pull out pure functionality: Anywhere you do IO, then do "something" to the result, pull the something into its own function. If you get a value from IO, then do something, then get another value from IO, then do something with that, then get another value from IO, and produce a final result, reorder it so it does all three IO operations in a row and passes it to a pure function.
	Look carefully at your data types. Do you have some data type that contains a value that could easily be computed from the other values in the data type? Eliminate that field, replace it with a function. Do you have three data types with the same set of three fields in them? Pull that out into a new data declaration.
	Now the mystical bit: Listen to your code. It's trying to tell you things. Do you have partial functions? Figure out how to unpartial them. Does it require you to rewrite half your code? As a learning exercise, go ahead and do it. It sucks at first, but it's the best way to learn how to do it right next time. You'll almost always find the result is surprisingly better. Use more Maybe. Use Data.Maybe's shortcuts and the monad/applicative interface. Do you have incomplete case statements? Look carefully at the missing clauses, and think about what it means if you hit them. Perhaps the thing you are switching on is actually two bits of data you've jammed together accidentally. Look for bits of code where you have to usually call f x y, but sometimes have to call f' x y z in one case clause; is there a way you can unify those functions into one case overall?
	Also, learn from your errors. Did you cross two Int types accidentally? Newtype yourself up some new types so you can't. Make your errors impossible, if you can. (This one does take some time to learn, I think.)
	Run hlint again. It's easy. By the time you've passed through all of the above you've probably got some things you can write in point-free style now, and when you do that yourself it makes a lot more sense than when you just read about it in a blog post.
	Look at some of the functions you've factored out, and look at your most generic ones (type signatures with just as and bs). Odds are good you've reimplemented something that already exists. If you've got some function with really generic signatures, pop them into Hoogle and see what pops out. Data.Maybe, Control.Monad, and Prelude are full of things you probably don't get until you need them. I stared at Control.Monad.>=> for a while without getting it until I finally ended up reimplementing it myself one day. Now I get it.
	Once that stabilizes, you can move into macro-scale refactorings. Why? Because already you've probably watched your code collapse quite significantly. Now you're trying to macro-scale refactor something that's already simpler than it was before. You've brushed away enough of the accident that the essence is that much clearer.
	Do a huge number of your functions take the same set of parameters? First, pull that out into a new data type, and then consider a reader monad. Do a huge number of your functions take an X and return a modified X? Consider state. For now, try to avoid "stacks" of monads. They're not bad, but you should use them because you find that it makes your code cleaner at the end of this process, not because you start with one.
	Having done that, take a pass back through the bulleted list.
	Finally, depending on your task, consider the use of some of the libraries you've read about on /r/haskell. Are continuously recomputing some values from old values? Consider lens. (And don't forget lenses are themselves first class values; a function can easily take "a thing to do" and "a lens to do it on". This is where it shines and this can once again collapse code in surprising ways!) Do you have a pipeline? Consider pipes. Etc. (This depends too much on exactly what you are doing to have general advice.)
	At the end of this process, step back and compare your final code to your first working model. Odds are your final code is really starting to resemble the Haskell code you've seen online! Odds are you've got something where you can clearly say "Here is the clear specification of the problem that can almost be read out loud as the solution, and here are the functions supporting the specification, and there's no confusion which is which." Odds are you've got something you can post to a blog yourself, and drive the next newbie batty because they don't know how to write code like that.
	It's just too much pressure to think you need to leap to pristine code without any intermediate steps. Don't be paralyzed by that. Maybe the experts can leap straight to the pristine answer, but then, they had to hone their skills somehow too. It would be interesting to hear if some of the experts still essentially work this way, or what kind of intermediate steps they pass through themselves. Haskell is the easiest language I know to do safe microrefactorings by hand in. Take full advantage of that. It gets easier as you go along.
	(That got a bit larger than I expected, but it seems like it covers some useful middle ground that is rarely talked about.)