Some of the most important advancements in programming came from adding restrictions to the way we program.
Famous article by Edsger Dijkstra Go To Statement Considered Harmful
shows how introducing one statement into programming language breaks many nice properties of that language.
goto
gives us freedom and solves couple of simple problems like breaking from nested loops easily,
but it can make programs very hard to understand.
This happens, because we cannot look at two pieces of code in spearation.
The execution can jump to some other place in code or even worse: it can jump from other piece of code to here and We need to keep that in mind.
It is very similar with global variables. We know we shouldn't use them,
because it tightly couples every piece of code that uses it.
Lets say you wrote a function that needed to compute something using global variable without modifying it,
but modified it by accident.
It is possible that bug in your program will manifest in completely different piece of code and that buggy function would appear to work fine in tests.
It is the same problem design problems as with goto
, we cannot look at two pieces of code in separation.
Global state is considered harmful.
Lets make a bold move! Lets consider all state harmful!
Pure functions by definition always return the same value for given input. For example:
def inc(x), do: x + 1
Pure functions can't depend on or modify application state. For example this is not a pure function:
def print_value(x), do: IO.inspect(x+1)
Given the same argument, it will always return the same value, but it also has a side effect of printing that value. Pure functions have couple of really nice properties:
We can read pure function in isolation from rest of the code. There is no external state we need to worry about. No other piece of code can influence return value of pure function.
Programming by changing state is called imperative programming and it is often error prone. For example it is common to first create an element and operate on it. It is hard to do that in functional language, so my next example will be in JavaScript Lets say we created a piece of HTML:
<p class="blue_box">Some text</p>
<a href="#">Link</a>
And we need to make the background of that paragraph blue using the class:
$(".blue_box").css('background-color', 'blue');
Later we decide we want all links to be blue so in some other place we add this:
$("a").css('color', 'blue');
At the end someone modifies original html like this:
<p class="blue_box"><a href="#">Link</a>Some text</p>
and the link disappeared, because it is blue text on blue background. Such situations remind me of this picture: https://pbs.twimg.com/media/CZfyeKnWYAAxQbr.png Stuff appears to be working fine in isolation, but not when combined. Where should we even fix the bug?
This is very common and it happens, because imperative code describes steps to achieve result and not intention. In functional programming we would need to know both colors in advance when we create html, so we would need a function that computes them. Colors of element depend on elements type and class and we need to compute both foreground and background. Lets say that as before we first want to apply blue background only.
@spec colors(class :: string) -> background_color
def colors("blue_box") -> "blue"
def colors(_) -> "white"
We use pattern matching on function head to determine background color. Underscore is a "catch all", so this code works like this:
- first clause tells us that in "blue_box" class everything has a blue background so links won't be an exception.
- and by default everythig has a white background
Next someone told us to make all links blue, so our function depends now on both class and type:
@spec colors(type :: string, class :: string) -> {foreground_color, background_color}
def colors(_, "blue_box") -> {"black", "blue"}
def colors("a", _) -> {"blue", "white"}
def colors(_) -> {"black, "white"}
Now it works like this:
- first clause says that if something is in "blue_box" class it will be black on blue (with no exception to links)
- second clause says that in other cases links will be blue on white background
- and everything else will be black on white background
The cool thing is that pure code forced us to come up with solution to a blue link in blue box problem before we even thought about it. Of course, we could solve in some other way by adding other clause in the beginning:
def colors("a", "blue_box") -> {"yellow", "blue"}
The important thing is that even though it required more code, it is more explicit, simpler and less error prone.
Without any external state we can test our programs using only assertions:
assert return_value = function(arguments)
There is no need for mocking libraries or setting up proper state with before
hooks.
It is just functions and data.
Programmers often think about data structures as something that is created to change over time, so it may be counterintuitive to discover that purely functional data structures are immutable. Once created, they cannot change. If you need slightly modified verion of the same structure you need to create new one. However, it is often optimized to be very fast.
Lets take a list as an example. Elixir list is just a singly linked list under the hood. If you declare a list like this:
a = [1, 2, 3]
The memory layout looks like this:
a → 1 → 2 → 3 → NULL
This can never change. After last reference to this list is lost we can discard it entierly, but we can never modify it. What happens if we want to prepend something to that list?
b = [0 | a]
It creates a new list:
a → 1 → 2 → 3 → NULL
b → 0 ↗
So we just have to craete one element in memory and point to the rest of previous list.
Immutability guarantees list a
will never change, so we don't have to worry about some other part of code mutating it.
This allows for very efficient O(1) prepend operation and is thread safe.
Many processes can read the same piece of data without any need for locks or mutexes, because noone is going to change it.
On the other hand, if we wanted to append at the end of list, we have to copy entire list O(n), because we are not allowed to just change last element:
c = a ++ [4]
a → 1 → 2 → 3 → NULL
c → 1 → 2 → 3 → 4 → NULL
We have to remember that using purely functional data structures is a tradeoff. We gain thread safety and easier to maintain programs, but some operations are slower than in imperative programs.
The only problem with programs that use only pure functions and purely functional data structures is that they are useless :) We can't read user input, because it is a side effect. Even if we encode all input paramaters directly in source code, we can't print to screen or write to database, because it is also a side effect. Even if our program returns some value, we won't know about it.
There are couple of solutions to this problem. Haskell uses Monads for all operations that need side effects. It is a trick that conceptually takes entire outside world or IO system as a function argument and adds modified world as a return value. This way you can include information about all side effects that given function will perform in function type.
So purely functional languages are languages that mainly use pure functions and data structures and mark all non-pure parts explicitely as a danger zone.
Many libraries in Elixir do similar thing.
For example Ecto
database wrapper has a concept of Multi
which aggregates operations on database.
Instead of scattering your functions operating on database throughout your codebase
we can use Ecto.Multi
which is a data structure that encodes multiple db operations.
Without Multi
all functions in example below are impure:
def logic1(), do: add_to_db(...)
def logic2(), do: update_db(...)
def logic3(), do: delete_from_db(...)
def do_stuff() do
logic1
logic2
logic3
end
With Multi
we can rewrite it like this:
def logic1(multi), do: Multi.insert(multi, ...)
def logic2(multi), do: Multi.update(multi, ...)
def logic3(multi), do: Multi.delete(multi, ...)
def do_stuff() do
Multi.new #start with empty multi
|> logic1 #returns multi with insert
|> logic2 #returns multi with insert and update
|> logic3 #returns multi with all three operations
|> Repo.transaction #performs all three operations
end
This way all 3 logic functions become pure. We can test them in isolation and we don't need to mock anything for those three.
We still need to setup the database to test the high level do_stuff
function,
but we can treat it more like an integration test.
We've seen that pure functions are easier to test and make programs easier to understand.
Purely functional data structures give similar advantages, can add some clever optimisations and are thread safe.
We can write purely functional programs by encoding impure operations as data and then having one point in our program that deals with side effects like in example with Ecto.Multi
.
Pushing impure code to the boundaries of our program requires some discipline,
but constraining ourselves in this way is compensated by better design that emerges.
It is very similar to constraining ourselves by not using goto
or global variables.
Even non functional languages adopt some parts of this design.
Ruby introduced lately immutable strings.
React and many other JS frameworks try to push all functions updating DOM model out of your logic by introducing virtual DOM.
I hope this article is going to help people starting with functional programming dispel any doubts about meaning of purity.