Last active
September 19, 2017 23:28
-
-
Save gatlin/b0c5ed6a94831102da92 to your computer and use it in GitHub Desktop.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
"Church-encoded" lists and their interesting properties | |
=== | |
*This is a literate Haskell program which you can run in ghci.* | |
*Also, I already corrected a glaring problem. Let us speak no more of it.* | |
One of the beautiful things about computer science is that, fundamentally, | |
**data is code** and **code is data**. But what does that mean? And how is it | |
useful? | |
Below I implement and explain a lazy list data structure using (almost) nothing | |
but boring old functions. For the most part the actual code is straightforward. | |
*This is often called the Church encoding but this is not strictly accurate.* | |
The `L a` type | |
--- | |
> {-# LANGUAGE Rank2Types #-} | |
> import Prelude hiding ( foldr -- we will be reimplementing these, | |
> , foldl -- so I have excluded them. | |
> , map | |
> , filter | |
> , length | |
> , zip | |
> , zipWith | |
> ) | |
> newtype L a = L { foldr :: forall r. (a -> r -> r) -> r -> r } | |
This says: "a value of type `L a` wraps a strange-looking function. Applying | |
`foldr` to an `L a` value will extract the strange-looking function." | |
To get an intuition for how this is related to lists, consider: | |
data List a = Cons a (List a) | Nil | |
Then the types of the constructors would be: | |
Cons :: a -> List a -> List a | |
Nil :: List a | |
Replace `List a` with `r`: | |
Cons :: a -> r -> r | |
Nil :: r | |
Curious. So an `L` value wraps a function which accepts two functions and | |
(presumably) evaluates one or the other. These two functions are oddly | |
reminiscent of `Cons` and `Nil`. | |
How does it know, though? And why the non-specific type `r`? We'll get there. | |
Behind the magic | |
--- | |
What `newtype` has automatically generated is equivalent to the following: | |
type L a = forall r. (a -> r -> r) -> r -> r | |
L :: L a -> L a | |
L x = x | |
foldr :: L a -> (a -> r -> r) -> r -> r | |
foldr xs op x = xs op x | |
Thus `L` is a narrowly-typed identity function and `foldr` is a | |
narrowly-typed application function. Composing them together in expressions | |
shouldn't affect program semantics at all but it will give you guarantees about | |
type safety. This composition also opens the door to a very special | |
optimization. | |
But first let's play with them a bit. | |
List construction: `cons` and `nil` | |
--- | |
> cons :: a -> L a -> L a | |
> cons x xs = L $ \c e -> c x (foldr xs c e) | |
Fully evaluated, this is what a cons looks like: | |
cons 1 nil => | |
L (\c e -> c 1 (foldr (L (\c e -> e)) c e)) => | |
> nil :: L a | |
> nil = L $ \c e -> e | |
Example: | |
> list1 :: L Int | |
> list1 = cons 1 $ cons 2 $ cons 3 $ cons 4 $ cons 5 $ cons 6 nil | |
Opening up a list | |
--- | |
It's all well and good to construct lists but equally, if not more, important | |
is the ability to extract data from a list. | |
Just as we construct a list by prepending a head to a tail, we will first | |
write a function to split a list into its head and tail: | |
> split :: L a -> (Maybe a, L a) | |
> split xs = foldr xs f (Nothing, nil) where | |
> f y ys = (Just y, L (\c e -> | |
> maybe e (\x -> c x (foldr (snd ys) c e)) | |
> (fst ys))) | |
This works because `foldr`'s job is to tear down a list and transform it into | |
some other type. In this case, we want to construct a tuple of `Maybe a` and | |
`List a`. The last argument of `foldr` is the value to use if the list is | |
empty. | |
Otherwise we pass a function which will receive two values. One is indeed the | |
head of the list, but the other is not quite what you might first expect. It | |
may seem like something out of *Bill and Ted's Excellent Adventure* but since | |
this operation is recursive, you must assume that the rest of the list has been | |
taken care of. **Exercise: why?** Thus, `ys` is already an `(Maybe a, List a)` | |
value. | |
In Lisp languages, the traditional name for the function which extracts the | |
first element of a list is called `car`. Since this doesn't clash with any | |
Haskell names it's what I'll name the next function: | |
> car :: L a -> a | |
> car xs = maybe diverge id (fst $ split xs) where | |
> fix f = f (fix f) | |
> diverge = fix id | |
`fix` computes the *fixpoint* of a function; that is, the value `x` for which | |
`f x == x`. Diverge computes the fixpoint of the `id` function; the upshot is | |
that `diverge` is a value which can take any type. `maybe` demands that we | |
supply a final argument but if the list is non-empty we will never actually | |
need the last element. | |
Keeping with our naming scheme, the function to produce the remainder of the | |
list is `cdr`: | |
> cdr :: L a -> L a | |
> cdr = snd . split | |
For giggles, let's expose a "safe" `car` variant: one which returns `Just` an | |
element if the list has an element, or `Nothing` otherwise: | |
> car' :: L a -> Maybe a | |
> car' = fst . split | |
Demonstration: a `sum` function | |
--- | |
Let's get a feel for how this all works by writing a function to sum a | |
list of numbers. | |
> sum' xs = foldl xs (+) 0 | |
Evaluated on (cons 1 (cons 2 nil)): | |
sum' (cons 1 (cons 2 nil)) => | |
foldr (cons 1 (cons 2 nil)) (+) 0 => | |
foldr (L (\c e -> c 1 (foldr (cons 2 nil) c e))) (+) 0 => | |
(\c e -> c 1 (foldr (cons 2 nil) c e)) (+) 0 => | |
(+) 1 ((foldr (cons 2 nil) (+) 0)) => | |
(+) 1 ((foldr (L (\c e -> c 2 (foldr nil c e))) (+) 0)) => | |
(+) 1 ((\c e -> c 2 (foldr nil c e)) (+) 0) => | |
(+) 1 ((+) 2 (foldr nil (+) 0)) => | |
(+) 1 ((+) 2 (foldr (L (\c e -> e)) (+) 0)) => | |
(+) 1 ((+) 2 ((\c e -> e) (+) 0)) => | |
(+) 1 ((+) 2 0) => | |
(+) 1 2 => | |
3 | |
There is an important symmetry to understand here: lists are an inductive | |
type, and so are operations on lists. When performing some computation on a | |
list, you must supply a base case (analogous to `Nil`) and an inductive case | |
(analogous to `Cons`). | |
The naive encoding of the data type creates an object in memory, a *thing*, and | |
builds up a structure. The functional encoding, however, simply provides | |
possible execution paths. `L a` values are control flow operators, deciding at | |
every element of the list whether to continue or halt. | |
Thus, data is code and code is data. | |
Fusion | |
--- | |
As for the optimization, we know that when we see `foldr (L f) x y` we can | |
easily rewrite it as `f x y` before we ever generate any code. `foldr` and `L` | |
disappear when composed correctly. This is called *fusion*. Haskell provides a | |
system to tell the compiler about such pairs of functions and this can result | |
in much more efficient programs. | |
Because ultimately you're not *doing* anything other than proving properties of | |
your program. These proofs can be discarded once verified. | |
Other examples | |
--- | |
For completeness here are a few other traditional list functions implemented in | |
our scheme. As an exercise, try to figure out how and why they work. | |
> foldl :: L a -> (r -> a -> r) -> r -> r | |
> foldl bs f a = foldr bs (\b g x -> g (f x b)) id a | |
> map :: (a -> b) -> L a -> L b | |
> map f xs = foldr xs (\y ys -> cons (f y) ys) nil | |
> filter :: (a -> Bool) -> L a -> L a | |
> filter pred xs = foldr xs f nil where | |
> f y ys = L $ \c e -> if (pred y) | |
> then c y (foldr ys c e) | |
> else foldr ys c e | |
> length :: L a -> Int | |
> length xs = foldr xs (\_ -> (+1)) 0 | |
And here are a few more example uses: | |
> isEven :: (Integral n, Eq n, Num n) => n -> Bool | |
> isEven x = x `mod` 2 == 0 | |
> sum_1 = sum' list1 -- => 21 | |
> sum_2 = sum' . map (*2) $ list1 -- => 42 | |
> sum_3 = sum' . map (*2) . cdr $ list1 -- => 40 | |
> sum_4 = sum' . filter isEven $ list1 -- => 12 | |
A note on recursion. | |
--- | |
One very interesting property of these functions is that none of them are | |
recursive. Yet, we define maps, filters, and folds which necessarily iterate | |
the items of a list! | |
Instead, these functions are *mutually recursive*. No single function calls | |
itself directly, but different compositions of the basic functions cause | |
program execution to loop as long as necessary. In this way the very concepts | |
of iteration and recursion are defined in terms of list processing functions. | |
And with tail call optimization and sibling call optimization, the compiler can | |
generate machine code which takes up constant space. | |
Now if only there were some language dedicated to LISt Processing, I | |
bet it would be pretty powerful ... |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment