Skip to content

Instantly share code, notes, and snippets.

@porglezomp
Last active August 13, 2024 22:22
Show Gist options
  • Save porglezomp/e941da088e462bc76a5080a46f904834 to your computer and use it in GitHub Desktop.
Save porglezomp/e941da088e462bc76a5080a46f904834 to your computer and use it in GitHub Desktop.
Taking on Hillel's challenge to formally verify leftpad (https://twitter.com/Hillelogram/status/987432181889994759)
-- Note: There is a more complete explanation at https://github.com/hwayne/lets-prove-leftpad/tree/master/idris
import Data.Vect
-- `minus` is saturating subtraction, so this works like we want it to
eq_max : (n, k : Nat) -> maximum k n = plus (n `minus` k) k
eq_max n Z = rewrite minusZeroRight n in rewrite plusZeroRightNeutral n in Refl
eq_max Z (S _) = Refl
eq_max (S n) (S k) = rewrite sym $ plusSuccRightSucc (n `minus` k) k in rewrite eq_max n k in Refl
-- The type here says "the result is" padded to (maximum k n), and is padding plus the original
leftPad : (x : a) -> (n : Nat) -> (xs : Vect k a)
-> (ys : Vect (maximum k n) a ** m : Nat ** ys = replicate m x ++ xs)
leftPad {k} x n xs = rewrite eq_max n k in (replicate (n `minus` k) x ++ xs ** n `minus` k ** Refl)
-- We can skip the "existential" for the padding size and inline the n `minus` k... is this better?
leftPad' : (x : a) -> (n : Nat) -> (xs : Vect k a)
-> (ys : Vect (maximum k n) a ** ys = replicate (n `minus` k) x ++ xs)
leftPad' {k} x n xs = rewrite eq_max n k in (replicate (n `minus` k) x ++ xs ** Refl)

Note: There is a more complete explanation at https://github.com/hwayne/lets-prove-leftpad/tree/master/idris

import Data.Vect

We define this eq_max theorem, since it's the core part of how the nice implementations of leftpad work: You subtract your existing length from your padding, and then add it back on, and since minus here is saturating subtraction, this does what we want and gives the maximum of the two.

For people not familiar, implementing a recursive function with a type like this is providing an inductive proof of a theorem. A function that takes some arguments is saying "forall," and then we're proving the equality in the result type.

eq_max : (n, k : Nat) -> maximum k n = plus (n `minus` k) k

The cases for n Z and Z (S _) are able to be directly proved, they're the base cases for our recursion.

eq_max  Z    (S _) = Refl

Every time we say rewrite _ in _, we're modifying the type of the value in the second position using an equality in the first position. For example, if you wrote a function that doubled the length of a vector, you might use it to prove that Vect (n + n) a (from appending) is the same as a Vect (n * 2) a. In this case we use it to explicitly cancel the 0 in the type of the first base case, since Idris doesn't do that for you.

eq_max  n     Z    = rewrite minusZeroRight n in rewrite plusZeroRightNeutral n in Refl

The final case is the inductive step. We use our rewrites to pull the S out of the equation, and then we're able to prove it by the inductive hypothesis (the recursive case).

eq_max (S n) (S k) = rewrite sym $ plusSuccRightSucc (n `minus` k) k in rewrite eq_max n k in Refl

Now that we've got our eq_max theorem, we can actually define our program. We want to implement leftpad. What does leftpad do? Well, we take a list (xs), a padding value (x), and a length (n). If the list is smaller than the padding value, then we repeat the padding value in front of the list until it's long enough. Symbolically, we might say that there's some number m such that we repeat x m times, and prepend that to list, and then the result is max (length xs) n.

If we want to make sure that we're implementing this correctly, one way to do this in Idris is return the result as well as a proof that it matches the spec. The tool to use is a "dependent pair," written (x : T ** P x), which says that we're returning a value of type T named x, and also that the fact P x is true. (Recall, theorems are just types, and proofs are just values, so we give a proof in the second position of the pair.) We want to say, we we start with a list xs of length k, which we write as xs : Vect k a, a value x, and a length n, and return a result ys : Vect (maximum k n) a, just like we said above in the spec. Further extending the spec, we want to say that there's some number m such that ys = replicate m x ++ xs, also a direct translation of our spec. Given this type, we can just write the same code we'd write in Haskell, and put the proof arguments after it. We also need to rewrite with the eq_max we made earlier to make sure that the maximum k n in the type matches the n `minus` k + k that we get from appending the lists in the implementation.

leftPad : (x : a) -> (n : Nat) -> (xs : Vect k a)
       -> (ys : Vect (maximum k n) a ** m : Nat ** ys = replicate m x ++ xs)
leftPad {k} x n xs = rewrite eq_max n k in (replicate (n `minus` k) x ++ xs ** n `minus` k ** Refl)

We can skip the "existential" for the padding size and inline the n `minus` k... I'm not sure if this is better, but it's slightly smaller!

leftPad' : (x : a) -> (n : Nat) -> (xs : Vect k a)
        -> (ys : Vect (maximum k n) a ** ys = replicate (n `minus` k) x ++ xs)
leftPad' {k} x n xs = rewrite eq_max n k in (replicate (n `minus` k) x ++ xs ** Refl)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment