Programming is about information not data, or: you might not need dependent types

When I took "Fundamentals of Computer Science" in college, my professor was very adamant about the distinction between data and information and about how data doesn't have any inherent meaning. At the time, it seemed a bit silly to me how much emphasis he put on such a seemingly insignificant difference.

In retrospect, I think he was exactly right about this and I wish more programmers took it to heart.

Data is something you can store in a computer, such as, let's say, the byte 0b01000001.

Information on the other hand is the meaning you give to this data and it is something you cannot store anywhere. Common interpretations for this byte would be the integer 65 or the ASCII letter A, but you could absolutely also interpret it as 42 or maybe a blue strawberry.

In the context of programming constructs, information primarily includes the interfaces these constructs provide. For example, an int is just represented by a bunch of bytes but the operations that your language provides over values of type int interpret it specifically as a (possibly signed) integer modulo some constant (although this doesn't mean that you need to interpret it that way).

Now, the claim I'm going to make is that when programming, data almost never actually matters and what we actually care about are almost always only those interfaces.

How is she going to make this about dependent types?

A sentence you will often hear if you are present in a certain online niche is "you cannot represent this without dependent types".

Interestingly, this isn't always true! It's just that people confuse the underlying representation (data) with the interface a type provides (information).

As an example, let's use the type of even integers {.., -4, -2, 0, 2, 4, ..}.

One obvious way to represent this in a language with refinement types would be to take the representation of regular integers and restrict it to something that doesn't allow odd numbers { z : Int | even z }.

However, there is a much simpler way to represent this type in practically any programming language!

The key is that the data representation of this type doesn't actually matter. We can represent it the same way as our regular integers and all that matters is that the operations we provide interpret it as an even integer. So if we interpret the value that corresponds to the regular integer n as the integer 2n (1 becomes 2, 2 becomes 4, etc.), we will get a type that contains exactly all the even integers!

module EvenInt : sig
    type even_int
    val add : even_int -> even_int -> even_int
    val multiply : even_int -> even_int -> even_int
    ...
end = struct
    type t = int
     
    (* 2n + 2m = 2(n + m) *)
    let add n m = n + m

    (* 2n * 2m = 2 (2 * n * m) *)
    let multiply n m = 2 * n * m

    ...
end

Similarly, a common claim is that length-indexed vectors need GADTs but this is, again, not actually true!

Because pattern matching rarely plays nicely with data abstraction, you might not be able to achieve the exact interface these usually have, but you can get something equivalent if you replace data constructors and pattern matching with functions.

type z = |
type _ s = |

module Vector : sig
    type ('n, 'a) vec

    val nil : (z, 'a) vec
    val cons : 'a -> ('n, 'a) vec -> ('n s, 'a) vec

    val uncons : (s 'n, 'a) vec -> ('a * ('n, 'a) vec)
    
    type ('a, 'r) match_cases = {
        nil : () -> 'r;
        cons : 'n. 'a -> ('n, 'a) vec -> 'r
    }
    val match_ : ('n, 'a) vec -> ('a, 'r) match_cases -> 'r
end = struct
    type ('n, 'a) vec = 'a list

    let nil = []
    let cons x xs = (x :: xs)

    let uncons = function
         | [] -> failwith "unreachable"
         | (x :: xs) -> (x, xs)

    type ('a, 'r) match_cases = {
        nil : () -> 'r;
        cons : 'n. 'a -> ('n, 'a) vec -> 'r
    }

    let match_ vec cases = match vec with
        | [] -> vec.nil ()
        | x :: xs -> vec.cons x xs
end

So the next time you find yourself claiming that something is unrepresentable, try to separate the data from the information and see if maybe you can represent it after all.

type z = | type _ s = | module Vector : sig type ('n, 'a) vec val nil : (z, 'a) vec val cons : 'a -> ('n, 'a) vec -> ('n s, 'a) vec val uncons : ('n s, 'a) vec -> ('a * ('n, 'a) vec) type ('a, 'r) match_cases = { nil : unit -> 'r; cons : 'n. 'a -> ('n, 'a) vec -> 'r } val match_ : ('n, 'a) vec -> ('a, 'r) match_cases -> 'r end = struct type ('n, 'a) vec = 'a list let nil = [] let cons x xs = (x :: xs) let uncons = function | [] -> failwith "unreachable" | (x :: xs) -> (x, xs) type ('a, 'r) match_cases = { nil : unit -> 'r; cons : 'n. 'a -> ('n, 'a) vec -> 'r } let match_ vec cases = match vec with | [] -> cases.nil () | x :: xs -> cases.cons x xs end

This all makes sense, but the trade-off it runs into is subtyping (one critical example being polymorphism).

In refinement types, an example of this is that { z : Int | even z } is a subtype of { z : Int | True }. But if you've abstracted away data representation then you have to provide your own cast EvenInt -> Int rather than the cast being checked at compile-time via refinement type checking and free at runtime.

And refinement types are just a subset of dependent types, so this is the same example in dependent types, using an inductive property Even rather than a refinement: Σ Int Even -> Int is merely a projection that you get for free from Σ, whereas the conversion EvenInt -> Int can involve arbitrary computation and has to be defined again for every different property over Int.

My point is basically that while you can abstract out the data representation of the property you're encoding, there are unique advantages to encoding information with dependent types, in particular properties over data.

Innf107/data.md

Programming is about information not data, or: you might not need dependent types

How is she going to make this about dependent types?

Miezhiko commented Sep 16, 2024

rutenkolk commented Sep 16, 2024

Innf107 commented Sep 16, 2024

MaxCan-Code commented Sep 16, 2024

jake-87 commented Sep 16, 2024

rybla commented Sep 17, 2024

LPTK commented Sep 17, 2024