Skip to content

Instantly share code, notes, and snippets.

@Innf107
Last active October 15, 2024 07:10
Show Gist options
  • Save Innf107/15ba3a1f95534a00177189bfb49d3c92 to your computer and use it in GitHub Desktop.
Save Innf107/15ba3a1f95534a00177189bfb49d3c92 to your computer and use it in GitHub Desktop.
Programming is about information not data, or: you might not need dependent types

Programming is about information not data, or: you might not need dependent types

When I took "Fundamentals of Computer Science" in college, my professor was very adamant about the distinction between data and information and about how data doesn't have any inherent meaning. At the time, it seemed a bit silly to me how much emphasis he put on such a seemingly insignificant difference.

In retrospect, I think he was exactly right about this and I wish more programmers took it to heart.

Data is something you can store in a computer, such as, let's say, the byte 0b01000001.

Information on the other hand is the meaning you give to this data and it is something you cannot store anywhere. Common interpretations for this byte would be the integer 65 or the ASCII letter A, but you could absolutely also interpret it as 42 or maybe a blue strawberry.

In the context of programming constructs, information primarily includes the interfaces these constructs provide. For example, an int is just represented by a bunch of bytes but the operations that your language provides over values of type int interpret it specifically as a (possibly signed) integer modulo some constant (although this doesn't mean that you need to interpret it that way).

Now, the claim I'm going to make is that when programming, data almost never actually matters and what we actually care about are almost always only those interfaces.

How is she going to make this about dependent types?

A sentence you will often hear if you are present in a certain online niche is "you cannot represent this without dependent types".

Interestingly, this isn't always true! It's just that people confuse the underlying representation (data) with the interface a type provides (information).

As an example, let's use the type of even integers {.., -4, -2, 0, 2, 4, ..}.

One obvious way to represent this in a language with refinement types would be to take the representation of regular integers and restrict it to something that doesn't allow odd numbers { z : Int | even z }.

However, there is a much simpler way to represent this type in practically any programming language!

The key is that the data representation of this type doesn't actually matter. We can represent it the same way as our regular integers and all that matters is that the operations we provide interpret it as an even integer. So if we interpret the value that corresponds to the regular integer n as the integer 2n (1 becomes 2, 2 becomes 4, etc.), we will get a type that contains exactly all the even integers!

module EvenInt : sig
    type even_int
    val add : even_int -> even_int -> even_int
    val multiply : even_int -> even_int -> even_int
    ...
end = struct
    type t = int
     
    (* 2n + 2m = 2(n + m) *)
    let add n m = n + m

    (* 2n * 2m = 2 (2 * n * m) *)
    let multiply n m = 2 * n * m

    ...
end

Similarly, a common claim is that length-indexed vectors need GADTs but this is, again, not actually true!

Because pattern matching rarely plays nicely with data abstraction, you might not be able to achieve the exact interface these usually have, but you can get something equivalent if you replace data constructors and pattern matching with functions.

type z = |
type _ s = |

module Vector : sig
    type ('n, 'a) vec

    val nil : (z, 'a) vec
    val cons : 'a -> ('n, 'a) vec -> ('n s, 'a) vec

    val uncons : (s 'n, 'a) vec -> ('a * ('n, 'a) vec)
    
    type ('a, 'r) match_cases = {
        nil : () -> 'r;
        cons : 'n. 'a -> ('n, 'a) vec -> 'r
    }
    val match_ : ('n, 'a) vec -> ('a, 'r) match_cases -> 'r
end = struct
    type ('n, 'a) vec = 'a list

    let nil = []
    let cons x xs = (x :: xs)

    let uncons = function
         | [] -> failwith "unreachable"
         | (x :: xs) -> (x, xs)

    type ('a, 'r) match_cases = {
        nil : () -> 'r;
        cons : 'n. 'a -> ('n, 'a) vec -> 'r
    }

    let match_ vec cases = match vec with
        | [] -> vec.nil ()
        | x :: xs -> vec.cons x xs
end

So the next time you find yourself claiming that something is unrepresentable, try to separate the data from the information and see if maybe you can represent it after all.

@Miezhiko
Copy link

better than cohost ngl

@rutenkolk
Copy link

there is a small typo in "val multiply : even_int -> evem_int -> even_int" (emphasis mine)

@Innf107
Copy link
Author

Innf107 commented Sep 16, 2024

there is a small typo in "val multiply : even_int -> evem_int -> even_int" (emphasis mine)

that's what i get for posting code without running it ^^ thanks!

@MaxCan-Code
Copy link

Anthony Bordg and Adrián Doña Mateo. 2023. Encoding Dependently-Typed Constructions into Simple Type Theory
https://doi.org/10.1145/3573105.3575679
https://youtu.be/fU8Z-3FI2jg

@jake-87
Copy link

jake-87 commented Sep 16, 2024

i’m so sorry to do this but your last vec example has a few issues
here’s a working version that I believe represents what you wanted:

type z = |
type _ s = |

module Vector : sig
    type ('n, 'a) vec

    val nil : (z, 'a) vec
    val cons : 'a -> ('n, 'a) vec -> ('n s, 'a) vec

    val uncons : ('n s, 'a) vec -> ('a * ('n, 'a) vec)
    
    type ('a, 'r) match_cases = {
        nil : unit -> 'r;
        cons : 'n. 'a -> ('n, 'a) vec -> 'r
    }
    val match_ : ('n, 'a) vec -> ('a, 'r) match_cases -> 'r
end = struct
    type ('n, 'a) vec = 'a list

    let nil = []
    let cons x xs = (x :: xs)

    let uncons = function
         | [] -> failwith "unreachable"
         | (x :: xs) -> (x, xs)

    type ('a, 'r) match_cases = {
        nil : unit -> 'r;
        cons : 'n. 'a -> ('n, 'a) vec -> 'r
    }

    let match_ vec cases = match vec with
        | [] -> cases.nil ()
        | x :: xs -> cases.cons x xs
end

changelog:

  • 'n s instead of s 'n in one place
  • unit instead of ()
  • fixed the last match example (was field accessing the wrong var)

@rybla
Copy link

rybla commented Sep 17, 2024

This all makes sense, but the trade-off it runs into is subtyping (one critical example being polymorphism).

In refinement types, an example of this is that { z : Int | even z } is a subtype of { z : Int | True }. But if you've abstracted away data representation then you have to provide your own cast EvenInt -> Int rather than the cast being checked at compile-time via refinement type checking and free at runtime.

And refinement types are just a subset of dependent types, so this is the same example in dependent types, using an inductive property Even rather than a refinement: Σ Int Even -> Int is merely a projection that you get for free from Σ, whereas the conversion EvenInt -> Int can involve arbitrary computation and has to be defined again for every different property over Int.

My point is basically that while you can abstract out the data representation of the property you're encoding, there are unique advantages to encoding information with dependent types, in particular properties over data.

@LPTK
Copy link

LPTK commented Sep 17, 2024

Similarly, a common claim is that length-indexed vectors need GADTs but this is, again, not actually true!

For the historical context, I think a paper that popularized the idea that you don't actually need GADTs in this way was the 2007 "Finally Tagless, Partially Evaluated" paper https://okmij.org/ftp/tagless-final/JFP.pdf

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment