When I took "Fundamentals of Computer Science" in college, my professor was very adamant about the distinction between data and information and about how data doesn't have any inherent meaning. At the time, it seemed a bit silly to me how much emphasis he put on such a seemingly insignificant difference.
In retrospect, I think he was exactly right about this and I wish more programmers took it to heart.
Data is something you can store in a computer, such as, let's say, the byte 0b01000001
.
Information on the other hand is the meaning you give to this data and it is something you cannot store anywhere.
Common interpretations for this byte would be the integer 65 or the ASCII letter A
, but you could absolutely also interpret it as
42 or maybe a blue strawberry.
In the context of programming constructs, information primarily includes the interfaces these constructs provide.
For example, an int
is just represented by a bunch of bytes but the operations that your language provides over values of type int
interpret it specifically as
a (possibly signed) integer modulo some constant (although this doesn't mean that you need to interpret it that way).
Now, the claim I'm going to make is that when programming, data almost never actually matters and what we actually care about are almost always only those interfaces.
A sentence you will often hear if you are present in a certain online niche is "you cannot represent this without dependent types".
Interestingly, this isn't always true! It's just that people confuse the underlying representation (data) with the interface a type provides (information).
As an example, let's use the type of even integers {.., -4, -2, 0, 2, 4, ..}
.
One obvious way to represent this in a language with refinement types would be to take the representation of regular integers and restrict it to something that doesn't allow odd numbers { z : Int | even z }
.
However, there is a much simpler way to represent this type in practically any programming language!
The key is that the data representation of this type doesn't actually matter. We can represent it the same way as our regular integers and all that matters is that the operations we provide interpret it as an even integer.
So if we interpret the value that corresponds to the regular integer n
as the integer 2n
(1 becomes 2, 2 becomes 4, etc.), we will get a type that contains exactly all the even integers!
module EvenInt : sig
type even_int
val add : even_int -> even_int -> even_int
val multiply : even_int -> even_int -> even_int
...
end = struct
type t = int
(* 2n + 2m = 2(n + m) *)
let add n m = n + m
(* 2n * 2m = 2 (2 * n * m) *)
let multiply n m = 2 * n * m
...
end
Similarly, a common claim is that length-indexed vectors need GADTs but this is, again, not actually true!
Because pattern matching rarely plays nicely with data abstraction, you might not be able to achieve the exact interface these usually have, but you can get something equivalent if you replace data constructors and pattern matching with functions.
type z = |
type _ s = |
module Vector : sig
type ('n, 'a) vec
val nil : (z, 'a) vec
val cons : 'a -> ('n, 'a) vec -> ('n s, 'a) vec
val uncons : (s 'n, 'a) vec -> ('a * ('n, 'a) vec)
type ('a, 'r) match_cases = {
nil : () -> 'r;
cons : 'n. 'a -> ('n, 'a) vec -> 'r
}
val match_ : ('n, 'a) vec -> ('a, 'r) match_cases -> 'r
end = struct
type ('n, 'a) vec = 'a list
let nil = []
let cons x xs = (x :: xs)
let uncons = function
| [] -> failwith "unreachable"
| (x :: xs) -> (x, xs)
type ('a, 'r) match_cases = {
nil : () -> 'r;
cons : 'n. 'a -> ('n, 'a) vec -> 'r
}
let match_ vec cases = match vec with
| [] -> vec.nil ()
| x :: xs -> vec.cons x xs
end
So the next time you find yourself claiming that something is unrepresentable, try to separate the data from the information and see if maybe you can represent it after all.
better than cohost ngl