We're Gonna Need a Bigger Table

 ...A thing or idea seems meaningful only when we have several different ways to represent it — different perspectives and different associations. ...Then we can turn it around in our minds, so to speak: however it seems at the moment, we can see it another way and we never come to a full stop. In other words, we can 'think' about it. If there were only one way to represent this thing or idea, we would not call this representation thinking. - Marvin Minsky

Today I'd like to show you a way to model functional programming concepts as operations on a single infinite table, reducing complex n-dimensional concepts into a more familiar two-dimensional space.

Remember that this is just a modeling exercise - I'm not necessarily making ontological claims about programming, I'm simply trying to explore a new vocabulary and model for reasoning about computation.

Defining Terms

A tuple is a collection of values organized into a data structure. For instance, ["myk", 34, 0x0000FF] is a tuple - it says "these three values are somehow related". The exact nature of that relationship is unknown without more context, but the important thing is that we can model any set of values in a fixed order as a tuple.

A relation is perhaps more commonly known as a table - it's a set of tuples where each tuple has the same structure, and that structure reflects some specific semantics. For instance, our example tuple above may be a row from a table that looks like this:

Name	Age	Favorite Color
"myk"	34	0x0000FF
"han"	29	0x00FF00

A function is defined as a relation between some set of inputs (defined as some subset of the columns in our table) and some set of outputs (defined as some other subset of the columns in our table). In the simplest case, we can have an identity function - its output is its input.

Input	Output
1	1
"abc"	"abc"

Each row in our "identity function" table is a single call to the function with a specific input argument. It returns a specific output argument. To flesh this out a bit, a slightly more complex function may look something like this -

input1	input2	output1
1	1	2
7	3	10

It looks like we may be looking at a 'Sum' function here, where input1 is the augend, input2 is the addend and output is the sum. We'll get into more details about this later, but I want to make sure that it makes sense that you can represent a function as a table, aka as a relation of tuples.

A data type can be understood as the set of all values that satisfy some constraint. It's really a predicate function - given some input X, is X in the set of acceptable values? Some common types are things like Integer or String or Boolean, and applications will often introduce new types like User or Session.

In our functions-as-tables metaphor, data types are used to constrain the size of our tables by restricting the number of valid values for each column. For instance, if we specify that the three columns in our sum table above all have the Integer data type then we won't have any rows where we're adding null to "Hello world!" - those are simply invalid rows, and logically cannot exist in our (still infinite) table.

Abstraction, in this model, has a very specific definition: it's when you add a new column to your table. For instance, we can create a more abstract version of the sum function by adding an operator column. The resulting math function looks like this:

input1	input2	operator	output
1	1	+	2
1	1	-	0
1	2	+	3

Concretion is what we'll call the dual of Abstraction, in this model - rather than adding a column, we're removing a column. We do that by "partially applying" a value to the function - restricting the size of the function, much as we did with data types, such that one of our columns has a data type that has only a single value in it. So for instance, if we wanted to create a multiply function we could partially apply the * operator into the Math function to create a new function:

input1	input2	output
1	1	1
7	3	21

By now you should be thinking "Huh... that sure seems like sum, multiply and math are all just different views into the same abstract space..." and you'd be 100% correct! There are interesting consequences of this that we'll get to shortly!

Finally, a higher order function is a function where one or more of the columns has the data type function. In other words, every value for one of our columns becomes a whole table! As an admittedly forced example, consider this greetUser() function which takes as inputs a user's first name, a user's last name, and a function used to get the appropriate greeting for the current time of day.

First Name

Last Name

getGreeting()

Output

Myk

Bilokonsky

Current Time	Greeting
5-12	"Good Morning"
12-6	"Good Afternoon"
6-9	"Good Evening"
9-5	"Good Night"

??!!

At first glance it seems like there's a problem here - if each row of one of our tables is an invocation of the represented function, what do we do if our row contains a nested table? How do we represent that idea? It seems like our output column can't have a value until we step into this nested space?

But, thanks to the magic of math - I think, can anyone give me some links to formal references to this kind of thing? - we can simply change how we represent the problem. A critical rule here is that the columns of a nested table can be pulled into the parent table. The function above can be rewritten as follows:

First Name	Last Name	Current Time	Greeting	Output
Myk	Bilokonsky	5-12	"Good Morning"	"Good Morning Myk Bilokonsky"
Myk	Bilokonsky	12-6	"Good Afternoon"	"Good Afternoon Myk Bilokonsky"
Myk	Bilokonsky	6-9	"Good Evening"	"Good Evening Myk Bilokonsky"
Myk	Bilokonsky	9-5	"Good Night"	"Good Night Myk Bilokonsky"

etc and so on. Now our specific invocation of the function maps cleanly to a row in our relation.

Inputs vs Arguments, Outputs vs Return Value - when we model functions in our code we generally think about our function's arguments as its inputs, and its return value as its output. But as anyone who has ever tried to shoe-horn formal logic into an impure (read: existing) codebase, our functions often live within the scope of some existing state and they often mutate that state as a side-effect when invoked.

This is fine - but it means that we need to think of things like "current time" as an input, even when it's not an argument. It's simply a property of the environment at execution time. Similarly, a function can, say, write some values to a database without actually returning anything. That change of database state must be included when we discuss the "output" of the function.

So What?

So far we've seen how functions can be modeled as relations, where each column can be constrained by a data type and where we can ultimately treat every function call as a lookup in an infinite table. add(1, 7) can be expressed as something like SELECT output FROM sum WHERE input1=1 AND input2=7, right?

(Interestingly, it means we can also do something like SELECT input1 FROM sum WHERE input2=7 AND output=8 - we get, at least conceptually, the full implications of using immutable relations.)

To summarize:

Functions can be modeled as relations.
Data Types can be used to restrict valid rows in our relations.
Abstraction and Concretion can be treated as formal processes whereby columns are added or removed from our relations.
Higher Order Functions work when you realize that they are simply a mechanism to express abstraction, adding columns to our table.
Inputs are not limited to arguments, outputs are not limited to return values. We can easily model side-effects here.

Subsequent essays on this topic will explore complex, abstract ideas from functional programming in this simplified but expressive space. We'll look at what functors and monads and applicatives are, we'll explore currying and some of the ramifications of abstraction, and we'll look at how once you go down this rabbit hole of modeling functions as relations you quickly run into the much larger rabbit hole of modeling applications as relations - and then, just maybe, we'll explore what it means to think of the universe itself as a single enormous function.

mbilokonsky/tables.md Secret

We're Gonna Need a Bigger Table

Defining Terms

So What?

mbilokonsky commented Nov 13, 2018

Uh oh!