Skip to content

Instantly share code, notes, and snippets.

@Jacke
Forked from joelonsql/JOIN FOREIGN.md
Created January 2, 2022 16:43
Show Gist options
  • Save Jacke/19b87728f83a6fe3ff98323b47ddf7b1 to your computer and use it in GitHub Desktop.
Save Jacke/19b87728f83a6fe3ff98323b47ddf7b1 to your computer and use it in GitHub Desktop.
SQL language proposal: JOIN FOREIGN

SQL language proposal: JOIN FOREIGN

The idea is to improve the SQL language, specifically the join syntax, for the special but common case when joining on foreign key columns.

The problem

Example below taken from PostgreSQL documentation 1

In SQL-89, we didn't have any JOIN syntax yet, so queries were written in this way:

SELECT f.title, f.did, d.name, f.date_prod, f.kind
    FROM distributors d, films f
    WHERE f.did = d.did

This improved in SQL-92:

SELECT f.title, f.did, d.name, f.date_prod, f.kind
    FROM distributors d
    JOIN films f ON f.did = d.did

However, in the same standard, they also added USING and NATURAL:

SELECT f.title, f.did, d.name, f.date_prod, f.kind
    FROM distributors d
    JOIN films f USING (did)
SELECT f.title, f.did, d.name, f.date_prod, f.kind
    FROM distributors d
    NATURAL JOIN films f

Unfortunately, USING and NATURAL suffer from problems, as they can suddenly break due to ambiguities if columns are added. 2

A proper database schema with foreign keys can be represented as directed graph, where the tables are the nodes and foreign keys the edges. It's a directed graph since a foreign key has a direction. The foreign key is created on the referencing table, and the table it references is called the referenced table. There is a Many-to-one relationship between the referencing and the referenced table.

Now, please take a look at the join queries above again.

Other than making guesses based on the names of the tables and columns, there is no information in the SQL query code itself that tells us if distributors references films, or if it's the other way around.

We currently rely solely on naming conventions to deduce the relationship between tables, that is, if it's One-to-many or Many-to-one.

If there is a bug in a query written by a colleague, causing unexpected number of rows in the result set, it can sometimes be complicated to debug such a query.

If the common simple joins (when joining on foreign key columns) would be written in a different syntax, the remaining joins would visually stand out and we could focus on making sure we understand them when reading a large SQL query.

Not having to specify all the join columns would also improve conciseness.

Problems summary:

  • JOIN USING (...) and NATURAL JOIN improved conciseness compared to JOIN ... ON, but suffer from other problems 2.
  • SELECT queries doesn't contain any information on the join direction, i.e. what table is foreign vs primary in a join. The reader needs to draw conclusions based on the naming of tables/columns.
  • When writing joins, having to specify all columns involved in a join, is unnecessairly verbose.

The proposal

Foreign keys are constraints with names. These names are auto-generated if not specified explicitly, and must be unique per schema according to the SQL standard, but at least PostgreSQL only require them to be unique per referencing table.

By specifying the foreign key, we wouldn't need to specify the columns on which to perform the join.

Here is how it would work:

SELECT f.title, f.did, d.name, f.date_prod, f.kind
    FROM films f
    JOIN FOREIGN f.films_did_fkey d

or e.g.

    LEFT JOIN FOREIGN f.films_did_fkey d

Let's explain the JOIN FOREIGN syntax step-by-step:

  1. When JOIN is followed by the FOREIGN keyword, we indicate we want to specify a foreign key to follow on a table alias already introduced in the from clause.

  2. Next, we specify the foreign key name qualified by the referencing table alias, f.films_did_fkey.

  3. Lastly, we assign a new table alias d for the referenced table.

If we would instead want to join by first introducing distributors and then join films, the syntax would be different:

SELECT f.title, f.did, d.name, f.date_prod, f.kind
    FROM distributors d
    JOIN films f FOREIGN films_did_fkey REF d

The syntax is different enough to easily distinguish between the two cases. This is intentional and important, as the first case JOIN FOREIGN, can never cause extra rows, since it defines a Many-to-one relationship, whereas JOIN ... FOREIGN ... REF could cause extra rows, since it defines a One-to-many relationship.

Let's explain the JOIN ... FOREIGN ... REF syntax step-by-step:

  1. In the grammar, at the same place as expecting ON boolean_expression or USING (...), we allow the keyword FOREIGN to indicate we want to specify the name of a foreign key for the table we just introduced in the from clause, as part of the current join statement, in the example, this is JOIN films f FOREIGN.

  2. Next, we specify the name of the foreign key, as the referencing table which has the foreign key, has already been specified in the first part of the join statement. We simply write films_did_fkey.

  3. Lastly, we need to specify what table alias to join against, REF d. This is necessary since even though we know the referenced table based on the foreign key, the table might have been introduced multiple times in the from clause.

To further improve conciseness and readability, a better default naming convention for foreign keys would be to give them the same name as the referenced table:

SELECT f.title, f.did, d.name, f.date_prod, f.kind
    FROM films f
    JOIN FOREIGN f.distributors d
SELECT f.title, f.did, d.name, f.date_prod, f.kind
    FROM distributors d
    JOIN films f FOREIGN distributors REF d

Note, this proposal does not depend on any specific naming convention for foreign keys.

If there are multiple foreign keys between two tables, the foreign key names could instead be based on column names. A typical example would be some table with multiple user_id columns, such as owner_id and manager_id, both referencing users.user_id, in such case, the foreign keys could be named owner and manager.

Another example would be a users table with a self-referencing parent_id column. Such foreign key could be named parent.

The example query from the PostgreSQL documentation is perhaps too simple to fully demonstrate and appreciate the clarity and conciseness of the proposed JOIN FOREIGN syntax. Here is another slightly more complicated example taken from the Grafana project 3.

SELECT *
    FROM permission p
    LEFT JOIN role r ON p.role_id = r.id
    LEFT JOIN team_role tr ON r.id = tr.role_id
    LEFT JOIN team t ON tr.team_id = t.id
    LEFT JOIN user_role ur ON r.id = ur.role_id
    LEFT JOIN "user" u ON ur.user_id = u.id
    WHERE p.id = 1;

vs

SELECT *
    FROM permission p
    LEFT JOIN FOREIGN p.role r
    LEFT JOIN team_role tr FOREIGN role REF r
    LEFT JOIN FOREIGN tr.team t
    LEFT JOIN user_role ur FOREIGN role REF r
    LEFT JOIN FOREIGN ur.user u
    WHERE p.id = 1;

What's nice about this example is the need to use both JOIN FOREIGN and JOIN ... FOREIGN ... REF. It cannot be written using only JOIN FOREIGN statements, since we cannot reach all referenced tables from any single referencing table. For instance, we could begin with team_role:

SELECT *
    FROM team_role tr
    LEFT JOIN FOREIGN tr.role r
    LEFT JOIN FOREIGN tr.team t

But then we would need JOIN ... FOREIGN ... REF, as none of the role nor team tables have any foreign keys to follow:

    LEFT JOIN permission p FOREIGN role REF r
    LEFT JOIN user_role ur FOREIGN role REF r
    LEFT JOIN FOREIGN ur.user u
    WHERE p.id = 1;

Additional join conditions

An optional ON clause can specify additional join conditions.

In the example below, a new column named special of type boolean has been added to the team table.

The two queries are equivalent:

SELECT *
    FROM permission p
    LEFT JOIN role r ON p.role_id = r.id
    LEFT JOIN team_role tr ON r.id = tr.role_id
    LEFT JOIN team t ON tr.team_id = t.id AND t.special IS TRUE
    LEFT JOIN user_role ur ON r.id = ur.role_id
    LEFT JOIN "user" u ON ur.user_id = u.id
    WHERE p.id = 1;

vs

SELECT *
    FROM permission p
    LEFT JOIN FOREIGN p.role r
    LEFT JOIN team_role tr FOREIGN role REF r
    LEFT JOIN FOREIGN tr.team t ON t.special IS TRUE
    LEFT JOIN user_role ur FOREIGN role REF r
    LEFT JOIN FOREIGN ur.user u
    WHERE p.id = 1;

Drawbacks / Tradeoffs / Remaining issues

Tradeoffs: clarity vs conciseness, implicit vs explicit

As the referenced table is not explicitly specified in a JOIN FOREIGN statement, readability depends on properly naming the foreign key, as otherwise the SQL query wouldn't reveal the actual name of the referenced table, and it would have to be looked up by reading the foreign key definition.

If the default format for naming foreign keys doesn't contain the referenced table name, this would be a problem, unless manually using the CONSTRAINT constraint_name FOREIGN KEY (...) REFERENCES syntax when creating the foreign key.

To demonstrate the problem, let's look at the case below, with a query with PostgreSQL's default foreign key names.

Comment by Hacker News user hn_throwaway_99

This is my "Thanks, I hate it" response. Reason being if you use the example they gave:

SELECT f.title, f.did, d.name, f.date_prod, f.kind
   FROM films f
   JOIN FOREIGN f.films_did_fkey d

You need to implicitly know the table that films_did_fkey points to, because 'd' is just a table alias. I can't think of anywhere else in the SQL standard where you can introduce a table alias without explicitly referencing the table. In my opinion making code essentially unreadable unless you have other background information is an antipattern.

⚠️ Note how the actual referenced table distributors is not spelled out anywhere in the query.

🤔 Under consideration: Should we sacrifice some conciseness for the sake of extra clarity, by forcing the referenced table name to always be spelled out explicitly? Please see section Alternative syntax for some suggestions.

Alternative syntax

JOIN ... FOREIGN referencing_alias.fk_name REF referenced_alias

Use one and the same syntax for both join directions, where the referencing_alias would always follow FOREIGN.

JOIN referenced_table FOREIGN referencing_alias.fk_name REF referenced_alias
JOIN referencing_table FOREIGN referencing_alias.fk_name REF referenced_alias
SELECT f.title, f.did, d.name, f.date_prod, f.kind
    FROM films f
    JOIN distributors d FOREIGN f.films_did_fkey REF d

SELECT f.title, f.did, d.name, f.date_prod, f.kind
    FROM distributors d
    JOIN films f FOREIGN f.films_did_fkey REF d

JOIN ... FOREIGN fk_name [ FROM referencing_alias | TO referenced_alias ]

Explicitly spell-out the join direction using keywords FROM and TO:

JOIN referenced_table FOREIGN fk_name FROM referencing_alias
JOIN referencing_table FOREIGN fk_name TO referenced_alias

This would eliminate the need to fully-qualify the foreign key name.

SELECT f.title, f.did, d.name, f.date_prod, f.kind
    FROM films f
    JOIN distributors d FOREIGN films_did_fkey FROM f

SELECT f.title, f.did, d.name, f.date_prod, f.kind
    FROM distributors d
    JOIN films f FOREIGN films_did_fkey TO d

More examples

Join on multiple columns

Example loosely based on 4.

CREATE TABLE departments (
    department_id int NOT NULL,
    department_location text NOT NULL,
    PRIMARY KEY (department_id)
);

CREATE TABLE teams (
    department_id int NOT NULL,
    team_id int NOT NULL,
    team_members int NOT NULL,
    PRIMARY KEY (department_id, team_id),
    CONSTRAINT department FOREIGN KEY (department_id) REFERENCES departments
);

CREATE TABLE employees (
    username text NOT NULL,
    department_id int NOT NULL,
    team_id int NOT NULL,
    PRIMARY KEY (username),
    CONSTRAINT department FOREIGN KEY (department_id) REFERENCES departments,
    CONSTRAINT team FOREIGN KEY (department_id, team_id) REFERENCES teams
);

SELECT *
    FROM employees e
    JOIN departments d ON d.department_id = e.department_id
    JOIN teams t ON t.department_id = e.department_id
                AND t.team_id       = e.team_id;

SELECT *
    FROM employees e
    JOIN FOREIGN e.department d
    JOIN FOREIGN e.team t;
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment