The idea is to improve the SQL language, specifically the join syntax, for the special but common case when joining on foreign key columns.
Example below taken from PostgreSQL documentation 1
In SQL-89, we didn't have any JOIN
syntax yet, so queries were written in this way:
SELECT f.title, f.did, d.name, f.date_prod, f.kind
FROM distributors d, films f
WHERE f.did = d.did
This improved in SQL-92:
SELECT f.title, f.did, d.name, f.date_prod, f.kind
FROM distributors d
JOIN films f ON f.did = d.did
However, in the same standard, they also added USING
and NATURAL
:
SELECT f.title, f.did, d.name, f.date_prod, f.kind
FROM distributors d
JOIN films f USING (did)
SELECT f.title, f.did, d.name, f.date_prod, f.kind
FROM distributors d
NATURAL JOIN films f
Unfortunately, USING
and NATURAL
suffer from problems, as they can suddenly break due to ambiguities if columns are added. 2
A proper database schema with foreign keys can be represented as directed graph, where the tables are the nodes and foreign keys the edges. It's a directed graph since a foreign key has a direction. The foreign key is created on the referencing table, and the table it references is called the referenced table. There is a Many-to-one relationship between the referencing and the referenced table.
Now, please take a look at the join queries above again.
Other than making guesses based on the names of the tables and columns, there is no information in the SQL query code itself that tells us if distributors references films, or if it's the other way around.
We currently rely solely on naming conventions to deduce the relationship between tables, that is, if it's One-to-many or Many-to-one.
If there is a bug in a query written by a colleague, causing unexpected number of rows in the result set, it can sometimes be complicated to debug such a query.
If the common simple joins (when joining on foreign key columns) would be written in a different syntax, the remaining joins would visually stand out and we could focus on making sure we understand them when reading a large SQL query.
Not having to specify all the join columns would also improve conciseness.
Problems summary:
JOIN USING (...)
andNATURAL JOIN
improved conciseness compared toJOIN ... ON
, but suffer from other problems 2.- SELECT queries don't contain any information on the join direction, i.e. what table is foreign vs primary in a join. The reader needs to draw conclusions based on the naming of tables/columns.
- When writing joins, having to specify all columns involved in a join, is unnecessairly verbose.
Foreign keys are constraints with names. These names are auto-generated if not specified explicitly, and must be unique per schema according to the SQL standard, but at least PostgreSQL only require them to be unique per referencing table.
By specifying the foreign key, we wouldn't need to specify the columns on which to perform the join.
Here is how it would work:
SELECT f.title, f.did, d.name, f.date_prod, f.kind
FROM films f
JOIN FOREIGN f.films_did_fkey d
or e.g.
LEFT JOIN FOREIGN f.films_did_fkey d
Let's explain the JOIN FOREIGN
syntax step-by-step:
-
When
JOIN
is followed by theFOREIGN
keyword, we indicate we want to specify a foreign key to follow on a table alias already introduced in the from clause. -
Next, we specify the foreign key name qualified by the referencing table alias,
f.films_did_fkey
. -
Lastly, we assign a new table alias
d
for the referenced table.
If we would instead want to join by first introducing distributors and then join films, the syntax would be different:
SELECT f.title, f.did, d.name, f.date_prod, f.kind
FROM distributors d
JOIN films f FOREIGN films_did_fkey REF d
The syntax is different enough to easily distinguish between the two cases. This is intentional and important, as the first case JOIN FOREIGN
, can never cause extra rows, since it defines a Many-to-one relationship, whereas JOIN ... FOREIGN ... REF
could cause extra rows, since it defines a One-to-many relationship.
Let's explain the JOIN ... FOREIGN ... REF
syntax step-by-step:
-
In the grammar, at the same place as expecting
ON boolean_expression
orUSING (...)
, we allow the keywordFOREIGN
to indicate we want to specify the name of a foreign key for the table we just introduced in the from clause, as part of the current join statement, in the example, this isJOIN films f FOREIGN
. -
Next, we specify the name of the foreign key, as the referencing table which has the foreign key, has already been specified in the first part of the join statement. We simply write
films_did_fkey
. -
Lastly, we need to specify what table alias to join against,
REF d
. This is necessary since even though we know the referenced table based on the foreign key, the table might have been introduced multiple times in the from clause.
To further improve conciseness and readability, a better default naming convention for foreign keys would be to give them the same name as the referenced table:
SELECT f.title, f.did, d.name, f.date_prod, f.kind
FROM films f
JOIN FOREIGN f.distributors d
SELECT f.title, f.did, d.name, f.date_prod, f.kind
FROM distributors d
JOIN films f FOREIGN distributors REF d
Note, this proposal does not depend on any specific naming convention for foreign keys.
If there are multiple foreign keys between two tables, the foreign key names could instead be based on column names. A typical example would be some table with multiple user_id columns, such as owner_id and manager_id, both referencing users.user_id, in such case, the foreign keys could be named owner and manager.
Another example would be a users table with a self-referencing parent_id column. Such foreign key could be named parent.
The example query from the PostgreSQL documentation is perhaps too simple to fully demonstrate and appreciate the clarity and conciseness of the proposed JOIN FOREIGN
syntax. Here is another slightly more complicated example taken from the Grafana project 3.
SELECT *
FROM permission p
LEFT JOIN role r ON p.role_id = r.id
LEFT JOIN team_role tr ON r.id = tr.role_id
LEFT JOIN team t ON tr.team_id = t.id
LEFT JOIN user_role ur ON r.id = ur.role_id
LEFT JOIN "user" u ON ur.user_id = u.id
WHERE p.id = 1
vs
SELECT *
FROM permission p
LEFT JOIN FOREIGN p.role r
LEFT JOIN team_role tr FOREIGN role REF r
LEFT JOIN FOREIGN tr.team t
LEFT JOIN user_role ur FOREIGN role REF r
LEFT JOIN FOREIGN ur.user u
WHERE p.id = 1
What's nice about this example is the need to use both JOIN FOREIGN
and JOIN ... FOREIGN ... REF
. It cannot be written using only JOIN FOREIGN
statements, since we cannot reach all referenced tables from any single referencing table. For instance, we could begin with team_role:
SELECT *
FROM team_role tr
LEFT JOIN FOREIGN tr.role r
LEFT JOIN FOREIGN tr.team t
But then we would need JOIN ... FOREIGN ... REF
, as none of the role nor team tables have any foreign keys to follow:
LEFT JOIN permission p FOREIGN role REF r
LEFT JOIN user_role ur FOREIGN role REF r
LEFT JOIN FOREIGN ur.user u
WHERE p.id = 1
An optional ON
clause can specify additional join conditions.
In the example below, a new column named special of type boolean has been added to the team table.
The two queries are equivalent:
SELECT *
FROM permission p
LEFT JOIN role r ON p.role_id = r.id
LEFT JOIN team_role tr ON r.id = tr.role_id
LEFT JOIN team t ON tr.team_id = t.id AND t.special IS TRUE
LEFT JOIN user_role ur ON r.id = ur.role_id
LEFT JOIN "user" u ON ur.user_id = u.id
WHERE p.id = 1
vs
SELECT *
FROM permission p
LEFT JOIN FOREIGN p.role r
LEFT JOIN team_role tr FOREIGN role REF r
LEFT JOIN FOREIGN tr.team t ON t.special IS TRUE
LEFT JOIN user_role ur FOREIGN role REF r
LEFT JOIN FOREIGN ur.user u
WHERE p.id = 1
As the referenced table is not explicitly specified in a JOIN FOREIGN
statement, readability depends on properly naming the foreign key, as otherwise the SQL query wouldn't reveal the actual name of the referenced table, and it would have to be looked up by reading the foreign key definition.
If the default format for naming foreign keys doesn't contain the referenced table name, this would be a problem, unless manually using the CONSTRAINT constraint_name FOREIGN KEY (...) REFERENCES
syntax when creating the foreign key.
To demonstrate the problem, let's look at the case below, with a query with PostgreSQL's default foreign key names.
Comment by Hacker News user hn_throwaway_99
This is my "Thanks, I hate it" response. Reason being if you use the example they gave:
SELECT f.title, f.did, d.name, f.date_prod, f.kind FROM films f JOIN FOREIGN f.films_did_fkey dYou need to implicitly know the table that films_did_fkey points to, because 'd' is just a table alias. I can't think of anywhere else in the SQL standard where you can introduce a table alias without explicitly referencing the table. In my opinion making code essentially unreadable unless you have other background information is an antipattern.
🤔 Under consideration: Should we sacrifice some conciseness for the sake of extra clarity, by forcing the referenced table name to always be spelled out explicitly? Please see section Alternative syntax for some suggestions.
Use one and the same syntax for both join directions,
where the referencing_alias would always follow FOREIGN
.
JOIN referenced_table FOREIGN referencing_alias.fk_name REF referenced_alias
JOIN referencing_table FOREIGN referencing_alias.fk_name REF referenced_alias
SELECT f.title, f.did, d.name, f.date_prod, f.kind
FROM films f
JOIN distributors d FOREIGN f.films_did_fkey REF d
SELECT f.title, f.did, d.name, f.date_prod, f.kind
FROM distributors d
JOIN films f FOREIGN f.films_did_fkey REF d
Explicitly spell-out the join direction using keywords FROM
and TO
:
JOIN referenced_table FOREIGN fk_name FROM referencing_alias
JOIN referencing_table FOREIGN fk_name TO referenced_alias
This would eliminate the need to fully-qualify the foreign key name.
SELECT f.title, f.did, d.name, f.date_prod, f.kind
FROM films f
JOIN distributors d FOREIGN films_did_fkey FROM f
SELECT f.title, f.did, d.name, f.date_prod, f.kind
FROM distributors d
JOIN films f FOREIGN films_did_fkey TO d
Join using the foreign table column names, matching a foreign key FROM or TO the specified table alias.
This is similar to JOIN USING (column_name, ...)
, with the following differences:
- ✅ No risk for ambiguity thanks to explicitly specifying the [referencing_alias | referenced_alias].
- ✅ No need to use same column names in both tables, i.e. primary key column can just be named
id
if desired. - ✅ The column names always specify columns in the foreign table.
JOIN referenced_table FOREIGN (fk_column_name, ...) FROM referencing_alias
JOIN referencing_table FOREIGN (fk_column_name, ...) TO referenced_alias
SELECT f.title, f.did, d.name, f.date_prod, f.kind
FROM films f
JOIN distributors d FOREIGN (did) FROM f
SELECT f.title, f.did, d.name, f.date_prod, f.kind
FROM distributors d
JOIN films f FOREIGN (did) TO d
Example below based on 2 to demonstrate the problem with JOIN USING
, and how it's not a problem with JOIN FOREIGN
.
CREATE TABLE users (
user_id int NOT NULL PRIMARY KEY
);
CREATE TABLE colors (
color_id int NOT NULL PRIMARY KEY
);
CREATE TABLE cars (
car_id int NOT NULL PRIMARY KEY,
user_id int REFERENCES users,
color_id int REFERENCES colors
);
This query works OK:
SELECT *
FROM colors
JOIN cars USING (color_id)
JOIN users USING (user_id)
Everything works correctly, until someone wants to add a column to colors which specifies who added the certain color:
ALTER TABLE colors
ADD COLUMN user_id integer REFERENCES users;
SELECT *
FROM colors
JOIN cars USING (color_id)
JOIN users USING (user_id)
ERROR: common column name "user_id" appears more than once in left table
and bam, the query is broken and you never saw it coming. What's even worse, all views in the database using this query will continue to work, until you want to restore the database somewhere (e.g. during upgrade).
Note how this would not be a problem with JOIN FOREIGN
:
SELECT *
FROM colors
JOIN cars FOREIGN (color_id) TO colors
JOIN users FOREIGN (user_id) FROM cars
Thanks to specifying FROM cars
it is explicit we want to follow the foreign key from cars to users, and not from colors to users.
Equivalent ways of writing the same query:
SELECT *
FROM cars
JOIN users FOREIGN (user_id) FROM cars
JOIN colors FOREIGN (color_id) FROM cars
SELECT *
FROM users
JOIN cars FOREIGN (user_id) TO users
JOIN colors FOREIGN (color_id) FROM cars
Another possible benefit is the ability to use just id
for primary key columns.
This is not possible with JOIN USING
since column names must match between referencing and referenced tables.
But thanks to JOIN FOREIGN
being explicit about what two tables are joined, the column names always refer to the referencing table, and the columns in the referenced table are looked up based on the matching foreing key, so no need to specify them at all.
Let's look at the same example again, but this time with primary key columns named id
:
CREATE TABLE users (
id int NOT NULL PRIMARY KEY
);
CREATE TABLE colors (
id int NOT NULL PRIMARY KEY
);
CREATE TABLE cars (
id int NOT NULL PRIMARY KEY,
user_id int REFERENCES users,
color_id int REFERENCES colors
);
The queries below are unchanged, since the foreign key column names have not changed, only the primary key column names.
SELECT *
FROM colors
JOIN cars FOREIGN (color_id) TO colors
JOIN users FOREIGN (user_id) FROM cars
SELECT *
FROM cars
JOIN users FOREIGN (user_id) FROM cars
JOIN colors FOREIGN (color_id) FROM cars
SELECT *
FROM users
JOIN cars FOREIGN (user_id) TO users
JOIN colors FOREIGN (color_id) FROM cars
❓ Isn't there still risk for ambiguity in the unthinkable hypothetical scenario where there are multiple foreign keys between two tables on the same foreign key column(s) but to different column(s) in the referenced (i.e. primary) table?
Consider this nonsensical but valid example:
CREATE TABLE foo (
id int NOT NULL PRIMARY KEY,
id2 int UNIQUE,
id3 int UNIQUE
);
CREATE TABLE bar (
id int NOT NULL PRIMARY KEY,
foo_id int,
FOREIGN KEY (foo_id) REFERENCES foo(id2),
FOREIGN KEY (foo_id) REFERENCES foo(id3)
);
INSERT INTO foo (id, id2, id3) VALUES (1, 2, NULL);
INSERT INTO foo (id, id2, id3) VALUES (3, NULL, 2);
INSERT INTO bar (id, foo_id) VALUES (4, 2);
SELECT *
FROM foo
JOIN bar FOREIGN (foo_id) TO foo
ERROR: foreign key cannot be unambiguously resolved
👉 Consideration: While the ambiguity problem with JOIN USING
is easy to run into, the hypotetical case described above is completely unthinkable and arguably not actually a problem in practise.
Example loosely based on 4.
CREATE TABLE departments (
department_id int NOT NULL,
department_location text NOT NULL,
PRIMARY KEY (department_id)
);
CREATE TABLE teams (
department_id int NOT NULL,
team_id int NOT NULL,
team_members int NOT NULL,
PRIMARY KEY (department_id, team_id),
CONSTRAINT department FOREIGN KEY (department_id) REFERENCES departments
);
CREATE TABLE employees (
username text NOT NULL,
department_id int NOT NULL,
team_id int NOT NULL,
PRIMARY KEY (username),
CONSTRAINT department FOREIGN KEY (department_id) REFERENCES departments,
CONSTRAINT team FOREIGN KEY (department_id, team_id) REFERENCES teams
);
SELECT *
FROM employees e
JOIN departments d ON d.department_id = e.department_id
JOIN teams t ON t.department_id = e.department_id
AND t.team_id = e.team_id;
SELECT *
FROM employees e
JOIN FOREIGN e.department d
JOIN FOREIGN e.team t;
Adding additional
ON
conditions to the standardJOIN
syntax is straightforward:How would this syntax handle that situation?