Skip to content

Instantly share code, notes, and snippets.

@joelonsql
Last active February 8, 2023 23:29
Show Gist options
  • Save joelonsql/15b50b65ec343dce94db6249cfea8aaa to your computer and use it in GitHub Desktop.
Save joelonsql/15b50b65ec343dce94db6249cfea8aaa to your computer and use it in GitHub Desktop.
SQL language proposal: JOIN FOREIGN

SQL language proposal: JOIN FOREIGN

The idea is to improve the SQL language, specifically the join syntax, for the special but common case when joining on foreign key columns.

The problem

Example below taken from PostgreSQL documentation 1

In SQL-89, we didn't have any JOIN syntax yet, so queries were written in this way:

SELECT f.title, f.did, d.name, f.date_prod, f.kind
    FROM distributors d, films f
    WHERE f.did = d.did

This improved in SQL-92:

SELECT f.title, f.did, d.name, f.date_prod, f.kind
    FROM distributors d
    JOIN films f ON f.did = d.did

However, in the same standard, they also added USING and NATURAL:

SELECT f.title, f.did, d.name, f.date_prod, f.kind
    FROM distributors d
    JOIN films f USING (did)
SELECT f.title, f.did, d.name, f.date_prod, f.kind
    FROM distributors d
    NATURAL JOIN films f

Unfortunately, USING and NATURAL suffer from problems, as they can suddenly break due to ambiguities if columns are added. 2

A proper database schema with foreign keys can be represented as directed graph, where the tables are the nodes and foreign keys the edges. It's a directed graph since a foreign key has a direction. The foreign key is created on the referencing table, and the table it references is called the referenced table. There is a Many-to-one relationship between the referencing and the referenced table.

Now, please take a look at the join queries above again.

Other than making guesses based on the names of the tables and columns, there is no information in the SQL query code itself that tells us if distributors references films, or if it's the other way around.

We currently rely solely on naming conventions to deduce the relationship between tables, that is, if it's One-to-many or Many-to-one.

If there is a bug in a query written by a colleague, causing unexpected number of rows in the result set, it can sometimes be complicated to debug such a query.

If the common simple joins (when joining on foreign key columns) would be written in a different syntax, the remaining joins would visually stand out and we could focus on making sure we understand them when reading a large SQL query.

Not having to specify all the join columns would also improve conciseness.

Problems summary:

  • JOIN USING (...) and NATURAL JOIN improved conciseness compared to JOIN ... ON, but suffer from other problems 2.
  • SELECT queries don't contain any information on the join direction, i.e. what table is foreign vs primary in a join. The reader needs to draw conclusions based on the naming of tables/columns.
  • When writing joins, having to specify all columns involved in a join, is unnecessairly verbose.

The proposal

Foreign keys are constraints with names. These names are auto-generated if not specified explicitly, and must be unique per schema according to the SQL standard, but at least PostgreSQL only require them to be unique per referencing table.

By specifying the foreign key, we wouldn't need to specify the columns on which to perform the join.

Here is how it would work:

SELECT f.title, f.did, d.name, f.date_prod, f.kind
    FROM films f
    JOIN FOREIGN f.films_did_fkey d

or e.g.

    LEFT JOIN FOREIGN f.films_did_fkey d

Let's explain the JOIN FOREIGN syntax step-by-step:

  1. When JOIN is followed by the FOREIGN keyword, we indicate we want to specify a foreign key to follow on a table alias already introduced in the from clause.

  2. Next, we specify the foreign key name qualified by the referencing table alias, f.films_did_fkey.

  3. Lastly, we assign a new table alias d for the referenced table.

If we would instead want to join by first introducing distributors and then join films, the syntax would be different:

SELECT f.title, f.did, d.name, f.date_prod, f.kind
    FROM distributors d
    JOIN films f FOREIGN films_did_fkey REF d

The syntax is different enough to easily distinguish between the two cases. This is intentional and important, as the first case JOIN FOREIGN, can never cause extra rows, since it defines a Many-to-one relationship, whereas JOIN ... FOREIGN ... REF could cause extra rows, since it defines a One-to-many relationship.

Let's explain the JOIN ... FOREIGN ... REF syntax step-by-step:

  1. In the grammar, at the same place as expecting ON boolean_expression or USING (...), we allow the keyword FOREIGN to indicate we want to specify the name of a foreign key for the table we just introduced in the from clause, as part of the current join statement, in the example, this is JOIN films f FOREIGN.

  2. Next, we specify the name of the foreign key, as the referencing table which has the foreign key, has already been specified in the first part of the join statement. We simply write films_did_fkey.

  3. Lastly, we need to specify what table alias to join against, REF d. This is necessary since even though we know the referenced table based on the foreign key, the table might have been introduced multiple times in the from clause.

To further improve conciseness and readability, a better default naming convention for foreign keys would be to give them the same name as the referenced table:

SELECT f.title, f.did, d.name, f.date_prod, f.kind
    FROM films f
    JOIN FOREIGN f.distributors d
SELECT f.title, f.did, d.name, f.date_prod, f.kind
    FROM distributors d
    JOIN films f FOREIGN distributors REF d

Note, this proposal does not depend on any specific naming convention for foreign keys.

If there are multiple foreign keys between two tables, the foreign key names could instead be based on column names. A typical example would be some table with multiple user_id columns, such as owner_id and manager_id, both referencing users.user_id, in such case, the foreign keys could be named owner and manager.

Another example would be a users table with a self-referencing parent_id column. Such foreign key could be named parent.

The example query from the PostgreSQL documentation is perhaps too simple to fully demonstrate and appreciate the clarity and conciseness of the proposed JOIN FOREIGN syntax. Here is another slightly more complicated example taken from the Grafana project 3.

SELECT *
    FROM permission p
    LEFT JOIN role r ON p.role_id = r.id
    LEFT JOIN team_role tr ON r.id = tr.role_id
    LEFT JOIN team t ON tr.team_id = t.id
    LEFT JOIN user_role ur ON r.id = ur.role_id
    LEFT JOIN "user" u ON ur.user_id = u.id
    WHERE p.id = 1

vs

SELECT *
    FROM permission p
    LEFT JOIN FOREIGN p.role r
    LEFT JOIN team_role tr FOREIGN role REF r
    LEFT JOIN FOREIGN tr.team t
    LEFT JOIN user_role ur FOREIGN role REF r
    LEFT JOIN FOREIGN ur.user u
    WHERE p.id = 1

What's nice about this example is the need to use both JOIN FOREIGN and JOIN ... FOREIGN ... REF. It cannot be written using only JOIN FOREIGN statements, since we cannot reach all referenced tables from any single referencing table. For instance, we could begin with team_role:

SELECT *
    FROM team_role tr
    LEFT JOIN FOREIGN tr.role r
    LEFT JOIN FOREIGN tr.team t

But then we would need JOIN ... FOREIGN ... REF, as none of the role nor team tables have any foreign keys to follow:

    LEFT JOIN permission p FOREIGN role REF r
    LEFT JOIN user_role ur FOREIGN role REF r
    LEFT JOIN FOREIGN ur.user u
    WHERE p.id = 1

Additional join conditions

An optional ON clause can specify additional join conditions.

In the example below, a new column named special of type boolean has been added to the team table.

The two queries are equivalent:

SELECT *
    FROM permission p
    LEFT JOIN role r ON p.role_id = r.id
    LEFT JOIN team_role tr ON r.id = tr.role_id
    LEFT JOIN team t ON tr.team_id = t.id AND t.special IS TRUE
    LEFT JOIN user_role ur ON r.id = ur.role_id
    LEFT JOIN "user" u ON ur.user_id = u.id
    WHERE p.id = 1

vs

SELECT *
    FROM permission p
    LEFT JOIN FOREIGN p.role r
    LEFT JOIN team_role tr FOREIGN role REF r
    LEFT JOIN FOREIGN tr.team t ON t.special IS TRUE
    LEFT JOIN user_role ur FOREIGN role REF r
    LEFT JOIN FOREIGN ur.user u
    WHERE p.id = 1

Drawbacks / Tradeoffs / Remaining issues

Tradeoffs: clarity vs conciseness, implicit vs explicit

As the referenced table is not explicitly specified in a JOIN FOREIGN statement, readability depends on properly naming the foreign key, as otherwise the SQL query wouldn't reveal the actual name of the referenced table, and it would have to be looked up by reading the foreign key definition.

If the default format for naming foreign keys doesn't contain the referenced table name, this would be a problem, unless manually using the CONSTRAINT constraint_name FOREIGN KEY (...) REFERENCES syntax when creating the foreign key.

To demonstrate the problem, let's look at the case below, with a query with PostgreSQL's default foreign key names.

Comment by Hacker News user hn_throwaway_99

This is my "Thanks, I hate it" response. Reason being if you use the example they gave:

SELECT f.title, f.did, d.name, f.date_prod, f.kind
   FROM films f
   JOIN FOREIGN f.films_did_fkey d

You need to implicitly know the table that films_did_fkey points to, because 'd' is just a table alias. I can't think of anywhere else in the SQL standard where you can introduce a table alias without explicitly referencing the table. In my opinion making code essentially unreadable unless you have other background information is an antipattern.

⚠️ Note how the actual referenced table distributors is not spelled out anywhere in the query.

🤔 Under consideration: Should we sacrifice some conciseness for the sake of extra clarity, by forcing the referenced table name to always be spelled out explicitly? Please see section Alternative syntax for some suggestions.

Alternative syntax

JOIN ... FOREIGN referencing_alias.fk_name REF referenced_alias

Use one and the same syntax for both join directions, where the referencing_alias would always follow FOREIGN.

JOIN referenced_table FOREIGN referencing_alias.fk_name REF referenced_alias
JOIN referencing_table FOREIGN referencing_alias.fk_name REF referenced_alias
SELECT f.title, f.did, d.name, f.date_prod, f.kind
    FROM films f
    JOIN distributors d FOREIGN f.films_did_fkey REF d

SELECT f.title, f.did, d.name, f.date_prod, f.kind
    FROM distributors d
    JOIN films f FOREIGN f.films_did_fkey REF d

JOIN ... FOREIGN fk_name [FROM referencing_alias | TO referenced_alias]

Explicitly spell-out the join direction using keywords FROM and TO:

JOIN referenced_table FOREIGN fk_name FROM referencing_alias
JOIN referencing_table FOREIGN fk_name TO referenced_alias

This would eliminate the need to fully-qualify the foreign key name.

SELECT f.title, f.did, d.name, f.date_prod, f.kind
    FROM films f
    JOIN distributors d FOREIGN films_did_fkey FROM f

SELECT f.title, f.did, d.name, f.date_prod, f.kind
    FROM distributors d
    JOIN films f FOREIGN films_did_fkey TO d

JOIN table_name FOREIGN (fk_column_name, ...) [FROM referencing_alias | TO referenced_alias]

Join using the foreign table column names, matching a foreign key FROM or TO the specified table alias.

This is similar to JOIN USING (column_name, ...), with the following differences:

  1. ✅ No risk for ambiguity thanks to explicitly specifying the [referencing_alias | referenced_alias].
  2. ✅ No need to use same column names in both tables, i.e. primary key column can just be named id if desired.
  3. ✅ The column names always specify columns in the foreign table.
    JOIN referenced_table FOREIGN (fk_column_name, ...) FROM referencing_alias
    JOIN referencing_table FOREIGN (fk_column_name, ...) TO referenced_alias
SELECT f.title, f.did, d.name, f.date_prod, f.kind
    FROM films f
    JOIN distributors d FOREIGN (did) FROM f

SELECT f.title, f.did, d.name, f.date_prod, f.kind
    FROM distributors d
    JOIN films f FOREIGN (did) TO d

Example below based on 2 to demonstrate the problem with JOIN USING, and how it's not a problem with JOIN FOREIGN.

CREATE TABLE users (
    user_id int NOT NULL PRIMARY KEY
);

CREATE TABLE colors (
    color_id int NOT NULL PRIMARY KEY
);

CREATE TABLE cars (
    car_id int NOT NULL PRIMARY KEY,
    user_id int REFERENCES users,
    color_id int REFERENCES colors
);

This query works OK:

SELECT *
    FROM colors
    JOIN cars USING (color_id)
    JOIN users USING (user_id)

Everything works correctly, until someone wants to add a column to colors which specifies who added the certain color:

ALTER TABLE colors
    ADD COLUMN user_id integer REFERENCES users;

SELECT *
    FROM colors
    JOIN cars USING (color_id)
    JOIN users USING (user_id)

ERROR:  common column name "user_id" appears more than once in left table

and bam, the query is broken and you never saw it coming. What's even worse, all views in the database using this query will continue to work, until you want to restore the database somewhere (e.g. during upgrade).

Note how this would not be a problem with JOIN FOREIGN:

SELECT *
    FROM colors
    JOIN cars FOREIGN (color_id) TO colors
    JOIN users FOREIGN (user_id) FROM cars

Thanks to specifying FROM cars it is explicit we want to follow the foreign key from cars to users, and not from colors to users.

Equivalent ways of writing the same query:

SELECT *
    FROM cars
    JOIN users FOREIGN (user_id) FROM cars
    JOIN colors FOREIGN (color_id) FROM cars

SELECT *
    FROM users
    JOIN cars FOREIGN (user_id) TO users
    JOIN colors FOREIGN (color_id) FROM cars

Another possible benefit is the ability to use just id for primary key columns. This is not possible with JOIN USING since column names must match between referencing and referenced tables. But thanks to JOIN FOREIGN being explicit about what two tables are joined, the column names always refer to the referencing table, and the columns in the referenced table are looked up based on the matching foreing key, so no need to specify them at all.

Let's look at the same example again, but this time with primary key columns named id:

CREATE TABLE users (
    id int NOT NULL PRIMARY KEY
);

CREATE TABLE colors (
    id int NOT NULL PRIMARY KEY
);

CREATE TABLE cars (
    id int NOT NULL PRIMARY KEY,
    user_id int REFERENCES users,
    color_id int REFERENCES colors
);

The queries below are unchanged, since the foreign key column names have not changed, only the primary key column names.

SELECT *
    FROM colors
    JOIN cars FOREIGN (color_id) TO colors
    JOIN users FOREIGN (user_id) FROM cars

SELECT *
    FROM cars
    JOIN users FOREIGN (user_id) FROM cars
    JOIN colors FOREIGN (color_id) FROM cars

SELECT *
    FROM users
    JOIN cars FOREIGN (user_id) TO users
    JOIN colors FOREIGN (color_id) FROM cars

❓ Isn't there still risk for ambiguity in the unthinkable hypothetical scenario where there are multiple foreign keys between two tables on the same foreign key column(s) but to different column(s) in the referenced (i.e. primary) table?

Consider this nonsensical but valid example:

CREATE TABLE foo (
    id int NOT NULL PRIMARY KEY,
    id2 int UNIQUE,
    id3 int UNIQUE
);
CREATE TABLE bar (
    id int NOT NULL PRIMARY KEY,
    foo_id int,
    FOREIGN KEY (foo_id) REFERENCES foo(id2),
    FOREIGN KEY (foo_id) REFERENCES foo(id3)
);

INSERT INTO foo (id, id2, id3) VALUES (1, 2, NULL);
INSERT INTO foo (id, id2, id3) VALUES (3, NULL, 2);
INSERT INTO bar (id, foo_id) VALUES (4, 2);

⚠️ Yes, the following query would be ambiguous and must generate an error:

SELECT *
    FROM foo
    JOIN bar FOREIGN (foo_id) TO foo

ERROR:  foreign key cannot be unambiguously resolved

👉 Consideration: While the ambiguity problem with JOIN USING is easy to run into, the hypotetical case described above is completely unthinkable and arguably not actually a problem in practise.

More examples

Join on multiple columns

Example loosely based on 4.

CREATE TABLE departments (
    department_id int NOT NULL,
    department_location text NOT NULL,
    PRIMARY KEY (department_id)
);

CREATE TABLE teams (
    department_id int NOT NULL,
    team_id int NOT NULL,
    team_members int NOT NULL,
    PRIMARY KEY (department_id, team_id),
    CONSTRAINT department FOREIGN KEY (department_id) REFERENCES departments
);

CREATE TABLE employees (
    username text NOT NULL,
    department_id int NOT NULL,
    team_id int NOT NULL,
    PRIMARY KEY (username),
    CONSTRAINT department FOREIGN KEY (department_id) REFERENCES departments,
    CONSTRAINT team FOREIGN KEY (department_id, team_id) REFERENCES teams
);

SELECT *
    FROM employees e
    JOIN departments d ON d.department_id = e.department_id
    JOIN teams t ON t.department_id = e.department_id
                AND t.team_id       = e.team_id;

SELECT *
    FROM employees e
    JOIN FOREIGN e.department d
    JOIN FOREIGN e.team t;
@branjwong
Copy link

Big fan

@tucnak
Copy link

tucnak commented Dec 30, 2021

To further improve conciseness and readability, a better default naming convention for foreign keys would be to give them the same name as the referenced table

This is a must. The alternative is verbose and will unfortunately defy expectations in the cases where they don't follow particular foreign key naming conventions.

@schutzie
Copy link

Foreign Films... ha.

Makes sense, but I don't know the names of any of the FKs on any tables, existing functionality is still fine with me.

@pjb1008
Copy link

pjb1008 commented Dec 30, 2021

I like this too. As a side-effect, it'll encourage people to choose meaningful names for constraints instead of letting the database generate names automatically. I see many ERDs where the R has no written description, so the meaning of the relationship drifts with time in ways that are unhelpful. Shifting the focus of query writing to be on following relationships rather than joining tables columns should help with that.

I would like REF to be optional however. In the common case where each table is referenced only once, the join is unambiguous without REF being specified.

My personal preference would be to not split the syntax so much according to the direction of the constraint, and instead to have a syntax like:

table_a LEFT JOIN table_b USING table_a.constraint_name -- many to one, maybe
table_a RIGHT JOIN table_b USING table_a.constraint_name -- one to many
table_b LEFT JOIN table_a USING [table_b.]constraint_name [ REFERENCES table_b ] -- one to many
table_b RIGHT JOIN table_a USING [table_b.]constraint_name [ REFERENCES table_b ] -- many to one, maybe

I've written 'maybe' on the one-to-many relationships. You might have a uniqueness constraint on the foreign key in table_a, turning it into a one-to-one relationship. To know if you'll get multiple rows from these queries, you need the schema.

Another feature you need is the ability to join, following a foreign constraint as above, but imposing additional constraints on the join.
You see this problem already with USING. When you want to consider all the rows, you can use USING. As soon as you want to do anything more fancy, you have to use ON. That can stop you using USING later in the query, and is much harder to visually check for correctness. I'm thinking of joins like this:

employee LEFT JOIN rota USING (employee_id)
employee LEFT JOIN rota ON employee.employee_id=rota.employee_id AND rota.session > 5

For USING, this is annoying, but at least you're using the same column names. Once you switch to using foreign key constraint names for your joins, I'd say that's a showstopper problem; the queries that have to be written out in full are longer and more complex, so they're the ones that would benefit the most from a syntax based on relationships. The proposed syntax needs to be able to combine both the relationship and any additional restrictions for the join.

@gisborne
Copy link

Still too verbose.

If there’s a declared foreign key relationship between the tables, use it automatically, so just JOIN FOREIGN . The key name should only be required if there is no declared foreign key or if there is more than one.

@na-ka-na
Copy link

na-ka-na commented Dec 30, 2021

Trust me I want to like this proposal. There's a lot to like. Only two things bug me:

  1. foreign key constraint names are usually bad and too verbose. They often contain both table names and then the column names as well. It's easier to remember table names and column names than to remember foreign key constraint names
  2. JOIN FOREIGN f.distributors d ..... this syntax I'm not a fan of since it doesn't immediately tell me which table I'm joining with. Here of course you've named the foreign key same as the other table name, but that's usually not the case. How about... JOIN distributors d USING FOREIGN f.distributors .... slightly more verbose, but makes me feel much better. This way all my JOINs line up and look uniform, whether I'm using a foreign key or not

@dolmen
Copy link

dolmen commented Dec 30, 2021

  • INNER JOIN
  • OUTER JOIN
  • LEFT JOIN
  • RIGHT JOIN
  • NATURAL JOIN

But this proposal is about:

  • JOIN FOREIGN

What justifies this naming inconsistency?

(Note: this is not an endorsement of the rest of the proposal)

@dvnrvn
Copy link

dvnrvn commented Dec 30, 2021

I don't like it. There is a cognitive cost to having to remember the name of foreign keys, and it doesn't feel worth it when the tradeoff is just making code concise by a trivial amount. Also, this is another feature that wouldn't work in many data warehouse implementations.

@stelf
Copy link

stelf commented Dec 30, 2021

interestingly no example with composite key. also note by @dolmen holds. @pjb1008 also makes important point.

my 10c would be that more often than not the names of the keys actually differ and LEFT OUTER more often than not has a second non-key-related condition.

@acutesoftware
Copy link

I like it

@shawn-simon
Copy link

How would you know what table you're joining to from just reading the query? You'd have to know the names of every foreign key in your DB, which overall decreases readability for someone new to the DB schema

@al2o3cr
Copy link

al2o3cr commented Dec 31, 2021

Adding additional ON conditions to the standard JOIN syntax is straightforward:

SELECT *
FROM permission p
LEFT JOIN role r ON p.role_id = r.id
LEFT JOIN team_role tr ON r.id = tr.role_id
LEFT JOIN team t ON tr.team_id = t.id AND t.special
LEFT JOIN user_role ur ON r.id = ur.role_id
LEFT JOIN "user" u ON ur.user_id = u.id
WHERE p.id = 1;

How would this syntax handle that situation?

@gisborne
Copy link

gisborne commented Dec 31, 2021 via email

@joelonsql
Copy link
Author

@al2o3cr @pjb1008 Thanks for bringing up an important case; Additional join conditions. I will try to come up with a syntax that works for this case! If you have any ideas, please share. I will add an empty section for it to the proposal.

@elo-siema
Copy link

@dolmen Because you can have:

  • INNER JOIN FOREIGN
  • OUTER JOIN FOREIGN
  • LEFT JOIN FOREIGN
    etc.

@koljonen
Copy link

Any reason not to just allow an optional ON clause for additional conditions?

@joelonsql
Copy link
Author

@stelf Thanks! I've added an example with a composite key.

@joelonsql
Copy link
Author

@koljonen I like the idea to just allow an optional ON clause for additional conditions. Will add it to the proposal, allowing others to comment on it.

@joelonsql
Copy link
Author

@shawn-simon Good question, I have added a section discussing this problem, "Tradeoffs: clarity vs conciseness, implicit vs explicit".

@kleontev
Copy link

Not a fan, sorry.

First of all, it's a leaking abstraction. The query is supposed to describe what the result should look like. It's not supposed to be concerned whether an integrity constraint is defined on one of the tables in the join. While at first it may seem that you're simplifying the query by abstracting away the join implementation details, in reality you're introducing extra coupling between the query and the database schema.

Which leads me to second. This syntax forces me to actually physically maintain that integrity constraint. I rename the foreign key - the query breaks. I disable the foreign key - the query breaks.

This has major implications on schema refactoring. When I perform one, I typically try to make it as backward-compatible as possible, at least for the RO tasks. One way to do it is to replace the legacy tables with a set of views that represent the data in a pre-refactor state. This is no longer an option with the JOIN FOREIGN syntax - we have views instead of tables and there are no foreign keys, so in the aforementioned refactoring every single query would have to be rewritten using a conventional syntax (and in the most of the enterprise software I've seen that would be A LOT of manual work).

And third, as it has already been mentioned, it's not general purpose. No OLAP whatsoever and not even all OLTP systems would benefit from the syntax (because many of them, for better or worse, implement integrity constraints at the application level).

Bottom line, while the proposal seems interesting on paper, after some consideration I can't imagine using it in the actual production code in case one of the vendors implement it. Just like the NATURAL JOIN.

@joelonsql
Copy link
Author

@na-ka-na Could any of the alternative syntax variants suggested in Tradeoffs: clarity vs conciseness, implicit vs explicit be to your liking?

@joelonsql
Copy link
Author

@kleontev wrote:

This syntax forces me to actually physically maintain that integrity constraint. I rename the foreign key - the query breaks.

This is true, and by design. I consider foreign keys as part of the data model, just like tables and their columns. If you would rename a table or column - the query breaks too.

I disable the foreign key - the query breaks.

Not saying it would be a good idea to disable a foreign key, but if there was a way to do it in some vendor, e.g. WITH NOCHECK in MSSQL, I don't see why a query using the JOIN FOREIGN syntax would break?

This has major implications on schema refactoring. When I perform one, I typically try to make it as backward-compatible as possible, at least for the RO tasks. One way to do it is to replace the legacy tables with a set of views that represent the data in a pre-refactor state. This is no longer an option with the JOIN FOREIGN syntax - we have views instead of tables and there are no foreign keys, so in the aforementioned refactoring every single query would have to be rewritten using a conventional syntax

Good point, but addressable:

The pattern you describe is: Renaming a table and creating a simple view with the old name, to allow applications that have not yet been refactored to continue use the old table name via the view.

I had not thought about this case, so many thanks for bringing it up!
Of course we must ensure it is supported.

In PostgreSQL, so-called "simple views" are updatable, see: https://www.postgresql.org/docs/current/sql-createview.html

Just as the system will allow INSERT, UPDATE and DELETE statements to be used on the view in the same way as on a regular table, I think JOIN FOREIGN could also be made to work as well.

And third, as it has already been mentioned, it's not general purpose. No OLAP whatsoever and not even all OLTP systems would benefit from the syntax (because many of them, for better or worse, implement integrity constraints at the application level).

There are naturally many different ways to use a SQL database. Users who only use it as a dumb data storage, not wanting it to be responsible for referential integrity, cannot and should not use JOIN FOREIGN. The audience for this proposal is SQL users who already rely heavily on foreign keys and consider them to be an essential and central part of the data model, just as e.g. column types would also be. I believe there are sufficient such SQL users to motivate the discussion of this proposal.

@kleontev
Copy link

kleontev commented Jan 1, 2022

Not saying it would be a good idea to disable a foreign key, but if there was a way to do it in some vendor, e.g. WITH NOCHECK in MSSQL, I don't see why a query using the JOIN FOREIGN syntax would break?

It may be a good idea. For instance, in Oracle certain types of batch loads can be orders of magnitude faster than conventional load, but they don't work with enforced referential integrity. You disable constraint, you perform the batch load, then you reenable (and re-validate) it. You can't use JOIN FOREIGN query when the constraint is in the disabled state, because a constraint disabled is a constraint not present as far as the query is concerned. Which means, if we want fast loads, we either block readers with weird query compilation errors for the duration of the load, or use a ENABLE NOVALIDATE RELY FK constraint, which doesn't need to be disabled during batch loads (I think...), but doesn't really enforce the integrity and basically tells the query engine to trust the data to be correct.

The pattern you describe is...

Yes, that's what I mean, except it's doesn't have to be simple views. For instance, we had that legacy CRUD application, problematic in many ways, which we were retiring and replacing with a new one. We implemented new data model for the replacement app and successfully moved the historical data to the new, however, dozens of reports were still looking directly into the legacy app's tables. We replaced those tables with views to ensure we didn't have to rewrite every single report, but those views were not simple (i.e. they weren't updatable and they were more than select a,b,c from new_table with a few where conditions - some of them were pretty complex).

@joelonsql
Copy link
Author

@kleontev wrote:

You can't use JOIN FOREIGN query when the constraint is in the disabled state, because a constraint disabled is a constraint not present as far as the query is concerned.

Why not? I assume a disabled FK would just mean the referential integrity of it isn't enforced, but the FK still exists, i.e. the query planner can lookup what column(s) are references between the two tables, so I see no problem of using it in a JOIN FOREIGN statement.
It's this part of what you say I don't understand:

because a constraint disabled is a constraint not present as far as the query is concerned

What type of query are you talking about in this context? I assume you mean JOIN FOREIGN? If so, since the semantics of it has not yet been designed, then I would absolutely suggest we have the possibility to instead decide that

a disabled foreign key constraint is present as far as the JOIN FOREIGN statement is concerned

which would address your problem.

but those views were not simple (i.e. they weren't updatable and they were more than select a,b,c from new_table with a few where conditions - some of them were pretty complex).

Actually, Updatable Views can contain WHEREconditions.

Quote the PostgreSQL documentation:

If an automatically updatable view contains a WHERE condition, the condition restricts which rows of the base relation are available to be modified by UPDATE and DELETE statements on the view.

@kleontev
Copy link

kleontev commented Jan 1, 2022

If so, since the semantics of it has not yet been designed, then I would absolutely suggest we have the possibility to instead decide that

a disabled foreign key constraint is present as far as the JOIN FOREIGN statement is concerned

which would address your problem.

It probably would, but in doing so it would introduce inconsistency in the query planner behavior. Currently Oracle's CBO treats disabled constraints (all of them, not just referential integrity) as non-existent when generating the execution plan. For instance, with a disabled FK constraint it would not consider an otherwise possible join elimination optimization if you were querying only child table columns from a view wrapping a parent-child join. So again, we have a dilemma: break user queries (writer blocks readers during a batch load, bad) or introduce a "special" planner behavior just to support JOIN FOREIGN syntax (probably not as bad, but still pretty bad, at least in my book - the fewer gotchas a developer has to be aware of the better).

Anyway, I'm not saying those issues are fundamentally unaddressable. All I'm trying to point out is that scope of this change (and potential implications) extends far beyond readability and SQL syntax in general. Personally I see more trouble than benefit from the usage (just like the natural join thing), but maybe it's just my experience and I'm not really the audience.

@joelonsql
Copy link
Author

@kleontev wrote:

It probably would, but in doing so it would introduce inconsistency in the query planner behavior. Currently Oracle's CBO treats disabled constraints (all of them, not just referential integrity) as non-existent when generating the execution plan. For instance, with a disabled FK constraint it would not consider an otherwise possible join elimination optimization if you were querying only child table columns from a view wrapping a parent-child join.

When implementing the JOIN FOREIGN feature, that code would just need to lookup the referencing/referenced column(s) based on the FK, even if it's disabled. All other parts of the query planner / optimizer should not be changed, they should continue to ignore the disabled FK when making decisions such as if a join can be eliminated.

In this way, there would be no inconsistency.

Anyway, I'm not saying those issues are fundamentally unaddressable. All I'm trying to point out is that scope of this change (and potential implications) extends far beyond readability and SQL syntax in general. Personally I see more trouble than benefit from the usage (just like the natural join thing), but maybe it's just my experience and I'm not really the audience.

I agree with you on the first part, that the scope extends far beyond readability and SQL syntax.
Personally, I see tremendous value in JOIN FOREIGN, since I write a lot of mission-critical SQL code where correctness is an absolute necessity.

I really appreciate all your helpful comments providing deep insights in the inner workings of Oracle.
Thanks for helping out in developing this proposal. The Updatable Views support is really important improvement.

@joelonsql
Copy link
Author

joelonsql commented Jan 3, 2022

The following users have raised concerns are against the idea of using foreign key names:

@schutzie wrote:

Makes sense, but I don't know the names of any of the FKs on any tables, existing functionality is still fine with me.

@na-ka-na wrote:

foreign key constraint names are usually bad and too verbose.

@dvnrvn wrote:

There is a cognitive cost to having to remember the name of foreign keys

@shawn-simon wrote:

You'd have to know the names of every foreign key in your DB, which overall decreases readability for someone new to the DB schema

I have worked on a new alternative syntax to specifically address the concerns raised by you. I would greatly appreciate your comments on it.

@pjb1008
Copy link

pjb1008 commented Jan 3, 2022

Re the cost of knowing foreign key names - I think that people wishing to use this feature will explicitly name their foreign key constraints in a way that makes semantic sense, rather than accepting the meaningless machine generated names. The reason people don't already do this as a matter of routine is that nothing (apart from dropping the constraint) uses the constraint name, so there is no incentive.

@slavonnet
Copy link

+1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment