-
-
Save abrookins/533d5a1371667443bd7d7973375f3400 to your computer and use it in GitHub Desktop.
Django join promotion
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
========================= | |
Join promotion in the ORM | |
========================= | |
[NOTE: We need better terms than promote and demote for changing the join | |
type. These terms are extremely easy to mix up. Maybe the ORM methods could | |
be to_inner_joins and to_louter_joins instead of promote_joins and demote_joins? | |
I tried to clean up the mis-usages of promotion/demotion but there could still | |
be some cases where these are mixed up] | |
Join promotion refers to changing inner joins to left outer joins. | |
Similarly, join demotion is the action of changing left joins to inner joins. | |
The reason Django ORM uses inner joins is that they are much more performant | |
(the difference can be orders of magnitude). There is no other reason to | |
demote joins to inner joins. That is, when using the Django ORM, results will | |
always be correct even when all joins are set to left outer joins. | |
Join demotion is done in case the ORM can prove the results will be correct even | |
when inner joins are used. We can always demote any join that must have a value | |
on the remote side of the join (that is, join along non-nullable foreign key). | |
We will also demote joins when filtering in such a way that the filter can only | |
match when the value is non-null. | |
Lets assume we have the following models:: | |
class Book(models.Model): | |
title = models.TextField() | |
alias = models.TextField(null=True) | |
author = models.ForeignKey(Author) | |
class Author(models.Model): | |
name = models.TextField() | |
favourite_book = models.ForeignKey(Book, null=True) | |
first_book = models.ForeignKey(Book, null=True) | |
When doing `Author.objects.filter(favourite_book__title='Foo')` we *can* use an | |
inner join. The foreign key is nullable, but if a given author has no favourite | |
book, its title naturally can't be 'Foo'. | |
On the other hand, when doing `Author.objects.filter(favourite_book__alias__isnull=True)` | |
we *must* use a left outer join - both those authors who have no favourite book, and | |
those author's whose favourite book has no alias must mach. (If you want only authors | |
who have a favourite book, but the books alias is null, then you must add | |
favourite_book__isnull=False to the filter().) | |
Things get a bit more complicated when OR combined queries are considered. For example, | |
when doing `Author.objects.filter(Q(favourite_book__title='Foo') | Q(first_book__title='Bar')), | |
we must use left outer joins. Why? Lets explain this with some example cases. | |
First case is an Author that has a favourite_book with title 'Foo', and no first_book at | |
all (the author isn't actually an author yet...). If we were to use inner joins, then the | |
join to first_book would remove the author row from results before filtering was done (this | |
is simply because inner joins remove any rows from the result set that have no mathing row | |
for the join). As the second case we can consider an author with first_book with title | |
'Bar', but no favourite book - again the inner join would remove the result even before the | |
WHERE clause of the query gets applied. | |
One the other hand, when doing Q(favourite_book__title='Foo') | Q(favourite_book__title='Bar') we | |
can use an inner join. The reason is that if an author has no favourite | |
book, then neither of the above filters can match. Still, if the query was | |
Q(favourite_book__title='Foo') | Q(favourite_book__title__isnull=True), then we must | |
again use left outer join, as otherwise the latter condition would miss those cases | |
where the title is null because the author has no favourite book. | |
When doing annotations, it's notable that we must use left outer joins for any | |
new join created for an annotation (keeping in mind that we never demote joins for relations | |
that must always have results). Otherwise adding an annotation could remove | |
results by the inner join it creates. | |
Also, after we have decided a given join must be a left join, all joins "starting" from that | |
join must be left joins, too. The reason is again that an inner join, even after an louter | |
join will remove results. So, in a query like:: | |
SELECT * | |
FROM author | |
LEFT JOIN book ON author.favourite_book_id = author.id | |
INNER JOIN author ON book.author_id = author.id | |
the inner join effectively changes the left join to book into an inner join. | |
Currently, the only lookup that can match null values is isnull. However, it seems plausible | |
that custom lookups can be used to create other lookups that match null values. This will be | |
immediately obvious when expressions are allowed in the filter() clause. For example, | |
.filter(Equal(Coalesce('favourite_book__title', 'first_book__title'), 'Foo')) must use | |
left outer joins for the joins, as otherwise an author that has favourite_book with title | |
'Foo', but no first book would not be matched. | |
Currently, when doing .annotate(coalesced=Coalesce('favourite_book__title', 'first_book__title')).filter(colesced='Foo') | |
we will use left outer joins. This is because any annotation will use left outer joins for any | |
new joins added to the query, and we don't currently do join demotion when filtering on | |
annotations. | |
How do we decide which joins to promote in sql.Query? | |
===================================================== | |
In Django, each individual filter clause is built with sql.Query.build_filter(). The build_filter() | |
method returns the lookup object to be added to the query, and it also returns the joins it | |
would prefer to be inner joins (I just realized I have made a big mistake in naming the return | |
value "needed_inner", when in fact it should be something like "joins_that_can_be_inner"). | |
The _add_q() method then uses a JoinPromoter object to count "votes" for inner joins. It does this | |
for each Q-object separately. The way this works is that each child of given Q-object is first | |
processed by build_filter, then the votes are added to the JoinPromoter. | |
The JoinPromoter has logic for deciding if a join should be promoted to louter join, or if a join | |
should be demoted to an inner join. | |
The join promoter needs votes from all children of a Q object, and it also needs to know how many | |
children the Q object has, and the connector of the Q object. Still, it needs to know | |
if we are dealing with a negated lookup (as NOT AND == OR for join promotion [TODO: add reason | |
why]). | |
What does the JoinPromoter do? For the OR case, it demotes all joins where each of the Q | |
object's children reported that that join can be inner join. As we saw before in the example | |
cases, when all child clauses use the same join, and all child clauses prefer that join to | |
be an inner join (ie none of the clauses are interested in null values), then those joins | |
can be demoted to inner joins. If there are less votes than child clauses for given join, | |
then the promoter decides that the joins need to be promoted. Finally those joins that | |
have no votes at all (that is, they weren't used by the Q-object the promoter is dealing | |
with), then that join will be left as is. | |
So, getting back to our two original examples. Case `Q(favourite_book__title='Foo') | Q(first_book__title='Bar')`. | |
The build_filter() for the first condition reports that the favourite_book join can be an inner join, | |
and for the second condition similarly the first_book join can be an inner join. But, as this is an | |
OR filter, and both of the joins have only one vote, we decide that both of those joins must | |
be left outer joins. | |
On the other hand, for the case `Q(favourite_book__title='Foo') | Q(favourite_book__title='Bar')`, | |
both filters report that favourite_book can be an inner join. The join promoter sees there are | |
as many votes for favourite_book join to be inner join as there are conditions, and thus it decides | |
the join can be demoted to inner join. | |
[To be continued later on...] | |
Other notes: | |
- When ANDin (or NOT ORing) conditions, a single vote for inner join is enough to turn the | |
join into inner join. If a given clause of ANDed or NOT ORed condition does not match, | |
then the whole clause can not match. | |
- Each Q-object reports the joins is prefers to be inner joins separately to its parent, and | |
Q-object children are treated just like build_filter() children. | |
- So, for example (Q(favourite_book__title='Foo') | Q(first_book__title='Bar')) & Q(favourite_book__title__icontains='o') | |
will produce an inner join for favourite_book. First, join promoter handles the ORed children | |
of the whole Q-object as we have seen above. So, both joins are promoted to left outer | |
joins, and that children reports that no joins are can be demoted to inner joins (from | |
that child's perspective). Next, the icontains lookup is handled, and that lookup | |
says that it prefers the join to be inner join. And, as it is AND connected, a single | |
vote for inner join is enough to demote the join. Note that this is correct, as if | |
the icontain lookup gets a null value, that lookup can't match. |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment