Skip to content

Instantly share code, notes, and snippets.

@odrotbohm
Created August 28, 2012 10:36
Show Gist options
  • Save odrotbohm/3497047 to your computer and use it in GitHub Desktop.
Save odrotbohm/3497047 to your computer and use it in GitHub Desktop.
JPQL / SQL riddle
JPQL: select p from Person p
left outer join p.address address
order by address.city
SQL: select person0_ from Person person0_
left outer join Address address1_ on person0_.id=address1_.person_id
order by address1_.city
JPQL: select p from Person p
left outer join p.address address
order by p.address.city
SQL: select person0_ from Person person0_
left outer join Address address1_ on person0_.id=address1_.person_id
cross join Address address2_ where person0_.id=address2_.person_id
order by address2_.city
The core difference here is that qualifying the sorting criteria to p.address.city
instead of address.city results in an additional JOIN clause added which results in
persons not having an address being ruled out from the result set. Environment is
Hibernate 4.1.5.SP1, H2 1.3.168.
1. Should the two JPQL queries result in the SQL queries shown? In particular
shouldn't they actually generate the very same SQL?
2. Why does the second JPQL query add the additional JOIN which effectively rules
out persons without an address?
3. Is it a good idea that adding a sort criterion potentially adds an additional
join which influences the actual result set return (i.e. "the number of results
changes just because I added a sort criteria")
@rgielen
Copy link

rgielen commented Aug 29, 2012

I think what I'm trying to say is that this is an issue related the "M" in ORM. M stands for consistent mappings of idioms only available in / suitable for the O tier to the R tier. The path navigation is clearly an O concept, which we have to map to R. For the sake of consistency, it always imposes a side effect, namely an inner join.

My second SQL example was a bit wrong. Better would have been

select p from Person p where p.address.city is null

Here you can see a real side effect of the path expression, without being anyhow explicit about joins. How does this side effect differ from the one imposed by order by, rethinking both examples with both possible join semantics? The consistency lies therein that a path navigation expression solely and always consistently affects the dynaset definition (FROM part) of your query, and not the projection (SELECT), selection (WHERE) or ORDER BY. So regardless where it is used, it should have the same meaning by having the same side effect on the dynaset. And no, it is not a side effect on the WHERE clause, although it may look that way in the generated SQL. Semantically it only adds an INNER JOINto the FROMclause, which Hibernate for whatever reason de-composes to a selection on a cartesian product. This is valid in relational algebra and the correct decomposition of the non-basic JOIN operation - but it is totally idiotic to use it in a DMBS with extended relational algebra representation, meaning anything newer than Oracle 8i :)

To be clear about, I was not talking about SQL at all, so of course I would not feel good about side effects on my dynaset imposed by ORDER BY in my SQL. Ironically SQL would not even allow me to generate side effects, because each and everything I want to use for ordering has to be explicitly represented by my FROMclause, including the right and thoughtfully chosen join needed to order by a criterion in a related table.

PS: The statement corrected above, although not intended, would have been correct for a "German-speaking" DBMS, since in Germany 'Bielefeld' equals null :)

@rgielen
Copy link

rgielen commented Aug 29, 2012

An addition: Say we have the following statement:

select p, privateAddress.city, companyAddress.city  from Person p 
      left outer join p.addresses privateAddress 
      left outer join p.addresses companyAddress
where
     privateAddress.type='PRIVATE' and companyAddress.type='COMPANY'

Regardless whether this looks like the best possible modeling - what exactly should

ORDER BY addresses.city

now do when added, following your recommendation to implicitly / automatically re-use widening joins when having an order by expression?

@odrotbohm
Copy link
Author

Nothing at all, it's invalid as addesses is not known at all. p.addresses would have been. Does the query actually make sense? Why would you want to outer join the tables (retaining null values) if you actually have a where clause that will definitely return false for these values?

Again, as soon as you manually add joins you do so for a reason and then it's up to you to define all related clauses correctly. But if you don't do (select p from Person p), adding an order byclause simply must not restrict the result set more than actually expressed in the query.

@rgielen
Copy link

rgielen commented Aug 29, 2012

OK, I seem to have way too much time (I haven't), but well - it''s an interesting problem.

So, here's another addition.

  1. my last comment over-emphasizes the concept of re-using an existing outer join. If applying the rule of "non-aliased order by path expressions implicitly add an outer join to the resulting SQL FROMclause, it would be clear what the query does and the actual selection in terms of rows would not be affected.
  2. Navigation expressions are all about navigating an object graph, since JPA tries to hide the relational stuff and give us a "purely" object oriented view on our data, right? So when applying the rule from point 1, you would effectively say: this ORDER BY is not an object graph expression any more, because in an object graph we can only walk along transitions - and not non-transitions, which is what a left outer join would construct. So if ORDER BY a.b would result in an outer join, the path expression applied to the object graph would read [a | ].b, which feels utterly wrong to me in an object model.

@rgielen
Copy link

rgielen commented Aug 29, 2012

Regarding your comment to the addresses-example:

Of course it should have read ORDER BY p.addresses. The example does make sense as a classic "map rows to columns" example, relying on an effective query planner and executer in the DBMS, causing least possible communication overhead between DB and JVM. More realistic would be: Generate result rows of form

firstname, lastname, home phone, business phone

given that phone numbers are attributes of an entity with 1:n relation to Persons and a discriminator column type

But again, the example not really good to make any case for my point in opposite to yours, as mentioned in my last comment - so forget about it :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment