In my application every time I send a newsletter to a User, I create a NewsletterDelivery record. I frequently want to be able to query, for each user, what is the most recent newsletter delivery record. This is called a "last n per group" query and a LATERAL JOIN
is the best way to do it imo. But the query I've been using (and I've told people to use, and seen blogged about) is not very good, because the conditions don't get pushed down into the subselect which means that the query ends-up lateral-joining all the records before it applies the conditions for the association.
Instead of doing subselect_table.*
the better query does association.id AS assocation_id, subselect_table.id, subselect_table.title, ....
and enumerates over all of the columns. This allows the association query, which Active Record tacks on at the end as WHERE association_id = $1
or WHERE association_id IN ($1, $2, $3, ...)
to be pushed down correctly.
ROWS=8865
🥵
SELECT "newsletter_deliveries".*
FROM (
SELECT latest.*
FROM "users"
INNER JOIN LATERAL (
SELECT "newsletter_deliveries".*
FROM "newsletter_deliveries"
WHERE (user_id = users.id)
ORDER BY "newsletter_deliveries"."created_at" DESC
LIMIT 1
) AS latest ON TRUE
) newsletter_deliveries
WHERE "newsletter_deliveries"."user_id" IN (1, 2)
Nested Loop (cost=0.72..13168.14 rows=8865 width=80)
-> Index Only Scan using users_pkey on users (cost=0.29..336.26 rows=8865 width=4)
-> Subquery Scan on latest (cost=0.43..1.44 rows=1 width=80)
Filter: (latest.user_id = ANY ('{1,2}'::bigint[]))
-> Limit (cost=0.43..1.42 rows=1 width=80)
-> Index Scan using index_newsletter_deliveries_on_user_id_and_created_at on newsletter_deliveries (cost=0.43..1444.49 rows=1455 width=80)
Index Cond: (user_id = users.id)
ROWS=2
🎉
SELECT * FROM (
SELECT "users"."id" AS user_id, "latest"."id", "latest"."created_at", "latest"."updated_at", "latest"."delivered_at", "latest"."opened_at", "latest"."clicked_at", "latest"."spammed_at", "latest"."uuid"
FROM "users"
INNER JOIN LATERAL (
SELECT "newsletter_deliveries".*
FROM "newsletter_deliveries"
WHERE (user_id = users.id)
ORDER BY "newsletter_deliveries"."created_at" DESC
LIMIT 1
) AS latest ON TRUE
) newsletter_deliveries
WHERE "newsletter_deliveries"."user_id" IN (1, 2)
Nested Loop (cost=0.72..7.17 rows=2 width=76)
-> Index Only Scan using users_pkey on users (cost=0.29..4.28 rows=2 width=4)
Index Cond: (id = ANY ('{1,2}'::integer[]))
-> Limit (cost=0.43..1.42 rows=1 width=80)
-> Index Scan using index_newsletter_deliveries_on_user_id_and_created_at on newsletter_deliveries (cost=0.43..1444.49 rows=1455 width=80)
Index Cond: (user_id = users.id)
class NewsletterDelivery < ApplicationRecord
belongs_to :user
scope :recent, lambda { |count = 1|
latest_table = Arel::Table.new('latest')
recent_newsletters = User
.select(User.arel_table["id"].as("user_id"))
.select(NewsletterDelivery.column_names.without("user_id").map { |column| latest_table[column] })
.arel.join(
NewsletterDelivery
.where(arel_table["user_id"].eq(User.arel_table["id"]))
.order(created_at: :desc)
.limit(count)
.arel.lateral(latest_table.name)
).on("TRUE")
from(recent_newsletters.as(arel_table.name))
}
end
class User < ApplicationRecord
has_many :newsletter_deliveries
has_one :recent_newsletter_delivery, -> { recent(1) }, class_name: "NewsletterDelivery"
end
The previous query enumerates all of the columns. This is necessary to avoid ERROR: column reference "user_id" is ambiguous
. An way to work around this is to alias it to something different and use that same foreign key name when constructiong the association:
class NewsletterDelivery < ApplicationRecord
belongs_to :user
scope :recent, lambda { |count = 1|
latest_table = Arel::Table.new('latest')
recent_newsletters = User
.select(User.arel_table["id"].as("user_id_alias"), latest_table[Arel.star])
.arel.join(
NewsletterDelivery
.where(arel_table["user_id"].eq(User.arel_table["id"]))
.order(created_at: :desc)
.limit(count)
.arel.lateral(latest_table.name)
).on("TRUE")
from(recent_newsletters.as(arel_table.name))
}
end
class User < ApplicationRecord
has_many :newsletter_deliveries
has_one :recent_newsletter_delivery, -> { recent(1) }, class_name: "NewsletterDelivery", foreign_key: "user_id_alias"
end