Skip to content

Instantly share code, notes, and snippets.

@bensheldon
Last active July 25, 2024 16:50
Show Gist options
  • Save bensheldon/4054feed291e8ae7f54d40b14ad7ba79 to your computer and use it in GitHub Desktop.
Save bensheldon/4054feed291e8ae7f54d40b14ad7ba79 to your computer and use it in GitHub Desktop.

In my application every time I send a newsletter to a User, I create a NewsletterDelivery record. I frequently want to be able to query, for each user, what is the most recent newsletter delivery record. This is called a "last n per group" query and a LATERAL JOIN is the best way to do it imo. But the query I've been using (and I've told people to use, and seen blogged about) is not very good, because the conditions don't get pushed down into the subselect which means that the query ends-up lateral-joining all the records before it applies the conditions for the association.

Instead of doing subselect_table.* the better query does association.id AS assocation_id, subselect_table.id, subselect_table.title, .... and enumerates over all of the columns. This allows the association query, which Active Record tacks on at the end as WHERE association_id = $1 or WHERE association_id IN ($1, $2, $3, ...) to be pushed down correctly.

Old query

ROWS=8865 🥵

SELECT "newsletter_deliveries".* 
FROM (
    SELECT latest.* 
    FROM "users" 
    INNER JOIN LATERAL (
        SELECT "newsletter_deliveries".* 
	FROM "newsletter_deliveries" 
	WHERE (user_id = users.id) 
	ORDER BY "newsletter_deliveries"."created_at" DESC 
	LIMIT 1
    ) AS latest ON TRUE
) newsletter_deliveries 
WHERE "newsletter_deliveries"."user_id" IN (1, 2)

Nested Loop  (cost=0.72..13168.14 rows=8865 width=80)
  ->  Index Only Scan using users_pkey on users  (cost=0.29..336.26 rows=8865 width=4)
  ->  Subquery Scan on latest  (cost=0.43..1.44 rows=1 width=80)
        Filter: (latest.user_id = ANY ('{1,2}'::bigint[]))
        ->  Limit  (cost=0.43..1.42 rows=1 width=80)
              ->  Index Scan using index_newsletter_deliveries_on_user_id_and_created_at on newsletter_deliveries  (cost=0.43..1444.49 rows=1455 width=80)
                    Index Cond: (user_id = users.id)

New query

ROWS=2 🎉

SELECT * FROM (
    SELECT "users"."id" AS user_id, "latest"."id", "latest"."created_at", "latest"."updated_at", "latest"."delivered_at", "latest"."opened_at", "latest"."clicked_at", "latest"."spammed_at", "latest"."uuid"
    FROM "users" 
    INNER JOIN LATERAL (
        SELECT "newsletter_deliveries".* 
        FROM "newsletter_deliveries" 
        WHERE (user_id = users.id) 
        ORDER BY "newsletter_deliveries"."created_at" DESC 
        LIMIT 1
    ) AS latest ON TRUE
) newsletter_deliveries 
WHERE "newsletter_deliveries"."user_id" IN (1, 2)

Nested Loop  (cost=0.72..7.17 rows=2 width=76)
  ->  Index Only Scan using users_pkey on users  (cost=0.29..4.28 rows=2 width=4)
        Index Cond: (id = ANY ('{1,2}'::integer[]))
  ->  Limit  (cost=0.43..1.42 rows=1 width=80)
        ->  Index Scan using index_newsletter_deliveries_on_user_id_and_created_at on newsletter_deliveries  (cost=0.43..1444.49 rows=1455 width=80)
              Index Cond: (user_id = users.id)

In Arel

class NewsletterDelivery < ApplicationRecord
  belongs_to :user

  scope :recent, lambda { |count = 1|
    latest_table = Arel::Table.new('latest')
    recent_newsletters = User
                         .select(User.arel_table["id"].as("user_id"))
                         .select(NewsletterDelivery.column_names.without("user_id").map { |column| latest_table[column] })
                         .arel.join(
                           NewsletterDelivery
                             .where(arel_table["user_id"].eq(User.arel_table["id"]))
                             .order(created_at: :desc)
                             .limit(count)
                             .arel.lateral(latest_table.name)
                         ).on("TRUE")
    from(recent_newsletters.as(arel_table.name))
  }
end

class User < ApplicationRecord
  has_many :newsletter_deliveries
  has_one :recent_newsletter_delivery, -> { recent(1) }, class_name: "NewsletterDelivery"
end

An even better query

The previous query enumerates all of the columns. This is necessary to avoid ERROR: column reference "user_id" is ambiguous. An way to work around this is to alias it to something different and use that same foreign key name when constructiong the association:

class NewsletterDelivery < ApplicationRecord
  belongs_to :user

  scope :recent, lambda { |count = 1|
    latest_table = Arel::Table.new('latest')
    recent_newsletters = User
                         .select(User.arel_table["id"].as("user_id_alias"), latest_table[Arel.star])
                         .arel.join(
                           NewsletterDelivery
                             .where(arel_table["user_id"].eq(User.arel_table["id"]))
                             .order(created_at: :desc)
                             .limit(count)
                             .arel.lateral(latest_table.name)
                         ).on("TRUE")
    from(recent_newsletters.as(arel_table.name))
  }
end

class User < ApplicationRecord
  has_many :newsletter_deliveries
  has_one :recent_newsletter_delivery, -> { recent(1) }, class_name: "NewsletterDelivery", foreign_key: "user_id_alias"
end
# frozen_string_literal: true
module HasOfMany
extend ActiveSupport::Concern
class_methods do
# Fetch the first record of a has_one-like association using a lateral join
#
# @example Given posts have many comments; query the most recent comment on a post
# class Post < ApplicationRecord
# one_of_many :last_comment, -> { order(created_at: :desc) }, class_name: 'Comment'
# end
#
# Post.published.includes(:last_comment).each do |post
# posts.last_comment # => #<Comment>
# end
def has_one_of_many(name, scope = nil, **options)
_has_of_many(:one, name, 1, scope, **options)
end
# Fetch a limited number of records of a has_many-like association using a lateral join
#
# @example The 10 most recent comments on a post
# class Post < ApplicationRecord
# has_some_of_many :last_ten_comments, 10, -> { order(created_at: :desc) }, class_name: 'Comment'
# end
#
# Post.published.includes(:last_ten_comments).each do |post
# posts.last_ten_comments # => [#<Comment>, #<Comment>, ...]
# end
def has_some_of_many(name, limit, scope = nil, **options)
_has_of_many(:many, name, limit, scope, **options)
end
# Rewrites the has_one or has_many association to use a lateral subselect.
# The resulting SQL looks like:
#
# SELECT "comments".*
# FROM (
# SELECT
# "posts"."id" AS post_id_alias, -- alias the foreign key to avoid ambiguous duplicate column names
# "lateral_table".*
# FROM "posts"
# INNER JOIN LATERAL (
# SELECT "comments".*
# FROM "comments"
# WHERE "comments"."post_id" = "posts"."id"
# ORDER BY "posts"."created_at" DESC
# LIMIT 10
# ) lateral_table ON TRUE
# ) comments WHERE "comments"."post_id_alias" IN (1, 2, 3, 4, 5)
#
def _has_of_many(type, name, limit, scope = nil, **options)
model_class = self
primary_key = options[:primary_key] || self.primary_key
foreign_key = options[:foreign_key] || "#{self.name.underscore}_id"
foreign_key_alias = options[:foreign_key_alias] || "#{foreign_key}_alias"
lateral_subselection_scope = lambda do
current_relation = scope ? instance_exec(&scope) : self
lateral_table = Arel::Table.new('lateral_table')
subselect = model_class.unscope(:select)
.select(model_class.arel_table[primary_key].as(foreign_key_alias), lateral_table[Arel.star])
.arel.join(
current_relation
.where(current_relation.arel_table[foreign_key].eq(model_class.arel_table[primary_key]))
.limit(limit)
.arel.lateral(lateral_table.name)
).on("TRUE")
current_relation.klass.from(subselect.as(arel_table.name))
end
options[:primary_key] = primary_key
options[:foreign_key] = foreign_key_alias
if type == :one
has_one(name, lateral_subselection_scope, **options)
elsif type == :many
has_many(name, lateral_subselection_scope, **options)
end
end
end
end
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment