Given an ActiveRecord model (https://gist.github.com/ancv1990/e72a0662d62959cf0ab4947cbc7918b3), how would you optimize it so that the method User#exists? performs well even with millions of records?
# == Schema Information
#
# Table name: users
#
# id : integer not null, primary key
# status : string
# city : string
#
class User < ActiveRecord::Base
  enum status: { active: 0, inactive: 1, banned: 2, pending: 3 }
  enum city: { hcm: 'HCM', hn: 'HN', dn: 'DN', other: 'Other' }

  def exists?(status, city)
    pluck(:status, :city).count{|user| user.status == status && user.city == city} > 0
  end
end
I think there is a mistake in the method implementation. The method exists? should be a class method instead of an instance method, and it should use a different name (say self.custome_exists?) because ActiveRecord::Base already has an exists? method and we don't want to override it. Note also that pluck(:status, :city) returns plain arrays, so calling user.status on each element would raise an error.
An improved implementation is as follows:
class User < ActiveRecord::Base
  enum status: { active: 0, inactive: 1, banned: 2, pending: 3 }
  enum city: { hcm: 'HCM', hn: 'HN', dn: 'DN', other: 'Other' }

  def self.custome_exists?(status, city)
    exists?(status: status, city: city)
  end
end
This is already much faster than the previous implementation because it never loads all the records into memory to process them in Ruby; exists? asks the database for at most one matching row. It should have no problem handling millions of records, but we can make it even faster with a composite index on both columns.
# migration
add_index :users, [:status, :city]
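As a rough plain-Ruby analogy of why the composite index helps (this is not how a database implements indexes; the ROWS data and both helper names are made up for illustration): without an index the lookup has to walk the rows one by one, while an index keyed on [status, city] lets it jump straight to the matching key.

```ruby
require "set"

# Hypothetical in-memory stand-in for the users table.
ROWS = [
  { id: 1, status: "active",   city: "HCM" },
  { id: 2, status: "banned",   city: "HN"  },
  { id: 3, status: "inactive", city: "DN"  }
].freeze

# Analogy for a full table scan: check rows until one matches.
def scan_exists?(rows, status, city)
  rows.any? { |row| row[:status] == status && row[:city] == city }
end

# Analogy for add_index :users, [:status, :city]:
# built once, then every lookup is a single key probe.
INDEX = ROWS.map { |row| [row[:status], row[:city]] }.to_set.freeze

def indexed_exists?(index, status, city)
  index.include?([status, city])
end

puts scan_exists?(ROWS, "banned", "HN")
puts indexed_exists?(INDEX, "banned", "HN")
puts indexed_exists?(INDEX, "banned", "HCM")
```

Both helpers give the same answer; the difference is only in how much data has to be examined per call.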
Assume we have two models, User and Post, with the relations shown here: https://goo.gl/BtHV7g. We need to get a list of users in descending order of their number of posts. The current solution here: https://goo.gl/G6FRxM runs too slowly. Please explain why it is slow. How would you implement it with the best performance? Please explain your solution and show your code.
# == Schema Information
#
# id :integer not null, primary key
# name :string(255)
class User < ActiveRecord::Base
has_many :posts
end
# == Schema Information
#
# id :integer not null, primary key
# title :string(255)
# user_id :integer
class Post < ActiveRecord::Base
belongs_to :user
end
User.select("*, (select count(*) from posts where posts.user_id=users.id) as post_count")
.order("post_count desc")
This query is slow because the correlated subquery runs once for every row in the users table. Rewriting it as a join could help a little, but Rails has Counter Cache to handle this exact problem in a much better way.
By implementing Counter Cache, we add a column called posts_count (the name Rails expects by convention for counter_cache: true) to the users table and keep it updated with the number of posts whenever a post is created or destroyed. This way, the query performs like a plain select on the users table.
# == Schema Information
#
# id :integer not null, primary key
# title :string(255)
# user_id :integer
class Post < ActiveRecord::Base
belongs_to :user, counter_cache: true
end
# == Schema Information
#
# id :integer not null, primary key
# posts_count :integer
# name :string(255)
class User < ActiveRecord::Base
has_many :posts
end
The migration:
class AddPostsCountToUser < ActiveRecord::Migration[6.0]
  def change
    add_column :users, :posts_count, :integer, default: 0
    # Backfill the counter for existing users in batches instead of loading them all at once.
    User.find_each do |user|
      User.reset_counters(user.id, :posts)
    end
  end
end
And the query becomes:
User.order(posts_count: :desc)
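To make the idea concrete, here is a minimal plain-Ruby sketch of what a counter cache does conceptually. UserRow and PostRow are hypothetical stand-ins, not ActiveRecord models: the count is maintained at write time, so ranking users only reads one stored number per user instead of counting posts.

```ruby
# Hypothetical stand-in for a row in the users table.
UserRow = Struct.new(:name, :posts_count)

# Hypothetical stand-in for a post that maintains its owner's counter.
class PostRow
  attr_reader :user

  def initialize(user)
    @user = user
    @user.posts_count += 1 # roughly what counter_cache does on create
  end

  def destroy
    @user.posts_count -= 1 # and on destroy
  end
end

alice = UserRow.new("alice", 0)
bob   = UserRow.new("bob", 0)

3.times { PostRow.new(alice) }
first_bob_post = PostRow.new(bob)
PostRow.new(bob)
first_bob_post.destroy

# Ranking only reads the cached number; no posts are counted here.
ranking = [bob, alice].sort_by { |user| -user.posts_count }
puts ranking.map(&:name).inspect
```

The trade-off is a little extra work on every post creation and deletion in exchange for a cheap read path.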
Assume you have already built a blog with many posts. When a new post gets published, all users subscribed to your site get an email notification about the new post. Now you want to add an unsubscribe link to the email so that a user can cancel the subscription by clicking on that link. The link will expire after 3 days. Show the code to implement the unsubscribe link feature (only show code for the model, services, migration, and controller).
# == Schema Information
#
# id :integer not null, primary key
# name :string(255)
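class User < ActiveRecord::Base
  has_one :user_notification_config
  has_many :user_email_tokens
  after_create :create_default_notification_config

  private

  # has_one already defines create_user_notification_config, so the callback
  # uses a different name and delegates to that generated method.
  def create_default_notification_config
    create_user_notification_config(email_on_new_posts: true)
  end
end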
# == Schema Information
#
# id :integer not null, primary key
# email_on_new_posts :boolean
# user_id :integer
class UserNotificationConfig < ActiveRecord::Base
belongs_to :user
end
require 'securerandom'
# == Schema Information
#
# id :integer not null, primary key
# token :string(32)
# expired_at :datetime
# user_id :integer
class UserEmailToken < ActiveRecord::Base
  belongs_to :user
  before_create :set_token
  before_create :set_expired_at

  def expired?
    expired_at < Time.current
  end

  private

  # SecureRandom.hex returns 32 hex characters by default.
  def set_token
    self.token = SecureRandom.hex
  end

  def set_expired_at
    self.expired_at = 3.days.from_now
  end
end
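class GenerateUnsubscribeUrlService
  # Route helpers are not available in plain Ruby objects by default.
  include Rails.application.routes.url_helpers

  attr_reader :user

  def initialize(user)
    @user = user
  end

  def execute
    token = user.user_email_tokens.create!
    # An email needs an absolute URL; this assumes default_url_options[:host] is configured.
    user_notification_config_unsubscribe_posts_url(token: token.token)
  end
end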
class UserNotificationConfigController < ApplicationController
  def unsubscribe_posts
    user_email_token = UserEmailToken.find_by(token: params[:token])
    # Treat unknown and expired tokens the same way.
    return head :not_found if user_email_token.nil? || user_email_token.expired?

    user_email_token.user.user_notification_config.update(email_on_new_posts: false)
    head :ok
  end
end
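The token lifecycle above can be exercised without Rails. The following is only a sketch under stated assumptions: TokenStore is a hypothetical in-memory stand-in for the user_email_tokens table, and the clock is passed in explicitly so that the three-day expiry can be checked without waiting.

```ruby
require "securerandom"

THREE_DAYS = 3 * 24 * 60 * 60 # seconds

# Hypothetical in-memory stand-in for the user_email_tokens table.
class TokenStore
  Entry = Struct.new(:user_id, :expires_at)

  def initialize
    @entries = {}
  end

  # Issues a random token that is valid for three days from `now`.
  def issue(user_id, now: Time.now)
    token = SecureRandom.hex # 32 hex characters
    @entries[token] = Entry.new(user_id, now + THREE_DAYS)
    token
  end

  # Returns the user id for a known, unexpired token, otherwise nil.
  def redeem(token, now: Time.now)
    entry = @entries[token]
    return nil if entry.nil? || entry.expires_at < now

    entry.user_id
  end
end

store = TokenStore.new
issued_at = Time.now
token = store.issue(42, now: issued_at)

p store.redeem(token, now: issued_at + 60)             # still within three days
p store.redeem(token, now: issued_at + THREE_DAYS + 1) # after expiry
p store.redeem("unknown")                              # never issued
```

Returning nil for both unknown and expired tokens mirrors the controller, which answers 404 in either case.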
Assume you have a blog site with more than 5000 posts. Now you need to track page views of the posts so that the number of page views can be shown on the post detail page when a user views a post. Every day there will be 1 million new page views across your posts. Use PostgreSQL to store the data. How will you design the system to maximize performance?
I haven't worked on this kind of problem before, but there are a couple of points to consider:
- Why do we have to store the page views in PostgreSQL and not in another system (e.g. MongoDB, Kafka, or file storage)?
- If we have to store them in PostgreSQL, then the table will need partitioning: https://www.postgresql.org/docs/14/ddl-partitioning.html. I would make each day's views a partition, which can be thought of as a separate table.
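A minimal sketch of what daily partitions could look like, written as a small Ruby helper that only builds the DDL strings. The table name page_views and the columns post_id and viewed_at are assumptions for illustration; the SQL follows PostgreSQL's declarative range partitioning syntax.

```ruby
require "date"

# Hypothetical parent table name; the real schema may differ.
PARENT_TABLE = "page_views"

# DDL for the parent table, partitioned by range on the view timestamp.
def parent_table_sql
  <<~SQL
    CREATE TABLE #{PARENT_TABLE} (
      post_id   bigint      NOT NULL,
      viewed_at timestamptz NOT NULL
    ) PARTITION BY RANGE (viewed_at);
  SQL
end

# Name of the partition holding the views for the given day.
def partition_name(day)
  "#{PARENT_TABLE}_#{day.strftime('%Y_%m_%d')}"
end

# DDL for one daily partition; the upper bound is exclusive.
def daily_partition_sql(day)
  <<~SQL
    CREATE TABLE #{partition_name(day)} PARTITION OF #{PARENT_TABLE}
      FOR VALUES FROM ('#{day.iso8601}') TO ('#{(day + 1).iso8601}');
  SQL
end

puts parent_table_sql
puts daily_partition_sql(Date.new(2024, 1, 31))
```

A scheduled job could run the second statement ahead of time for upcoming days, and old partitions can be dropped as whole tables once they are no longer needed.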
Assume you have an e-commerce site that sells machines. There are four kinds of machine: MA, MB, MC, MD. Each kind has its own unit price. Customers can buy machines in single packs such as 2MA, 3MA, 5MB, 20MC, or in multiple packs such as 2x2MA+3x5MB. Each pack has its own discount. Administrators of the site can change the discount of a pack and the unit price of a machine. Design the models.
# == Schema Information
#
# id :integer not null, primary key
# name :string(255)
# pack_size :integer
class Machine < ActiveRecord::Base
  has_many :machine_packages
  has_many :packages, through: :machine_packages
end
# == Schema Information
#
# id :integer not null, primary key
# machine_id :integer
# package_id :integer
class MachinePackage < ActiveRecord::Base
belongs_to :package
belongs_to :machine
end
# == Schema Information
#
# id :integer not null, primary key
# name :string(255)
# discount :decimal
class Package < ActiveRecord::Base
  has_many :machine_packages
  has_many :machines, through: :machine_packages
end
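To check that a design like this can price an order such as 2x2MA+3x5MB, here is a plain-Ruby sketch. Everything in it is an assumption for illustration: the unit prices, the choice to store money as integer cents, the reading of a discount as a whole percentage off the pack's list price, and the Pack and OrderLine names, which are not part of the models above.

```ruby
# Hypothetical unit prices in cents; real values would come from the machines table.
UNIT_PRICE_CENTS = { "MA" => 1_000, "MB" => 2_500 }.freeze

# A pack of `size` machines of one kind with its own discount (whole percent, assumed).
Pack = Struct.new(:kind, :size, :discount_percent) do
  def price_cents
    list_price = UNIT_PRICE_CENTS.fetch(kind) * size
    list_price * (100 - discount_percent) / 100
  end
end

# One line of an order: `quantity` copies of a pack, e.g. 2x2MA.
OrderLine = Struct.new(:quantity, :pack) do
  def total_cents
    quantity * pack.price_cents
  end
end

pack_2ma = Pack.new("MA", 2, 5)  # 2MA with a 5% discount
pack_5mb = Pack.new("MB", 5, 10) # 5MB with a 10% discount

# The order 2x2MA+3x5MB from the question.
order = [OrderLine.new(2, pack_2ma), OrderLine.new(3, pack_5mb)]
puts order.sum(&:total_cents)
```

Because the price is computed from the current unit price and discount, an administrator's change to either one takes effect on the next calculation.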