@thanhtruong1216
Last active November 3, 2021 08:50
AnswerITViec.md

Question 1

Given an ActiveRecord model (https://gist.github.com/ancv1990/e72a0662d62959cf0ab4947cbc7918b3), how can it be optimized so that the method User#exists? gets the best performance even with millions of records?

# == Schema Information
#
# Table name: users
#
#  id     : integer          not null, primary key
#  status : string
#  city   : string
#

class User < ActiveRecord::Base
  enum status: { active: 0, inactive: 1, banned: 2, pending: 3 }
  enum city: { hcm: 'HCM', hn: 'HN', dn: 'DN', other: 'Other' }

  def exists?(status, city)
    pluck(:status, :city).count{|user| user.status == status && user.city == city} > 0
  end
end

Answer

I think there is a mistake in the method implementation. exists? should be a class method rather than an instance method, and it should have a different name (say self.custom_exists?), because ActiveRecord::Base already provides exists? and we don't want to override it. In addition, pluck returns arrays rather than User objects, so user.status would raise NoMethodError.

An improved implementation is as follows:

class User < ActiveRecord::Base
  enum status: { active: 0, inactive: 1, banned: 2, pending: 3 }
  enum city: { hcm: 'HCM', hn: 'HN', dn: 'DN', other: 'Other' }

  # Delegates to the built-in ActiveRecord exists?, which runs in the database.
  def self.custom_exists?(status, city)
    exists?(status: status, city: city)
  end
end

This is already much faster than the previous implementation, because the check runs in the database as a SELECT ... LIMIT 1 query and no records are loaded into memory for processing. It should handle millions of records without trouble. We can make it faster still with a composite index on both columns.

# migration
add_index :users, [:status, :city]
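
To illustrate why stopping at the first match matters, here is a minimal plain-Ruby analogy. It does not touch a database: the rows are ordinary hashes, and the method names count_based_exists? and short_circuit_exists? are hypothetical. The first method mirrors the original count-everything approach, the second mirrors the LIMIT 1 behaviour of exists?.

```ruby
# In-memory analogy only: rows are plain hashes, not ActiveRecord objects.
ROWS = Array.new(1_000) do |i|
  { status: i.zero? ? "active" : "pending", city: "HCM" }
end

# Mirrors the original approach: visit every row, then compare the count with zero.
def count_based_exists?(rows, status, city)
  examined = 0
  matches = rows.count do |row|
    examined += 1
    row[:status] == status && row[:city] == city
  end
  [matches > 0, examined]
end

# Mirrors LIMIT 1: stop at the first matching row.
def short_circuit_exists?(rows, status, city)
  examined = 0
  found = rows.any? do |row|
    examined += 1
    row[:status] == status && row[:city] == city
  end
  [found, examined]
end

p count_based_exists?(ROWS, "active", "HCM")   # => [true, 1000]
p short_circuit_exists?(ROWS, "active", "HCM") # => [true, 1]
```

Both return the same answer, but the second examines a single row. In the database, the index plays the additional role of finding that first row without scanning the table.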

Question 2

Assume we have two models, User and Post, with the relations shown here: https://goo.gl/BtHV7g. We need to get the list of users in descending order of number of posts. The current solution here: https://goo.gl/G6FRxM runs too slowly. Please explain why it is slow. How would you implement it with the best performance? Please explain your solution and show your code.

# == Schema Information
#
#  id    :integer          not null, primary key
#  name  :string(255)
class User < ActiveRecord::Base
  has_many :posts
end

# == Schema Information
#
#  id      :integer          not null, primary key
#  title   :string(255)
#  user_id :integer
class Post < ActiveRecord::Base
  belongs_to :user
end

User.select("*, (select count(*) from posts where posts.user_id=users.id) as post_count")
    .order("post_count desc")

Answer

This query is slow because the correlated subquery runs once for every row in the users table. Rewriting it as a join could help a little, but Rails has a counter cache feature that handles exactly this problem much better.

With a counter cache, we add a posts_count column to the users table, and Rails keeps it in sync with the number of posts whenever a post is created or destroyed. The query then performs about as well as a plain select on the users table.

# == Schema Information
#
#  id      :integer          not null, primary key
#  title   :string(255)
#  user_id :integer
class Post < ActiveRecord::Base
  belongs_to :user, counter_cache: true
end

# == Schema Information
#
#  id            :integer          not null, primary key
#  posts_count   :integer
#  name          :string(255)
class User < ActiveRecord::Base
  has_many :posts
end

The migration

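class AddPostsCountToUser < ActiveRecord::Migration[6.0]
  def up
    add_column :users, :posts_count, :integer, default: 0

    # Make the model pick up the new column, then backfill in batches
    # so that all users are never loaded into memory at once.
    User.reset_column_information
    User.find_each do |user|
      User.reset_counters(user.id, :posts)
    end
  end

  def down
    remove_column :users, :posts_count
  end
end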

And the query becomes:

User.order(posts_count: :desc)
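
The bookkeeping behind a counter cache can be sketched in plain Ruby. This is only a model of the idea, not how Rails implements it: the classes below are stand-ins for the ActiveRecord models, and increment_posts_count and decrement_posts_count are hypothetical names.

```ruby
# Plain-Ruby stand-in for the users table with a cached posts_count column.
class User
  attr_reader :name, :posts_count

  def initialize(name)
    @name = name
    @posts_count = 0
  end

  def increment_posts_count
    @posts_count += 1
  end

  def decrement_posts_count
    @posts_count -= 1
  end
end

# Creating or destroying a post adjusts the owner's cached count,
# so nobody has to count posts at read time.
class Post
  attr_reader :user

  def initialize(user)
    @user = user
    user.increment_posts_count
  end

  def destroy
    user.decrement_posts_count
  end
end

alice = User.new("alice")
bob = User.new("bob")
Post.new(alice)
bob_posts = Array.new(3) { Post.new(bob) }
bob_posts.first.destroy

# Ranking only reads the cached counter.
ranking = [alice, bob].sort_by { |user| -user.posts_count }.map(&:name)
p ranking # => ["bob", "alice"]
```

The cost moves from read time (counting posts per user) to write time (one small update per post created or destroyed), which suits a listing that is read far more often than posts are written.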

Question 3

Assume you have already built a blog with many posts. When a new post is published, all users subscribed to your site get an email notification about it. Now you want to add an unsubscribe link to the email so that a user can cancel the subscription by clicking it. The link expires after 3 days. Show the code to implement the unsubscribe link feature (only show code for models, services, migrations, and controllers).

Answer

# == Schema Information
#
#  id            :integer          not null, primary key
#  name          :string(255)
class User < ActiveRecord::Base
  has_one :user_notification_config
  has_many :user_email_tokens
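  after_create :create_default_notification_config


  private

  # Named differently from the create_user_notification_config method that
  # has_one generates, so the callback can call it instead of overriding it.
  def create_default_notification_config
    create_user_notification_config(email_on_new_posts: true)
  end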
end


# == Schema Information
#
#  id                  :integer          not null, primary key
#  email_on_new_posts  :boolean
#  user_id             :integer
class UserNotificationConfig < ActiveRecord::Base
  belongs_to :user
end


require 'securerandom'
# == Schema Information
#
#  id                  :integer          not null, primary key
#  token               :string(32)
#  expired_at          :datetime
#  user_id             :integer

class UserEmailToken < ActiveRecord::Base
  belongs_to :user
  before_create :set_token
  before_create :set_expired_at

  # True once the 3-day window has passed.
  def expired?
    expired_at < Time.current
  end

  private

  # SecureRandom.hex returns 32 hex characters by default.
  def set_token
    self.token = SecureRandom.hex
  end

  def set_expired_at
    self.expired_at = 3.days.from_now
  end
end


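class GenerateUnsubscribeUrlService
  # Route helpers are not available in plain classes unless included.
  include Rails.application.routes.url_helpers

  attr_reader :user

  def initialize(user)
    @user = user
  end

  def execute
    token = user.user_email_tokens.create!
    # Emails need an absolute URL; this assumes a host is configured
    # in the routes' default_url_options.
    user_notification_config_unsubscribe_posts_url(token: token.token)
  end
end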


class UserNotificationConfigController < ApplicationController
  def unsubscribe_posts
    # Look the token up by column; passing the raw param would be treated as SQL.
    user_email_token = UserEmailToken.find_by(token: params[:token])

    # Unknown and expired tokens get the same response.
    if user_email_token.nil? || user_email_token.expired?
      return head :not_found
    end

    user_email_token.user.user_notification_config.update(email_on_new_posts: false)

    head :ok
  end
end
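
The 3-day expiry rule can be checked without Rails. The following is a minimal plain-Ruby sketch of the same token lifecycle; UnsubscribeToken, issue, and expires_at are hypothetical names used only for illustration, and the clock is passed in so that the boundary is easy to exercise.

```ruby
require "securerandom"

THREE_DAYS = 3 * 24 * 60 * 60 # seconds

# Hypothetical stand-in for the UserEmailToken model.
UnsubscribeToken = Struct.new(:token, :expires_at) do
  # Issues a random 32-character hex token valid for three days from `now`.
  def self.issue(now: Time.now)
    new(SecureRandom.hex, now + THREE_DAYS)
  end

  # A token is expired once `now` is past its expiry time.
  def expired?(now: Time.now)
    expires_at < now
  end
end

issued_at = Time.utc(2021, 11, 3)
token = UnsubscribeToken.issue(now: issued_at)

p token.expired?(now: issued_at + 2 * 24 * 60 * 60) # => false
p token.expired?(now: issued_at + THREE_DAYS + 1)   # => true
```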

Question 4

Assume you have a blog site with more than 5000 posts. You need to track page views so that the post detail page can show how many times the post has been viewed. Every day there will be 1 million new page views across your posts. Use PostgreSQL to store the data. How would you design the system to maximize performance?

Answer

I haven't worked on this kind of problem before, but a couple of points come to mind:

  • Does the page view data have to be stored in PostgreSQL? Other stores (e.g. MongoDB, Kafka, or plain file storage) may be worth considering.

  • If it has to be stored in PostgreSQL, then the table will need partitioning: https://www.postgresql.org/docs/14/ddl-partitioning.html. I would make each day's views one partition, which can be thought of as a separate table.
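
As a rough sketch of the one-partition-per-day idea, the plain-Ruby helper below builds the PostgreSQL statements for declarative range partitioning. The table name page_views, the columns post_id and viewed_at, and the helper daily_partition_ddl are all assumptions made for illustration; the code only generates SQL text and does not connect to a database.

```ruby
require "date"

# Parent table, partitioned by the time of the view (hypothetical schema).
PARENT_DDL = <<~SQL
  CREATE TABLE page_views (
    post_id   bigint      NOT NULL,
    viewed_at timestamptz NOT NULL
  ) PARTITION BY RANGE (viewed_at);
SQL

# Builds the statement that creates the partition holding one day of views.
# The upper bound of a range partition is exclusive, so it is the next day.
def daily_partition_ddl(day)
  name = "page_views_#{day.strftime('%Y_%m_%d')}"
  <<~SQL
    CREATE TABLE #{name} PARTITION OF page_views
      FOR VALUES FROM ('#{day.iso8601}') TO ('#{(day + 1).iso8601}');
  SQL
end

puts PARENT_DDL
puts daily_partition_ddl(Date.new(2021, 11, 3))
```

Partitions for upcoming days would need to be created ahead of time, for example from a scheduled job, and old days can then be dropped as whole tables instead of being deleted row by row.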

Question 5

Assume you have an ecommerce site that sells machines. There are four kinds of machine: MA, MB, MC, MD. Each kind has its own unit price. Customers can buy machines in single-packs such as 2MA, 3MA, 5MB, 20MC, or in multiple-packs such as 2x2MA+3x5MB. Each pack has its own discount. Administrators of the site can change the discount of packs and the unit price of machines. Design the models.

Answer

# == Schema Information
#
#  id            :integer          not null, primary key
#  name          :string(255)
#  pack_size     :integer
class Machine < ActiveRecord::Base
  has_many :machine_packages
  has_many :packages, through: :machine_packages
end

# == Schema Information
#
#  id            :integer          not null, primary key
#  machine_id    :integer
#  package_id    :integer
class MachinePackage < ActiveRecord::Base
  belongs_to :package
  belongs_to :machine
end

# == Schema Information
#
#  id            :integer          not null, primary key
#  name          :string(255)
#  discount      :decimal
class Package < ActiveRecord::Base
  has_many :machine_packages
  has_many :machines, through: :machine_packages
end
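
To show how a pack price could follow from these models, here is a minimal plain-Ruby sketch. It assumes two attributes that do not appear in the schema above: a unit price on each machine (unit_price_cents) and a quantity on each machine/package row. The structs and method names are hypothetical stand-ins, not the ActiveRecord models themselves.

```ruby
# Hypothetical stand-ins for Machine, MachinePackage, and Package.
Machine = Struct.new(:name, :unit_price_cents)
PackItem = Struct.new(:machine, :quantity)

Package = Struct.new(:name, :discount, :items) do
  # Price before discount: unit price times quantity, summed over the pack.
  def list_price_cents
    items.sum { |item| item.machine.unit_price_cents * item.quantity }
  end

  # Price after applying the pack's discount (a Rational between 0 and 1).
  def price_cents
    (list_price_cents * (1 - discount)).round
  end
end

ma = Machine.new("MA", 10_000)
mb = Machine.new("MB", 25_000)
pack = Package.new("2MA+5MB", Rational(10, 100),
                   [PackItem.new(ma, 2), PackItem.new(mb, 5)])

p pack.price_cents # => 130500

# An administrator changes a unit price; the pack price follows.
ma.unit_price_cents = 12_000
p pack.price_cents # => 134100
```

Because the price is computed from the current unit prices and discount rather than stored on the pack, an administrator's change takes effect everywhere the pack is shown.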