Skip to content

Instantly share code, notes, and snippets.

Show Gist options
  • Save tegos/0f40da45b1cdc1b7bd310234b91fb6f3 to your computer and use it in GitHub Desktop.
Save tegos/0f40da45b1cdc1b7bd310234b91fb6f3 to your computer and use it in GitHub Desktop.
Comparing Two MySQL Queries for Top Solution Authors

Comparing Two MySQL Queries for Top Solution Authors

This summarizes two queries to get the top users by solutions count, their corrected forms, simulated execution plans, and recommendations. For this video: Laravel Performance: Careful with withCount() by Povilas Korop.

Query 1 – Subquery in SELECT

Corrected Query

SELECT
    users.*,
    (
        SELECT COUNT(*)
        FROM replies
        INNER JOIN threads ON threads.solution_reply_id = replies.id
            AND threads.author_id != replies.author_id
        WHERE
            users.id = replies.author_id
            AND replies.replyable_type = 'threads'
            AND replies.created_at > '2024-09-10 11:29:36'
            AND replies.deleted_at IS NULL
    ) AS solutions_count
FROM users
HAVING
    solutions_count > 0
ORDER BY
    solutions_count DESC
LIMIT 5;

Simulated Execution Plan

id select_type table type possible_keys key rows Extra
1 PRIMARY users ALL NULL NULL 1000000 Using temporary; Using filesort
2 DEPENDENT SUBQUERY replies ref idx_replies_filters,author_id author_id 10 Using where
2 DEPENDENT SUBQUERY threads ref idx_threads_solution,solution_reply_id idx_threads_solution 5 Using where

Execution Flow Diagram

        +------------+
        |   users    |
        +------------+
              |
              v
     For each user, run subquery
              |
        +------------+
        |  replies   |
        +------------+
              |
              v
        +------------+
        |  threads   |
        +------------+
              |
              v
       Count matching replies

Notes:

  • Performs a full scan of users.
  • Executes a dependent subquery per user → N+1 pattern.
  • Can be costly on large tables.

Query 2 – JOIN + GROUP BY

Corrected Query

SELECT
    users.*,
    COUNT(DISTINCT replies.id) AS solutions_count
FROM users
INNER JOIN replies ON replies.author_id = users.id
INNER JOIN threads ON threads.solution_reply_id = replies.id
    AND threads.author_id != replies.author_id
WHERE
    replies.replyable_type = 'threads'
    AND replies.deleted_at IS NULL
    AND replies.created_at > '2024-09-10 11:29:36'
GROUP BY
    users.id
HAVING
    solutions_count > 0
ORDER BY
    solutions_count DESC
LIMIT 5;

Simulated Execution Plan

id select_type table type possible_keys key rows Extra
1 SIMPLE replies range idx_replies_filters,author_id idx_replies_filters 500000 Using where; Using temporary; Using filesort
1 SIMPLE threads ref idx_threads_solution,solution_reply_id idx_threads_solution 5 Using where
1 SIMPLE users eq_ref PRIMARY PRIMARY 1

Execution Flow Diagram

        +------------+
        |  replies   | <- filtered by date, type, deleted_at
        +------------+
              |
              v
        +------------+
        |  threads   | <- join on solution_reply_id
        +------------+
              |
              v
        +------------+
        |   users    | <- join on author_id
        +------------+
              |
              v
   GROUP BY user, COUNT(distinct replies)

Notes:

  • Efficient for leaderboards or global top N queries.
  • Uses GROUP BY with temporary table and filesort.
  • Works well with proper indexes.

Recommended Indexes

-- For filtering replies efficiently
CREATE INDEX idx_replies_filters
    ON replies (replyable_type, deleted_at, created_at, author_id);

-- For solution lookups
CREATE INDEX idx_threads_solution
    ON threads (solution_reply_id, author_id);

Comparison & Recommendation

Query Pros Cons
Subquery in SELECT Simple to read, per-user count Executes subquery per user (N+1), expensive on large tables
JOIN + GROUP BY Fast for global top N, single pass, leverages indexes Temporary table + filesort may use memory on large datasets

Verdict:

  • For a leaderboard / top 5 authorsJOIN + GROUP BY (Query 2) is strongly recommended.
  • For solutions_count alongside a small filtered set of users → Subquery (Query 1) is readable but risky at scale.

Notes

  • Always test with EXPLAIN to see row estimates and index usage.
  • Proper indexing is critical for performance.
  • Query 2 scales better with millions of users and replies.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment