Skip to content

Instantly share code, notes, and snippets.

@kyledcline
Last active August 8, 2024 13:30
Show Gist options
  • Save kyledcline/9b7e864b89c269beb2c34e55fb0903b0 to your computer and use it in GitHub Desktop.
Save kyledcline/9b7e864b89c269beb2c34e55fb0903b0 to your computer and use it in GitHub Desktop.
Postgres Best Practices

PSQL CLI Client

Psql is a fully-fledged CLI client for Postgres, but most people are unaware of its many advanced features.

~/.psqlrc can be edited to persist any behavior or configuration settings you want between psql sessions. It behaves just like ~/.bashrc or ~/.vimrc, sourced at psql launch. See More out of psql for some interesting configurations.

If you have a long query to write and rewrite, you can use \e to edit your query in an editor.

Use \watch at the end of a query in order to automatically re-run the query every few seconds - great for monitoring while making changes elsewhere in your application architecture.

Display & Formatting

Use \x to toggle extended display, which presents query results in a cleaner tabulated format.

EXPLAIN (along with similar commands) accepts formatting arguments.

EXPLAIN (format json) SELECT * FROM requests;

For JSONB columns, use jsonb_pretty(column_name) to pretty print your data instead of a giant compressed text blob.

Indexing

If you are programmatically ordering, sorting, or grouping by a column, you should probably have an index on it. The caveat is that indexing slows down writes to the table.

It's almost always a good idea to create indices concurrently to prevent synchronous locking on a table during migrations.

CREATE INDEX CONCURRENTLY ON reports (report_id);

Composite indices are useful when you want to optimize querying on multiple non-conditional columns.

SELECT * FROM posts WHERE post_id = 232 AND comment_id = 65 AND owner = 'Bob';

could be optimized by indexing with

CREATE INDEX ON posts (post_id, comment_id, owner);

The above index is only used if your query includes all 3 indexed columns. If you supply multiple simple indices instead, Postgres will optimize during the planning phase which combination of indices to use (or none at all).

CREATE INDEX ON posts (post_id);
CREATE INDEX ON posts (comment_id);
CREATE INDEX ON posts (owner);

For conditional queries, a conditional index can be very valuable - the primary benefit is seen when you have a known initial condition you will want to consistently query against.

SELECT * FROM posts WHERE owner = 'Bob' AND featured = true LIMIT 100;

could benefit from a conditional index if we will frequently query only featured posts from a given owner, with

CREATE INDEX ON posts (owner) WHERE featured = true;

In this case, the owner can be specified at runtime but we still receive a major benefit by indexing the fixed, known value of featured (true).

You can also measure the disk size of an index using:

SELECT pg_size_pretty(pg_total_relation_size('idx_name'));

It's important to understand how Postgres actually uses an index, and why it might choose to not use an index. During the planning phase, Postgres may decide that asking the index for the location of the queried rows may not be as performant as directly requesting those rows from the table:

When an index is used in a SELECT query, first the position of the requested rows is fetched from the index (instead of from the table directly). Then, the requested rows (a subset of the total) are actually fetched from the table. It’s a two step process. Given a large enough number of requested rows (such as all the posts with an id greater than 10), it is actually more efficient to go fetch all those rows from the table directly rather than to go through the extra step of asking the index for the position of those rows and next fetch those specific rows from the table. If the percentage of the rows is smaller than 5-10% of all rows in the table the benefit of using the information stored in the index outweighs the additional intermediate step.

~ Why Postgres Won't Always Use an Index - Thoughtbot

Query Process

Postgres prepares and runs queries in two distinct phases:

  • Planning phase
    • During planning, Postgres parses the query, analyzes it for optimizations, and then rewrites it for efficiency.
    • Timing optimizations should aim for planning phase to take 1 to 5 ms.
    • EXPLAIN presents a planning map and estimated execution metrics.
  • Execution
    • Timing varies widely for execution phase.
    • ANALYZE presents actual execution metrics.

Thoughtbot has a thorough blog post on Reading a Postgres EXPLAIN ANALYZE Query Plan. A quick digest:

Postgres builds a tree structure of plan nodes representing the different actions taken, with the root and each -> pointing to one of them. In some cases EXPLAIN ANALYZE provides additional execution statistics beyond the execution times and row counts, such as Sort and Hash above [any line other than the first without an ->].

Each tree’s branches represent sub-actions, and you’d work inside-out to determine what’s happening “first” (though the order of nodes at the same level could be different).

Because we ran EXPLAIN with the ANALYZE option, the query was actually executed and timing information was captured.

Peformance & Optimization

Cache hit rate and indexing are the first lines of defense for improving database performance.

Toggling \timing can be used to present execution time on any subsequent queries.

Well-constructed materialized views are especially helpful for reads, particularly when you have good candidates for caching partial results and further querying on those results (Thoughtbot).

Making liberal use of views is a key aspect of good SQL database design. Views allow you to encapsulate the details of the structure of your tables, which might change as your application evolves, behind consistent interfaces.

A view can be materialized, which means the results are stored by Postgres at CREATE MATERIALIZED VIEW and REFRESH MATERIALIZED VIEW time. The cost of the partial query is paid at these times, so we can benefit from that over and over, especially in read-heavy situations (most situations are read-heavy in my experience).

You can even create indices on materialized view columns themselves, further speeding up reads with heavy joins.

Common table expressions or WITH clauses are useful for human readability and reusability, but they are an optimization fence, meaning they block the query optimizer within it.

Index scans are almost always preferred to sequential scans.

If your planning phase is taking up too much time, prepared statements work similarly to materialized views in that they cache the planning phase (parsing, analyzing, rewriting) of a query. However, they are not persisted between sessions.

Prepared statements have the largest performance advantage when a single session is being used to execute a large number of similar statements. The performance difference will be particularly significant if the statements are complex to plan or rewrite, for example, if the query involves a join of many tables or requires the application of several rules. If the statement is relatively simple to plan and rewrite but relatively expensive to execute, the performance advantage of prepared statements will be less noticeable.

If a given index covers an entire query, meaning all queried columns are contained within the index itself and therefore the query runner will not even need to access the table itself, this is called a covering index. Covering indices can be incredibly efficient. See Index Only Scan: Avoiding Table Access - UIL.

@praveen4463
Copy link

a multicolumn index (usually btree) is made of more than one column, if that is created using 3 columns, postgres may use it only when your query has condition on all 3 columns, or atleast include the leftmost column. read more here https://www.postgresql.org/docs/11/indexes-multicolumn.html

@elcontrastador
Copy link

Appreciate this

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment