@jacobthemyth
Last active December 1, 2022 23:00
Count how many times sets of columns are used to filter PostgreSQL queries via `pg_stat_statements`

These scripts can be used to figure out how often different combinations of columns are used to filter queries in PostgreSQL. They're most useful for getting a high-level view when designing or refactoring indexes (e.g. if a certain set of columns is rarely used together to filter queries, it's possible that combining single-column indexes, via PostgreSQL's bitmap index scans, would perform better overall than maintaining a composite index).
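
Under the hood, query texts are parsed with the pg_query gem, whose filter_columns helper reports the (table, column) pairs a statement filters on. A quick sketch of what it returns (the query text and table name here are made-up examples):

require "pg_query"

pairs = PgQuery.parse("SELECT * FROM octopuses WHERE octopuses.species = $1 AND octopuses.tank_id = $2").filter_columns
# expected to return pairs like [["octopuses", "species"], ["octopuses", "tank_id"]]
pairs.each { |relname, colname| puts "#{relname}.#{colname}" }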

Usage

  • For each server in your database cluster, using credentials that can read pg_stat_statements.queryid (usually a superuser or a member of pg_read_all_stats), download the contents of pg_stat_statements as CSV. Do this separately for the primary and each replica, because pg_stat_statements tracks queries locally on each server:
psql --csv -c 'SELECT * FROM pg_stat_statements' $DATABASE_URL > pg_stat_statements/$DATABASE_NAME.csv
  • Run process_pg_stat_statements to load the CSVs and extract each query's filter columns into a SQLite database (pg_stat_statements.sqlite):
./process_pg_stat_statements
  • Generate a CSV of column usage for the table you're interested in (example output below):
./column-usage-for-table octopuses
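
The output has one row per distinct set of filter columns, with the total call count for statements filtering on exactly that set. A hypothetical example for a table named octopuses:

filter_columns,call_count
octopuses.id,123456
"octopuses.species,octopuses.tank_id",4321
octopuses.species,87

column-usage-for-table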
#!/usr/bin/env ruby
# frozen_string_literal: true

table = ARGV.fetch(0, "").chomp
if table.empty?
  puts "Usage: column-usage-for-table <table>"
  exit 1
end

require "bundler/inline"
require "csv"

gemfile do
  source "https://rubygems.org"
  gem "sqlite3"
  gem "pry"
end

db = SQLite3::Database.new "pg_stat_statements.sqlite"

# The inner query collapses each (server, queryid) pair into one row listing
# the set of columns that statement filters on. The join produces one row per
# filter column, so MAX(s.calls) recovers the statement's call count without
# multiplying it by the number of columns (this assumes one pg_stat_statements
# row per server and queryid). The outer query then sums calls across all
# statements that filter on the same set of columns.
puts db.execute2(<<~SQL, table).map(&:to_csv).join
  SELECT filter_columns, SUM(call_count) AS call_count FROM (
    SELECT GROUP_CONCAT(c.relname || '.' || c.colname, ',') AS filter_columns,
           MAX(s.calls) AS call_count
    FROM statement_filter_columns c
    INNER JOIN pg_stat_statements s
      ON s.queryid = c.queryid
      AND s.server = c.server
    WHERE c.relname = ?
    GROUP BY c.server, c.queryid
  )
  GROUP BY 1
  ORDER BY 2 DESC
SQL
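
process_pg_stat_statements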
#!/usr/bin/env ruby
# frozen_string_literal: true

require "bundler/inline"
require "csv"

gemfile do
  source "https://rubygems.org"
  gem "pg_query"
  gem "pry"
  gem "sqlite3"
end

db = SQLite3::Database.new "pg_stat_statements.sqlite"

# The schema below matches pg_stat_statements in PostgreSQL <= 12; PostgreSQL
# 13 renamed total_time, min_time, etc. to total_exec_time, min_exec_time,
# etc., so adjust the column lists here and in the INSERT below if your CSVs
# come from a newer version.
db.execute_batch <<-SQL
  drop table if exists pg_stat_statements;
  create table pg_stat_statements (
    server text,
    queryid integer,
    query text,
    calls integer,
    total_time real,
    min_time real,
    max_time real,
    mean_time real,
    stddev_time real,
    rows integer,
    shared_blks_hit integer,
    shared_blks_read integer,
    shared_blks_dirtied integer,
    shared_blks_written integer,
    local_blks_hit integer,
    local_blks_read integer,
    local_blks_dirtied integer,
    local_blks_written integer,
    temp_blks_read integer,
    temp_blks_written integer,
    blk_read_time real,
    blk_write_time real
  );

  drop table if exists statement_filter_columns;
  create table statement_filter_columns (
    server text,
    relname text,
    colname text,
    queryid integer
  );
SQL

# Each CSV is named after the server it was downloaded from; the filename
# becomes the `server` column so that identical queryids from different
# servers remain distinct.
Dir.glob("pg_stat_statements/*.csv").each do |csv_path|
  server = File.basename(csv_path, ".csv")
  CSV.foreach(csv_path, headers: true) do |row|
    next if row["query"].nil?

    db.execute <<~SQL, [server] + row.to_h.fetch_values("queryid", "query", "calls", "total_time", "min_time", "max_time", "mean_time", "stddev_time", "rows", "shared_blks_hit", "shared_blks_read", "shared_blks_dirtied", "shared_blks_written", "local_blks_hit", "local_blks_read", "local_blks_dirtied", "local_blks_written", "temp_blks_read", "temp_blks_written", "blk_read_time", "blk_write_time")
      insert into pg_stat_statements (server, queryid, query, calls, total_time, min_time, max_time, mean_time, stddev_time, rows, shared_blks_hit, shared_blks_read, shared_blks_dirtied, shared_blks_written, local_blks_hit, local_blks_read, local_blks_dirtied, local_blks_written, temp_blks_read, temp_blks_written, blk_read_time, blk_write_time) values (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
    SQL

    # pg_query parses the normalized query text and reports the
    # (table, column) pairs used to filter it.
    parsed_query = PgQuery.parse(row["query"])
    parsed_query.filter_columns.each do |pair|
      relname, colname = pair
      db.execute "insert into statement_filter_columns (server, relname, colname, queryid) values (?, ?, ?, ?)", [server, relname, colname, row["queryid"]]
    end
  rescue => e
    puts "=> error processing row"
    p e
    p row
  end
end

db.execute "create index idx_pg_stat_statements on pg_stat_statements (queryid);"
db.execute "create index idx_statement_filter_columns on statement_filter_columns (queryid, relname, colname);"