Timeline:
- Oct 3: Learning about this issue
- Tweet: https://x.com/andatki/status/1841915460062769187
- Reviewing commit: https://github.com/postgres/postgres/commit/62ddf7ee9a399e0b9624412fc482ed7365e38958
- Summarizing thoughts below
Goal: Understand WHY to ANALYZE ONLY
on a root partition table.
The current version (17) documentation seems to be wrong. It says only the root table is analyzed, but in the commit message David Rowley describes the partitions of the root table are also analyzed when ANALYZE runs on the root.
If you are using manual VACUUM or ANALYZE commands, don't forget that you need to run them on each child table individually. A command like: ANALYZE measurement;
After the commit, we see the docs now describe using ONLY
to get the behavior that was (incorrectly) documented before.
Manual VACUUM and ANALYZE commands will automatically process all inheritance child tables. If this is undesirable, you can use the ONLY keyword. A command like: ANALYZE ONLY measurement;
My question is: why ANALYZE ONLY
on the root partition table at all? The interesting statistics like samples and counts etc. are on the partitions of the root.
For example, even doing a count of rows mixed with non-partitioned and partitioned tables, the partitioned tables require special counting logic.
Need to do some experiments to understand what stats we need on the root table, and why to ANALYZE it all.
With that said, since Autovacuum is handling running ANALYZE
for us, evaluating thresholds and triggering a VACUUM (ANALYZE) part_table
for each partition as needed, for whatever reason there is to ANALYZE only the root, being able to isolate to the root by adding ONLY
makes sense, especially since that's what seems to have been previously documented although inaccurately.
Todo:
- Do some experiments, when TBD :)
Used tables from pgsql-hackers list / David Rowley examples:
Even though no statistics are directly collected on the root table, the planner uses the partition metadata to decide which partitions are relevant for the query.
17 docs:
https://www.postgresql.org/docs/current/sql-analyze.html
For partitioned tables,...
Thoughts: I still don't fully understand why a manual
ANALYZE/ANALYZE ONLY
on a partition root is needed.It sounds like it's needed for partition pruning to work for query planning, and perhaps not done automatically by Autovacuum which is running on partitions of a root, but not rolling up to the root.
Whatever ANALYZE does for a partition root, I'm not sure if there's a system view that makes the work it performed visible in a similar way to row level stats we'd see for partitions from pg_stat, pg_stat_user_tables, pg_stat_all_tables.