Analyzing ANALYZE

Timeline:

Oct 3: Learning about this issue
Tweet: https://x.com/andatki/status/1841915460062769187
Reviewing commit: https://github.com/postgres/postgres/commit/62ddf7ee9a399e0b9624412fc482ed7365e38958
Summarizing thoughts below

Goal: Understand WHY to ANALYZE ONLY on a root partition table.

The documentation for the current version (17) seems to be wrong. It says only the root table is analyzed, but in the commit message David Rowley describes that the partitions of the root table are also analyzed when ANALYZE runs on the root.

Postgres 17 docs

If you are using manual VACUUM or ANALYZE commands, don't forget that you need to run them on each child table individually. A command like:

    ANALYZE measurement;

will only process the root table.

Postgres (devel) docs

After the commit, we see the docs now describe using ONLY to get the behavior that was (incorrectly) documented before.

Manual VACUUM and ANALYZE commands will automatically process all inheritance child tables. If this is undesirable, you can use the ONLY keyword. A command like:

    ANALYZE ONLY measurement;

will only process the root table.
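To make the two forms concrete, here is a minimal sketch using declarative partitioning. The table definition and the partition name `measurement_y2024m10` are assumptions for illustration, loosely modeled on the `measurement` example in the docs. The `ONLY` form assumes a server built from the development branch that includes the commit linked above.

```sql
-- Hypothetical partitioned table, modeled on the docs' "measurement" example.
CREATE TABLE measurement (
    city_id  int  NOT NULL,
    logdate  date NOT NULL,
    peaktemp int
) PARTITION BY RANGE (logdate);

-- Hypothetical partition name and bounds.
CREATE TABLE measurement_y2024m10 PARTITION OF measurement
    FOR VALUES FROM ('2024-10-01') TO ('2024-11-01');

-- On a partitioned table this recurses: the root and every partition are analyzed.
ANALYZE measurement;

-- Analyzes the root only, skipping the partitions.
-- Assumes a build that includes the ONLY commit; version 17 rejects this syntax.
ANALYZE ONLY measurement;
```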

Question

My question is: why run ANALYZE ONLY on the root partitioned table at all? The interesting statistics, such as samples and counts, live on the partitions of the root.

For example, even when counting rows across a mix of non-partitioned and partitioned tables, the partitioned tables require special counting logic.
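The notes do not say what that counting logic looks like, so the following is only one plausible form: summing the planner's row estimates over the leaf partitions instead of reading a single `pg_class` row. The table name `measurement` is a placeholder, and the numbers are estimates from the last VACUUM or ANALYZE, not exact counts.

```sql
-- Estimated row count for a partitioned table, summed over its leaf partitions.
-- 'measurement' is a placeholder name for any partitioned table.
SELECT sum(greatest(c.reltuples, 0))::bigint AS estimated_rows
FROM pg_partition_tree('measurement') AS pt
JOIN pg_class AS c ON c.oid = pt.relid
WHERE pt.isleaf;  -- leaf partitions hold the rows; the root stores none

-- For a non-partitioned table a single pg_class row is enough.
-- 'plain_table' is also a placeholder.
SELECT greatest(reltuples, 0)::bigint AS estimated_rows
FROM pg_class
WHERE oid = 'plain_table'::regclass;
```

The `greatest(..., 0)` guard is there because `reltuples` can be -1 for a table that has never been vacuumed or analyzed.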

Need to do some experiments to understand what stats we need on the root table, and why to ANALYZE it at all.

With that said, autovacuum already handles running ANALYZE for us on the partitions: it evaluates thresholds and triggers a VACUUM (ANALYZE) part_table for each partition as needed. So whatever the reason is to ANALYZE only the root, being able to isolate the operation to the root by adding ONLY makes sense, especially since that is what seems to have been documented previously, although inaccurately.
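As a rough way to see that threshold evaluation per partition, the sketch below compares modifications since the last analyze with the documented formula (base threshold plus scale factor times row count). It assumes the partitions share the placeholder prefix `measurement`, and it reads only the server-wide settings, so any per-table storage parameter overrides are ignored.

```sql
-- Approximate autoanalyze threshold per table, using global settings only.
-- The 'measurement%' pattern is a placeholder for the partition naming scheme.
SELECT s.relname,
       s.n_mod_since_analyze,
       current_setting('autovacuum_analyze_threshold')::int
         + current_setting('autovacuum_analyze_scale_factor')::float8
           * greatest(c.reltuples, 0) AS approx_analyze_threshold,
       s.last_autoanalyze
FROM pg_stat_user_tables AS s
JOIN pg_class AS c ON c.oid = s.relid
WHERE s.relname LIKE 'measurement%'
ORDER BY s.relname;
```

A partition becomes a candidate for autoanalyze once `n_mod_since_analyze` exceeds the threshold. Autovacuum does not process the partitioned root itself, since it stores no rows, which is one reason a manual ANALYZE on the root can still matter.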

Todo:

Do some experiments, when TBD :)

    -- attrs "a" and "b"
    INSERT INTO only_parted VALUES (1, 'b');
    INSERT INTO only_parted VALUES (1, 'c');
    INSERT INTO only_parted VALUES (1, null); -- 25% of rows null, null_frac

    ANALYZE only_parted; -- propagates to only_parted1 partition

    -- Stats for only attr "b", but both part root=only_parted, and partition=only_parted1
    SELECT * FROM pg_stats
    WHERE tablename IN ('only_parted', 'only_parted1')
    AND attname = 'b';

     schemaname |  tablename   | attname | inherited | null_frac | avg_width | n_distinct | most_common_vals | most_common_freqs | histogram_bounds | correlation | most_common_elems | most_common_elem_freqs | elem_count_histogram
    ------------+--------------+---------+-----------+-----------+-----------+------------+------------------+-------------------+------------------+-------------+-------------------+------------------------+----------------------
     public     | only_parted  | b       | t         |      0.25 |         2 |      -0.75 |                  |                   | {a,b,c}          |           1 |                   |                        |
     public     | only_parted1 | b       | f         |      0.25 |         2 |      -0.75 |                  |                   | {a,b,c}          |           1 |                   |                        |
    (2 rows)
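The snippet above omits the table setup, so here is a self-contained sketch of the same experiment, extended with an ANALYZE ONLY step. The `CREATE TABLE` statements and the first row `(1, 'a')` are assumptions, inferred from the `{a,b,c}` histogram and the 25% null fraction. The ONLY step assumes a development build that includes the commit.

```sql
-- Assumed setup: a list-partitioned root with a single partition.
CREATE TABLE only_parted (a int, b text) PARTITION BY LIST (a);
CREATE TABLE only_parted1 PARTITION OF only_parted FOR VALUES IN (1);

INSERT INTO only_parted VALUES (1, 'a');   -- assumed row, implied by the histogram
INSERT INTO only_parted VALUES (1, 'b');
INSERT INTO only_parted VALUES (1, 'c');
INSERT INTO only_parted VALUES (1, null);

ANALYZE only_parted;                        -- root and partition both get stats

-- Change the data, then refresh the root only.
INSERT INTO only_parted VALUES (1, null);
ANALYZE ONLY only_parted;                   -- needs a build with the ONLY commit

-- Compare the root row (inherited = t) with the partition row (inherited = f).
SELECT tablename, inherited, null_frac, n_distinct
FROM pg_stats
WHERE tablename IN ('only_parted', 'only_parted1')
  AND attname = 'b'
ORDER BY tablename;
```

If ONLY behaves as the commit describes, only the `only_parted` row should reflect the extra null, while the `only_parted1` row keeps the values from the first ANALYZE. I have not verified this output.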

andyatkinson/analyze_part_table.md

andyatkinson commented Oct 4, 2024