max_connections
- set to the expected peak number of connections
- above 200, look at a connection pooler
- consider superuser_reserved_connections
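A minimal sketch of these settings, assuming a server expected to peak around 200 client connections (values are illustrative only):

```sql
-- Illustrative values; tune to your own peak load.
ALTER SYSTEM SET max_connections = 200;               -- requires restart
-- Keep a few slots free so superusers can always connect.
ALTER SYSTEM SET superuser_reserved_connections = 3;
```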
shared_buffers
- shared RAM
- amount of memory pg takes for itself
- generally, available RAM ÷ 4
- lower for write-heavy, higher for read-heavy workloads
- larger than 8 GB often not useful; the pg_buffercache extension shows what happens in the buffer cache
- can go over 8 GB if lots of tables, lots of indexes; depends on workload
- over 8 GB is not harmful
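As a sketch of the ÷ 4 rule, on a dedicated 16 GB server:

```sql
-- 16 GB ÷ 4 = 4 GB; changing shared_buffers requires a restart.
ALTER SYSTEM SET shared_buffers = '4GB';
```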
work_mem
- non-shared RAM, used per query operation, including sorts and hashes, before spilling to disk
- if an operation goes over the limit, it just writes to disk
- complex queries can have multiple such operations
- set to somewhere between available RAM ÷ max_connections and available RAM ÷ (2 × max_connections)
- highly dependent on workload
- 1-8 MB for lots of connections with simple workloads
- 64-128 MB for fewer connections and/or complex workloads
- conventionally a power of 2
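A sketch of the arithmetic, assuming 16 GB of available RAM and 100 connections:

```sql
-- 16 GB ÷ (2 × 100) ≈ 80 MB, rounded down to a power of 2.
ALTER SYSTEM SET work_mem = '64MB';
-- Can also be raised per session for a single heavy query:
SET work_mem = '256MB';
```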
maintenance_work_mem
- similar to work_mem, but applies to index builds, VACUUM, and autovacuum
- set to available RAM ÷ 8
- larger than 2 GB not useful
- can temporarily be set higher to speed up index creation
- can be set per session
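For example, raising it for one session while building an index (table and index names hypothetical):

```sql
SET maintenance_work_mem = '2GB';
CREATE INDEX idx_orders_customer ON orders (customer_id);
RESET maintenance_work_mem;
```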
autovacuum_work_mem
- defaults to maintenance_work_mem
- set to available RAM ÷ 16
- larger than 1 GB not useful
- conventionally a power of 2
effective_cache_size
- estimate of how much RAM the database has access to, directly and indirectly, for potentially cached data
- doesn't allocate anything, just a planner hint
- if the database is the only thing on the server, set it to (most of) physical RAM
random_page_cost
- a hint for how fast data can be read from disk
- a value of 1 is equivalent to reading from RAM
- SSD = 1.2, SAN = 1.2, HDD = 4
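For example, on SSD-backed storage:

```sql
ALTER SYSTEM SET random_page_cost = 1.2;
SELECT pg_reload_conf();  -- picked up without a restart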
wal_level
- set to the lowest value your setup requires
- setting it higher means more data written to disk
wal_buffers
- amount of shared memory used for WAL data that has not yet been written to disk
- default of -1 bases the size on shared_buffers; the default is often fine
- set to 128 MB on servers with more than 8 cores and a high workload
commit_delay
- how long to wait, in microseconds, before a WAL flush is initiated
- on servers with a high write load, this can improve performance by grouping the writes of multiple transactions together
- if commit_delay reaches its limit it flushes; if wal_buffers hits its limit, it flushes
- on servers with a low write load, this can impede performance
- default is no delay
- leave at the default
min_wal_size
- how much disk to keep allocated for WAL files
- default is 80 MB, generally too small; set to between 1-8 GB
- otherwise pg needs to continually grow the WAL files
max_wal_size
- soft limit
- must be large enough to absorb a checkpoint's worth of WAL
- default is 1 GB
- setting it too high can increase recovery time after a crash
- can be set to 2 × min_wal_size
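An illustrative WAL sizing, following the 1-8 GB guidance and the 2× rule above:

```sql
ALTER SYSTEM SET min_wal_size = '2GB';
ALTER SYSTEM SET max_wal_size = '4GB';
SELECT pg_reload_conf();
```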
checkpoint_timeout
- max time between automated checkpoints
- default is 5 min
- useful range is 30 min - 1 hr
- replay after a crash takes roughly 20% of this time
- setting it too low forces frequent checkpoints and produces warnings
- setting it too high increases crash recovery time
- for long ETL batches, increase it to be longer than the batch run time
checkpoint_completion_target
- fraction of the time before the next checkpoint in which to try to finish writing data out to disk
- default is 0.5, i.e. 50% of checkpoint_timeout
- useful values are 0.8-0.9
- high values can cause locks if there is a lot of data to be written near the end of the checkpoint
- low values can degrade performance
checkpoint_warning
- warns in the log if WAL-triggered checkpoints happen closer together than this
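An illustrative checkpoint configuration using the ranges above:

```sql
ALTER SYSTEM SET checkpoint_timeout = '30min';
ALTER SYSTEM SET checkpoint_completion_target = 0.9;
SELECT pg_reload_conf();
```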
max_worker_processes
- max concurrent worker processes
- set to no more than the number of cores
max_parallel_workers_per_gather
- set to 4
- max number of parallel processes per executor (Gather) node
max_parallel_workers
- max number of parallel workers running at any point
- set to the number of cores
- only from pg 10
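Putting the three together, assuming an 8-core server:

```sql
ALTER SYSTEM SET max_worker_processes = 8;             -- requires restart
ALTER SYSTEM SET max_parallel_workers = 8;             -- pg 10+
ALTER SYSTEM SET max_parallel_workers_per_gather = 4;
```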
effective_io_concurrency
- gives the query planner a hint about the number of I/O requests the storage system can effectively handle at the same time
- HDD should be low
- SAN should be higher
- SSD should be in the hundreds
- similar in spirit to random_page_cost
default_statistics_target
- how large a sample is taken during an ANALYZE operation
- default is 100; can be set up to 1000, which makes ANALYZE take longer to run
- the default can be too low for larger data sets
- setting it too high slows ANALYZE down
- if the planner is misjudging a table, raising this can help
- temp tables can also be analyzed, which may be helpful depending on their size
- different from EXPLAIN ANALYZE
- ANALYZE is run periodically by a background worker; its statistics are used by the query planner
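The target can also be raised for a single skewed column instead of cluster-wide (table and column names hypothetical):

```sql
ALTER TABLE orders ALTER COLUMN customer_id SET STATISTICS 1000;
ANALYZE orders;
```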
fsync
- always leave on, never turn off
full_page_writes
- leave on
synchronous_commit
- generally leave on
- can be set to off if you can tolerate losing the most recent commits from the WAL after a crash
query planner config
- just leave them all on, e.g. enable_bitmapscan
- can be toggled per session
idle_in_transaction_session_timeout
- kills sessions left open without calling COMMIT or ROLLBACK
- such sessions prevent vacuum, maintenance, and index creation
- disabled by default
- available from pg 9.6
- if a session gets into this state, you may need to abandon the connection or issue a manual ROLLBACK
statement_timeout
- default disabled
- only set it in the current session, not cluster-wide
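A sketch of both timeouts; the values are assumptions, not recommendations:

```sql
-- Cluster-wide guard against forgotten transactions.
ALTER SYSTEM SET idle_in_transaction_session_timeout = '10min';
SELECT pg_reload_conf();
-- Per-session statement timeout for an interactive connection.
SET statement_timeout = '30s';
```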
temp_file_limit
- stops a bad query that exceeds work_mem and starts writing temp files to disk (e.g. for a big JOIN) from using too much disk space
- off by default
VACUUM
- reclaims space taken by dead tuples
- as rows are updated/deleted, they are marked dead
- MVCC means transactions might still have a view of them
- VACUUM's job is to flag those blocks for reuse and reclaim them
- VACUUM only releases disk space if an entire 1 GB file is empty; otherwise it just marks the space reusable
- causes lots of I/O requests
- VACUUM FULL copies all data into new files, reclaiming disk from dead tuples, at the expense of locking the table while doing so
- VACUUM FULL requires as much disk as the current files, plus the actual data size for the new files
- pg_repack does something similar but without the table lock; it takes only a smaller lock right at the end of the operation, and takes a bit longer
- only really useful if you need to reclaim space; if a table grows and shrinks, maybe just tweak the analyze targets on it
- tables may grow quickly at the beginning, then taper off as inserts start reusing unused space
- after VACUUM FULL a table will often appear to be growing more quickly
- VACUUM ANALYZE runs a normal vacuum followed by an ANALYZE; ANALYZE can be run on its own; if vacuuming manually, best to just do VACUUM ANALYZE
- VACUUM ANALYZE VERBOSE shows where it gets stuck
- VACUUM updates the free space map (FSM) per table; the FSM is used when writing data
- autovacuum runs in the background, automatically, on tables that need it
- by default overly conservative with the locks it holds; tries to avoid locks that would impact performance
- can be configured cluster-wide, or per table
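Typical manual usage on a hypothetical table:

```sql
VACUUM (VERBOSE, ANALYZE) orders;
```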
autovacuum_vacuum_threshold and autovacuum_vacuum_scale_factor
- minimum number of rows that need to have changed before autovacuum runs
- don't set too high, or tables get excluded
- don't set too low, or tables get vacuumed too often
- can be set on a per-table basis; suggested scale factors by table size:
- large: 0.001, medium: 0.1, small: 0.2
- if the majority of tables are small, use the cluster-wide settings for the small tables and per-table settings for the bigger ones (and vice versa)
- always remember to check the values for new tables, and as tables grow
- keep checking the formula to see how often autovacuum will run; see the sketch below
- formula: vacuum when dead tuples > threshold + (scale_factor × number of tuples)
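A sketch of per-table tuning for a large table (names and numbers hypothetical):

```sql
-- With these settings, a 100M-row table is vacuumed once dead tuples
-- exceed 1000 + 0.001 × 100,000,000 = 101,000.
ALTER TABLE big_events SET (
    autovacuum_vacuum_threshold = 1000,
    autovacuum_vacuum_scale_factor = 0.001
);
```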
- tuning fillfactor can help for bulk imports
autovacuum_max_workers
- default is 3, way too low
- consider the number of tables actively updated and deleted from
- takes from superuser_reserved_connections
- too low means autovacuum never keeps up
- too high means additional I/O overhead
autovacuum_naptime
- time the autovacuum launcher waits between checks for tables needing a worker
- default is okay
- rather tweak the threshold and scale factor
autovacuum_cost_limit
- vacuum keeps track of its accumulated cost as it runs
- if it exceeds this limit, it waits
- default is fine
- hard to pick a good value
autovacuum_analyze_threshold
- you may want ANALYZE to run more often than vacuum; make sure this is less than the vacuum threshold
autovacuum_cost_delay
- if the cost limit is exceeded, how long to sleep
- default is okay
- can be set per table
pg_stat_all_tables / pg_stat_user_tables
- show when the last manual or automatic vacuum ran on each table
- as well as when the last manual or automatic analyze ran
pg_stat_progress_vacuum
- shows stats on each vacuum process currently running; pg 9.6+
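For example, checking vacuum and analyze recency across user tables:

```sql
SELECT relname, last_vacuum, last_autovacuum, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
ORDER BY last_autovacuum NULLS FIRST;
```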
two query planner modes
- regular query planner (12 or fewer joins), does an exhaustive search
- genetic query optimizer (GEQO, more than 12 joins)
- uses heuristics
- tuned via the geqo_* parameters
- COPY instead of INSERT
- TRUNCATE instead of DELETE
- CURSOR instead of LIMIT/OFFSET
- INTERSECT instead of AND
- UNION instead of OR
- pg_trgm instead of LIKE
TOAST
- vacuum slow? could be TOAST
- max block (and therefore row) size is 8 KB
- how is larger data stored? compressed and/or broken into 2 KB chunks
- the original table has an OID entry pointing to the TOAST table
- slow to access; SELECT * can be slower if there are TOAST entries
- PLAIN: no TOAST or compression
- MAIN: uses compression, discourages use of TOAST
- EXTERNAL: uses TOAST but no compression; can be preferable if doing substring ops/queries
- EXTENDED: compresses first, then uses TOAST
- default is EXTENDED for most TOAST-able columns
- to change, use ALTER TABLE ... SET STORAGE
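For example, keeping a large text column uncompressed in TOAST to favor substring operations (table and column names hypothetical):

```sql
ALTER TABLE documents ALTER COLUMN body SET STORAGE EXTERNAL;
```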
- lots of updates or deletes with no vacuum, or vacuum has run but no new rows have arrived yet
- can be inefficient to scan over lots of empty blocks
partitioning
- try not to exceed 400 partitions
- only actively written partitions need to be vacuumed; old ones don't
- every query needs to reference the key that determines the partition, otherwise it just scans everything
- pg 11 starts to do auto partitioning; pg 12 fully automatic partitioning
indexes
- unique indexes can have a WHERE clause, unique constraints cannot
- don't create both; they're functionally identical
- unique constraints have better error messages than unique indexes
- with overlapping single-column and multi-column indexes, pg will use the single-column one just because it's faster to scan, even if performance is effectively the same
- in pg_stat_user_indexes, look at idx_scan, idx_tup_read and idx_tup_fetch (see query below)
- try to combine indexes where possible
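A query to spot rarely used indexes via those columns:

```sql
SELECT relname, indexrelname, idx_scan, idx_tup_read, idx_tup_fetch
FROM pg_stat_user_indexes
ORDER BY idx_scan ASC;
```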
- frequently deleting most, but not all, index entries causes bloat
- may need REINDEX if you delete a large part of your table
- REINDEX cannot be done concurrently; instead use CREATE INDEX CONCURRENTLY, then DROP INDEX and ALTER INDEX ... RENAME
kernel and OS tuning
- before 9.3 pg used Sys V shared memory; from 9.3 it uses POSIX
kernel.shmmax
- max size of a Sys V shared memory segment; less important to set after 9.3
- sysctl kernel.shmmax should be about the amount of RAM you have
- if too low, set it and persist it in /etc/sysctl.d/postgresql.conf
kernel.shmall
- total number of 4 KB shared memory pages; again less critical since 9.3
- derive it by dividing shmmax by the 4 KB page size
- if using systemd with RemoveIPC, shared memory and semaphores are removed when the user fully logs out; probably only a problem if you installed from source
- resource limits: number of processes and open files per user, and amount of RAM per process
- check with: cat /proc/$(pgrep -o postgres)/limits
- set in /etc/security/limits.conf
- one process per connection, plus processes for the WAL writer, autovacuum, etc.
max_files_per_process
- limits how many files each server process may open
pooling
- can't do LISTEN/NOTIFY through a pooler
memory overcommit
- default virtual memory settings are not optimal for pg
- the kernel's default memory overcommit setting can cause the system to run out of virtual RAM and terminate the postmaster process via the OOM killer
- if pg is the only thing on the server, it's not such a problem; if total RAM is low, increase swap
- the OOM killer is only invoked when physical memory and swap are exhausted
- lower config params like shared_buffers and work_mem
- lower the number of concurrent connections (max_connections) and use a connection pool
- on Linux 2.6+ you can prevent overcommit, which lowers the risk of the OOM killer: set vm.overcommit_memory=2
- on systems without swap, setting vm.overcommit_memory to 2 will probably cause issues
- can also tune vm.overcommit_ratio, the percentage of RAM available for overcommit; the default value here is fine
- if wrangling the OOM killer, you only want to prevent it running on the postmaster process, not its children
huge pages
- huge pages can improve performance if supported by your kernel: huge_pages=try
- to estimate how many are needed, look at the postmaster's VmPeak value and the system huge page size:
pid=$(head -1 $PGDATA/postmaster.pid)
grep ^VmPeak /proc/$pid/status
grep ^Hugepagesize /proc/meminfo
sysctl -w vm.nr_hugepages=XXX   # VmPeak ÷ Hugepagesize
- don't use swappiness 0; set vm.swappiness to between 5 and 20 (higher with lots of RAM; 10 is fine); the same applies in virtualized environments
vm.dirty_background_ratio
- percentage of memory filled with dirty pages at which background flushing starts
- range is 0-100
- values lower than 5 may not work
- default is 10, i.e. 1.6 GB on a server with 16 GB RAM
- write-intensive workloads may see improvement with a lower ratio, so that Linux flushes in the background
- no big difference on read-heavy workloads
- there are also foreground versions of these configs, but their flushing is done in the foreground and therefore blocks the application
- in general the foreground ratio should be higher than the background one; you want to encourage background over foreground flushing
vm.dirty_expire_centisecs
- setting this too high can cause longer I/O pauses later
- setting vm.dirty_background_ratio and vm.dirty_ratio to around 10 via sysctl -w is a good place to start
- Linux has different I/O schedulers; the default CFQ is generally fine; can also try Deadline, which may be worse for interactive workloads
- EXT filesystem mount options:
- relatime vs noatime: the latter means access times are not tracked at all
- data=ordered vs data=writeback: the latter offers better performance at the risk of data corruption; writeback writes the journal entry first, then the data, which can cause corruption in case of a crash. DON'T USE IT
- barrier=0: only if using RAID with a battery backup; otherwise it risks data corruption
- put WAL on a separate drive; it can be a spindle disk
- use a journaled filesystem for WAL
- use tablespaces to tier storage
- smaller tables will be cached
- larger tables accessed randomly benefit from faster drives
- CPU: cores matter, clock speed slightly less; high I/O will cause CPU wait time
- parallel query processing from pg 9.6 onward
stats_temp_directory
- does not need to be persisted; can be a RAM disk
- mount a new partition and point stats_temp_directory at it in the conf
stats collector
- counts accesses to tables and indexes, and the total (estimated) number of rows
- tracks vacuum and analyze actions on tables
- counts calls to user-defined functions, and their timing
- accessed via the pg_stat_* views
track_activities
- should be on; it is by default
track_activity_query_size
- max length for tracking a command; default 1024, increase if you have long queries
track_counts
- leave on
track_io_timing
- monitors block read/write times; affects performance
track_functions
- may affect performance
pg_stat_activity
- dynamic view of commands currently running
pg_stat_database
- commit and rollback counts, number of deadlocks; can show the hit ratio for the buffer cache
pg_stat_user_tables
- seq scans, rows returned, index scans, rows inserted
pg_stat_user_indexes
- are indexes being used?
pg_statio_user_tables
- raw details on I/O, if you turned on I/O tracking
pg_statio_user_indexes
- ditto, disk blocks read from indexes
pg_statio_user_sequences
- ditto
pg_stat_user_functions
- stats on user-defined function calls, if track_functions is on
pg_stat_reset()
- get a clean view after you make changes
pg_stat_bgwriter
- shows the number of checkpoints run, including forced checkpoints (you want these as low as possible relative to normal checkpoints)
pg_stat_statements
- execution stats: number of calls; total, min, max time; total rows retrieved; blocks hit, read, dirtied, written; merges similar queries
- some overhead, but probably worth it
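For example, finding the most expensive queries (column names as of pg ≤ 12; the extension must be in shared_preload_libraries):

```sql
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
SELECT query, calls, total_time, rows
FROM pg_stat_statements
ORDER BY total_time DESC
LIMIT 10;
```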
pgstattuple
- size of table; number of live tuples; total size of live tuples; number of dead tuples; total free space
- very little performance impact
pg_buffercache
- shows what is happening in the shared buffer cache
pg_prewarm
- loads data into either the OS or the pg buffer cache at startup
pg_view
- basically queries pg_stat_activity
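A sketch of warming the cache for one table after a restart (table name hypothetical):

```sql
CREATE EXTENSION IF NOT EXISTS pg_prewarm;
SELECT pg_prewarm('orders');
```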
check_postgres
- Nagios plugin; can be used without Nagios too
pgbadger
- analyzes log files
- similar to pg_stat_statements; helps find queries that should have an index
logging_collector
- should always be on
log_rotation_age / log_rotation_size
log_statement
- off, ddl, mod, or all
log_lock_waits
- logs when a lock wait is longer than deadlock_timeout
- pgconfigurator.cybertec.at
- pgtune.leopard.in.ua
- github.com/jberkus/annotated.conf