hgmnz · August 5, 2009 12:32
diff --git a/gistfile1.txt b/gistfile1.txt
 # PostgreSQL Notes

 For first time Postgres users, the tutorial section on the documentation is a must read: http://www.postgresql.org/docs/8.4/interactive/tutorial.html

 ### Installation

 Easiest way is to use your package manager (yum, apt-get, etc). On MacOS, macports made it easy. Building the source is straightforward as well. There are plenty of resources online...

 ### Get acquainted with psql

 To enter the Postgres command prompt
    psql -U your_username -d the_database

 If there is a user in your database that matches your unix username, the -U parameter is optional (it will use your username). On development machines it is a good idea to create the matching postgres user. In that case, it is not needed to specify a username and password on database.yml.

 From within a rails app with a database.yml configured, script/dbconsole works as expected (drops you into psql).

 Within the psql command prompt you can list databases with \l, describe database objects using \d, connect to remote databases using \c, and many other useful tasks. Don't be shy about using \? for help, and \h <command> for help on a specific command. 

 A useful trick is to use \x before running a query that will return many columns. \x activates expanded mode. Try it.

 ### Rubygem/postgres adapter.

 There are many adapters in the wild, some native and some pure ruby. The one that is known to work for me both on Mac OS and on Linux is the following. In the dark windows days, easiest way is to use the pure ruby driver, although with some work the native driver can also be installed (involved moving DLLs from the postgres installation directory to the ruby bin directory, if I remember correctly).

    gem list --local |grep postgres
    postgres (0.7.9.2008.01.28)

 ### Create db_user role (user)

    sudo -u postgres createuser --no-superuser --createdb --no-createrole --login --pwprompt --encrypted -h 127.0.0.1 -p 5432 db_user

 ## Restore Database

 ### Create an empty database from template0.

 Via SQL (ie: in a psql prompt)

    create database db_name owner=db_user template=template0 encoding='utf8';

 Via createdb

    createdb -U postgres -O db_user -T template0 -E 'utf8' db_name

 ### Drop a DB

    dropdb -U postgres db_name

 ## Backup Database

 ### Dump Entire Database to Custom Format

    pg_dump -U postgres -Fc -v -f db_name-20081107.bak db_name

 Optional: -Z9 (maximum compression ratio)

 ### Dump users and people to Custom Format

    pg_dump -U postgres -t users -t people -Fc -v -f db_name-user_and_people-20081107.bak db_name

 ### Restore from custom format.

    pg_restore -U postgres -d db_name -v db_name.bak

 pg_restore also takes options for restoring specific objects, if the database dump uses the custom format (-Fc)

 ### Size of Database

    select pg_size_pretty( pg_database_size('db_name') );

 ### Size of a table and its indexes

    select pg_size_pretty( pg_total_relation_size('some_table') );

 ### Troubleshooting

 On Linux, if the server will not start, we can force the standard error to be redirected to standard output in order to troubleshoot properly:

    sudo -u postgres pg_ctl -D /var/lib/pgsql/data start

 This is particularly useful when tweaking the shared_buffers parameter. If postgresql doesn't start, usually shmmaxm must be increased, and the "request size" on the error log will tell you exactly what this kernel param must be set to.

 ### DBLink

 DBLink allows you to execute queries on a remote database. It is part of the contrib modules.

 #### DBLink Installation
 First, make sure you've installed the postgresql-contrib package. This includes dblink.sql which is installed by doing:

    psql db_name -U postgres < /usr/share/pgsql/contrib/dblink.sql

 #### Querying via DBLink

 Unfortunately, all column types of a query must be specified. For example:

    select users.*
    from dblink('dbname=db_name host=<remote_ip_addr> password=<remote_password> user=<remote_user>', 'select * from users') 
    as users(id integer, login varchar(255), email varchar(255), 
      crypted_password varchar(40), salt varchar(40), created_at timestamp, 
      updated_at timestamp, remember_token varchar(255), 
      remember_token_expires_at timestamp, person_id integer, 
      user_status_id integer);

 #### Moving data from a remote server to the local DB
 This is as simple as an insert ... from ..., only the from portion comes from a remote server using dblink:

    insert into users select remote_users.* 
    from dblink('dbname=db_name host=<remote_ip_address> password=<remote_password> user=<remote_user>', 'select * from users') 
      as remote_users(id integer, login varchar(255), email varchar(255), 
      crypted_password varchar(40), salt varchar(40), created_at timestamp, 
      updated_at timestamp, remember_token varchar(255), remember_token_expires_at timestamp, 
      person_id integer, user_status_id integer);

 ### Basic Server Setup on Linux

 This section is mostly useful for setting up a server for production use:

 Set password

 Change the PostgreSQL postgres user password, we will not be able to access the server otherwise. As the "postgres" Linux user, execute the psql command:

    sudo -u postgres psql template1

 At the new prompt, change the password

    ALTER USER postgres WITH PASSWORD 'new_password';
    \q

 ### Update postgresql.conf File

 Set the listen_address option to '*' if this server will be accessed from other hosts.

 ### Performance tuning

 Increase "System V IPC" shared memory min and max on linux kernel:
 See the following:
 http://www.postgresql.org/docs/current/static/kernel-resources.html

 As a rule of thumb, set the shmmax value to 1/4 of the available memory on the system, in bytes. If this is a dedicated postgres server, set to half the RAM.

 For example, server has 4GB of RAM:

 4GB/4 = 1GB = 1024MB = 1048576 KB = 1073741824 B
    sudo /sbin/sysctl -w kernel.shmmax=1073741824
    sudo /sbin/sysctl -w kernel.shmall=2097152

 edit the following file, and add the two parameters above (kernel.shm*=*)
 /etc/sysctl.conf

 To check what these values are set to:

    cat /proc/sys/kernel/shmmax
    cat /proc/sys/kernel/shmall

 For shared_buffers, start with 1/4 of available RAM. cache_miss statistics can tell if you need more
 more buffers needed for many concurrent queries and many CPUs.
    shared_buffers: 1/4 RAM.

 RAM available for queries:
    effective_cache_size: set it to 2/3 the RAM on the server (a normal conservative setting).

 The less max_connections, the better (each connection allocates a chunk of RAM). Think of the need for this server:
    max_connections = 32 

 Turns of WAL logging. Risky, but improves performance (don't do this until we've thoroughly tested) - if server crashes (unlikely), there will be no log available to restore latest transactions.
    fsync=off #trying this on a staging server for now.

 Write less to the WAL
    checkpoint_segments = 64 (16 = checkpoint every 512MB. For bulk loading or DW, settings of 64/1GB or over are common)
    checkpoint_completion_target = 0.90 (aim to finish writing WAL by the time 90% of the next checkpoint is ready. Note that this only works for postgresql version >= 8.3) 
    checkpoint_timeout = defaults to 5min, can be increased too (I've used 25min successfuly)

 How much statistics to collect, helps query planner to use indexes.
    default_statistics_target = from 200 - 1000 for large databases

 This setting is used almost exclusively by the vacuum process, but also for create index and adding foreign keys.
 you can additionally use log_temp_files to figure out if sorts are using disk instead of fitting in memory.
 This setting can be set at load time, like work_mem
 Larger values are common while performing bulk updates or DB restores.
    maintenance_work_mem = 1/4 of your RAM # for version >= 8.3
    maintenance_work_mem = 1/8 of your RAM # for version < 8.3


 Default work_mem value is a whimpy 1MB. Tells postgres how much memory is available before creating temp files during sorts
 for queries that require significant sorting and hash operations. 
 You may set this value on the fly, for example:
 <code>
  CONNECT
  set work_mem = 1024MB
  select distinct * from ... order by ...
  DISCONNECT
 </code>

 In general, OS swapping == too much work_mem, while caching sorts in pg_temp == not enough work_mem.
    work_mem = 32MB 

 Which method to use for commits to disk of WAL. This is not conservative, but popular. 
    wal_sync = open_sync

 How much memory used in shared memory for WAL data:
    wal_buffers = 32MB # do this for SMP systems (and otherwise) and DW with 4GB on server. OLTP => 16MB

 this setting is all about how expensive it is to for your disks to seek a random disk page. Lower it to 2.5 to encourage the query optimizer to use index scans. The default of 4.0 is too conservative for modern hardware.
    random_page_cost = 2.5

 This effectively double checks data log writes. Turn off if you have reliable hardware and filesystem
    full_page_writes = false/off

 Support "batch commits", for applications with many concurrent writers

    commit_delay = 250 # I haven't tried this. 
    deadlock_timeout = 2000

 Minimize logging on production servers

    log_error_verbosity = terse #writes lss than default
    log_min_messages = warning for dev servers, error for prod servers.
    log_statement = 'none' #don't log queries on prod servers
    syslog_facility = 'LOCAL2'
    syslog_ident = 'postgres'
    log_connections = off
    log_disconnections = off
    log_duration = off
	# PostgreSQL Notes

	For first time Postgres users, the tutorial section on the documentation is a must read: http://www.postgresql.org/docs/8.4/interactive/tutorial.html

	### Installation

	Easiest way is to use your package manager (yum, apt-get, etc). On MacOS, macports made it easy. Building the source is straightforward as well. There are plenty of resources online...

	### Get acquainted with psql

	To enter the Postgres command prompt
	psql -U your_username -d the_database

	If there is a user in your database that matches your unix username, the -U parameter is optional (it will use your username). On development machines it is a good idea to create the matching postgres user. In that case, it is not needed to specify a username and password on database.yml.

	From within a rails app with a database.yml configured, script/dbconsole works as expected (drops you into psql).

	Within the psql command prompt you can list databases with \l, describe database objects using \d, connect to remote databases using \c, and many other useful tasks. Don't be shy about using \? for help, and \h <command> for help on a specific command.

	A useful trick is to use \x before running a query that will return many columns. \x activates expanded mode. Try it.

	### Rubygem/postgres adapter.

	There are many adapters in the wild, some native and some pure ruby. The one that is known to work for me both on Mac OS and on Linux is the following. In the dark windows days, easiest way is to use the pure ruby driver, although with some work the native driver can also be installed (involved moving DLLs from the postgres installation directory to the ruby bin directory, if I remember correctly).

	gem list --local \|grep postgres
	postgres (0.7.9.2008.01.28)

	### Create db_user role (user)

	sudo -u postgres createuser --no-superuser --createdb --no-createrole --login --pwprompt --encrypted -h 127.0.0.1 -p 5432 db_user

	## Restore Database

	### Create an empty database from template0.

	Via SQL (ie: in a psql prompt)

	create database db_name owner=db_user template=template0 encoding='utf8';

	Via createdb

	createdb -U postgres -O db_user -T template0 -E 'utf8' db_name

	### Drop a DB

	dropdb -U postgres db_name

	## Backup Database

	### Dump Entire Database to Custom Format

	pg_dump -U postgres -Fc -v -f db_name-20081107.bak db_name

	Optional: -Z9 (maximum compression ratio)

	### Dump users and people to Custom Format

	pg_dump -U postgres -t users -t people -Fc -v -f db_name-user_and_people-20081107.bak db_name

	### Restore from custom format.

	pg_restore -U postgres -d db_name -v db_name.bak

	pg_restore also takes options for restoring specific objects, if the database dump uses the custom format (-Fc)

	### Size of Database

	select pg_size_pretty( pg_database_size('db_name') );

	### Size of a table and its indexes

	select pg_size_pretty( pg_total_relation_size('some_table') );

	### Troubleshooting

	On Linux, if the server will not start, we can force the standard error to be redirected to standard output in order to troubleshoot properly:

	sudo -u postgres pg_ctl -D /var/lib/pgsql/data start

	This is particularly useful when tweaking the shared_buffers parameter. If postgresql doesn't start, usually shmmaxm must be increased, and the "request size" on the error log will tell you exactly what this kernel param must be set to.

	### DBLink

	DBLink allows you to execute queries on a remote database. It is part of the contrib modules.

	#### DBLink Installation
	First, make sure you've installed the postgresql-contrib package. This includes dblink.sql which is installed by doing:

	psql db_name -U postgres < /usr/share/pgsql/contrib/dblink.sql

	#### Querying via DBLink

	Unfortunately, all column types of a query must be specified. For example:

	select users.*
	from dblink('dbname=db_name host=<remote_ip_addr> password=<remote_password> user=<remote_user>', 'select * from users')
	as users(id integer, login varchar(255), email varchar(255),
	crypted_password varchar(40), salt varchar(40), created_at timestamp,
	updated_at timestamp, remember_token varchar(255),
	remember_token_expires_at timestamp, person_id integer,
	user_status_id integer);

	#### Moving data from a remote server to the local DB
	This is as simple as an insert ... from ..., only the from portion comes from a remote server using dblink:

	insert into users select remote_users.*
	from dblink('dbname=db_name host=<remote_ip_address> password=<remote_password> user=<remote_user>', 'select * from users')
	as remote_users(id integer, login varchar(255), email varchar(255),
	crypted_password varchar(40), salt varchar(40), created_at timestamp,
	updated_at timestamp, remember_token varchar(255), remember_token_expires_at timestamp,
	person_id integer, user_status_id integer);

	### Basic Server Setup on Linux

	This section is mostly useful for setting up a server for production use:

	Set password

	Change the PostgreSQL postgres user password, we will not be able to access the server otherwise. As the "postgres" Linux user, execute the psql command:

	sudo -u postgres psql template1

	At the new prompt, change the password

	ALTER USER postgres WITH PASSWORD 'new_password';
	\q

	### Update postgresql.conf File

	Set the listen_address option to '*' if this server will be accessed from other hosts.

	### Performance tuning

	Increase "System V IPC" shared memory min and max on linux kernel:
	See the following:
	http://www.postgresql.org/docs/current/static/kernel-resources.html

	As a rule of thumb, set the shmmax value to 1/4 of the available memory on the system, in bytes. If this is a dedicated postgres server, set to half the RAM.

	For example, server has 4GB of RAM:

	4GB/4 = 1GB = 1024MB = 1048576 KB = 1073741824 B
	sudo /sbin/sysctl -w kernel.shmmax=1073741824
	sudo /sbin/sysctl -w kernel.shmall=2097152

	edit the following file, and add the two parameters above (kernel.shm=)
	/etc/sysctl.conf

	To check what these values are set to:

	cat /proc/sys/kernel/shmmax
	cat /proc/sys/kernel/shmall

	For shared_buffers, start with 1/4 of available RAM. cache_miss statistics can tell if you need more
	more buffers needed for many concurrent queries and many CPUs.
	shared_buffers: 1/4 RAM.

	RAM available for queries:
	effective_cache_size: set it to 2/3 the RAM on the server (a normal conservative setting).

	The less max_connections, the better (each connection allocates a chunk of RAM). Think of the need for this server:
	max_connections = 32

	Turns of WAL logging. Risky, but improves performance (don't do this until we've thoroughly tested) - if server crashes (unlikely), there will be no log available to restore latest transactions.
	fsync=off #trying this on a staging server for now.

	Write less to the WAL
	checkpoint_segments = 64 (16 = checkpoint every 512MB. For bulk loading or DW, settings of 64/1GB or over are common)
	checkpoint_completion_target = 0.90 (aim to finish writing WAL by the time 90% of the next checkpoint is ready. Note that this only works for postgresql version >= 8.3)
	checkpoint_timeout = defaults to 5min, can be increased too (I've used 25min successfuly)

	How much statistics to collect, helps query planner to use indexes.
	default_statistics_target = from 200 - 1000 for large databases

	This setting is used almost exclusively by the vacuum process, but also for create index and adding foreign keys.
	you can additionally use log_temp_files to figure out if sorts are using disk instead of fitting in memory.
	This setting can be set at load time, like work_mem
	Larger values are common while performing bulk updates or DB restores.
	maintenance_work_mem = 1/4 of your RAM # for version >= 8.3
	maintenance_work_mem = 1/8 of your RAM # for version < 8.3


	Default work_mem value is a whimpy 1MB. Tells postgres how much memory is available before creating temp files during sorts
	for queries that require significant sorting and hash operations.
	You may set this value on the fly, for example:
	<code>
	CONNECT
	set work_mem = 1024MB
	select distinct * from ... order by ...
	DISCONNECT
	</code>

	In general, OS swapping == too much work_mem, while caching sorts in pg_temp == not enough work_mem.
	work_mem = 32MB

	Which method to use for commits to disk of WAL. This is not conservative, but popular.
	wal_sync = open_sync

	How much memory used in shared memory for WAL data:
	wal_buffers = 32MB # do this for SMP systems (and otherwise) and DW with 4GB on server. OLTP => 16MB

	this setting is all about how expensive it is to for your disks to seek a random disk page. Lower it to 2.5 to encourage the query optimizer to use index scans. The default of 4.0 is too conservative for modern hardware.
	random_page_cost = 2.5

	This effectively double checks data log writes. Turn off if you have reliable hardware and filesystem
	full_page_writes = false/off

	Support "batch commits", for applications with many concurrent writers

	commit_delay = 250 # I haven't tried this.
	deadlock_timeout = 2000

	Minimize logging on production servers

	log_error_verbosity = terse #writes lss than default
	log_min_messages = warning for dev servers, error for prod servers.
	log_statement = 'none' #don't log queries on prod servers
	syslog_facility = 'LOCAL2'
	syslog_ident = 'postgres'
	log_connections = off
	log_disconnections = off
	log_duration = off