The following guide was copied from our private internal documentation, but there's nothing private in it; we just don't have anywhere to publish documentation like this. By the time you read it, it will likely be out of date and may not reflect our current practice. Some references to our internal configuration management remain; you'll have to figure those bits out for yourself.
Provisioning of PostgreSQL replicas is complicated. It's more complicated still if we want to use secure authentication, which is why we haven't done it before (despite the best of intentions). This guide will follow my (wollman's) process in setting this up for the first time, hopefully with all the dead ends removed.
There are several different ways of doing replication in PostgreSQL. The official documentation describes all of them. Ignore the official documentation, it will only confuse you. (Trust me.)
We are currently running PostgreSQL 16, which is installed on servers using our custom version of the postgresql16-server package. We override some parameters in the package build for compatibility with older PostgreSQL releases; notably, we use the user pgsql (not postgres) and UID/GID 70. The pgsql user's home directory is /usr/local/pgsql, and the database cluster is installed in versioned data directories, currently /usr/local/pgsql/data16.
We use the acmetool package to automatically provision Let's Encrypt certificates on the servers; since the database servers do not run web servers, the "development" configuration can be used to run a built-in web server only when acmetool is actively trying to obtain a new certificate. (It should be practical to configure acmetool from Puppet to automatically provision certificates in this configuration, but I haven't figured that out yet. Maybe I'll switch to something else if I can find another implementation that doesn't barf Python stack dumps on errors.)
We use the postgresql Puppet module to automate much of the configuration, but we manually install our own pg_hba.conf and pg_ident.conf files because it's clearer to just write what we mean according to the PostgreSQL documentation rather than trying to trick the Puppet module into doing it. The csail::postgresql class handles installing the files in the appropriate data directory, with configuration in the cluster hierarchy.
We will be using "replication slots" below, which are a built-in mechanism in PostgreSQL to keep track of which write-ahead logs have been transferred to each replica, so that they are not deleted prematurely. Sufficient slots (the number of replicas plus one for provisioning) must be configured on the server, but the default configuration has 10 slots, which should be more than enough for our purposes. (The server configuration can be modified on-line using the ALTER SYSTEM superuser command in SQL.)
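For example, to see which slots exist and are in use, and to raise the limit if you ever need more (the value 16 here is arbitrary; note that max_replication_slots is a postmaster-level parameter, so although ALTER SYSTEM can be run on-line, the new value only takes effect after a server restart):

template1=# SELECT slot_name, active FROM pg_replication_slots;
template1=# SHOW max_replication_slots;
template1=# ALTER SYSTEM SET max_replication_slots = 16;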
(If you are upgrading an existing replica, rather than standing up a new replica, then all this has been done already and you should skip to the "Upgrading" section below.)
This document assumes the primary (read-write) server has already been set up and is functioning normally, and that it is configured with TLS and GSSAPI. The replicas will each need their own Kerberos keytabs, for their distinct postgres/CANONICAL-DNS-NAME.DOM.AIN principals.
These can be created in the normal way with kadmin and copied to somewhere safe on the server, but note that pg_basebackup will copy the keytab file from the primary server, which is not the right credential for the replicas. The way we are doing this, the replicas use their server credentials as GSSAPI initiator credentials, so they should ultimately be using the same keytab file in both roles.
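For reference, with an MIT Kerberos kadmin (an assumption; adjust for your KDC, and the keytab path here is just a placeholder), creating a replica's principal and extracting its keytab might look like:

kadmin: addprinc -randkey postgres/REPLICA-NAME.DOM.AIN
kadmin: ktadd -k /root/replica.keytab postgres/REPLICA-NAME.DOM.AIN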
In order to make this work, the primary server must be configured to allow replication connections from the replicas. These connections are used both in initial provisioning of each replica, and also operationally for the replicas to stream updates from the primary server, so the primary has to be properly configured first. This requires three steps:
- creating a database user with the REPLICATION privilege
- mapping the replicas' Kerberos principals to the new database user
- configuring the primary server's access list to allow replication connections as this user from the replicas
Since users are global to a PostgreSQL cluster, it doesn't actually matter which database you connect to, and this can obviously be done locally on the primary rather than over the network. I assume that the user running this has been granted access to a database role that has SUPERUSER privileges (in our case, pgsql).
$ psql -h PRIMARY template1
template1=> SET ROLE pgsql;
SET
template1=# CREATE USER "replication" WITH REPLICATION;
CREATE USER
template1=#
By convention, on our servers only users in the password_users group can connect to the server using password authentication, so this new replication user can only connect with local (UID-based) authentication, or by being mapped from another identity, which we'll set up in the next step.
If your database user doesn't have superuser privileges, you can just run psql locally as the pgsql user, and skip the SET ROLE command.
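Either way, you can confirm that the new role exists and has the right privilege:

template1=> SELECT rolname, rolreplication, rolcanlogin FROM pg_roles WHERE rolname = 'replication';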
All of the PostgreSQL service principals are named postgres/CANONICAL-DNS-NAME.DOM.AIN.
The pg_ident.conf file (configured in Puppet by csail::postgresql::pg_ident_conf, which points to a file served from modules/csail/files/postgresql) configures the mapping from principal names to PostgreSQL users. The file consists of multiple maps, each a series of lines containing whitespace-separated fields. The first field is the name of the map (normally we just use csail); the second field is either a string or a regular expression that is matched against the Kerberos principal; and the third field is the name of the PostgreSQL user (possibly substituting captured subexpressions from the second field). If the second field starts with a /, the remaining characters are treated as an Extended Regular Expression; otherwise it is an exact string match. The map is evaluated in order, and the first match wins.
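For illustration (this entry is hypothetical, not part of our actual map), a regular-expression line that maps any single-component principal to the database user of the same name would look like:

csail /^([a-z][a-z0-9]*)$ \1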
For replication, add lines to the map for the service principal name of each replica, thusly:
csail postgres/CANONICAL-DNS-NAME.DOM.AIN replication
PostgreSQL keeps its server authentication configuration in a file called pg_hba.conf; like pg_ident.conf, we provision this from Puppet using the csail::postgresql module. Currently this file is not generated automatically, but with exported resources it should be possible to automate this step of the provisioning based on the cluster fact.
PostgreSQL replication uses a different protocol from regular client access, but connects on the same port with the same options for access control; access controls for the replication protocol are represented by access rules that specify replication in place of a database name. The database wildcard all does not match replication connections, so they must be explicitly permitted even in the usual case where regular client access is wildcarded.
It is best to make the access rules for replication as specific as possible, limited to just the one specific user and only the IP addresses used by the replicas. (It's probably a good idea to also include the primary server's addresses in the ACL, because the same pg_hba.conf file will be installed across the Puppet cluster.)
Example:
hostssl replication replication 192.0.2.123/32 gss krb_realm=CSAIL.MIT.EDU map=csail include_realm=0
hostssl replication replication 2001:db8::1234:5678/128 gss krb_realm=CSAIL.MIT.EDU map=csail include_realm=0
After the Puppet agent changes pg_hba.conf it will be necessary to restart PostgreSQL on the primary server; Puppet knows how to do a server reload, which is normally what you want, but this is insufficient for changes to pg_hba.conf.
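After the restart, you can confirm that the new rules were parsed and loaded by querying the pg_hba_file_rules view (readable by superusers):

template1=# SELECT line_number, database, user_name, address, auth_method FROM pg_hba_file_rules WHERE 'replication' = ANY(database);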
Standard PostgreSQL replication works at the level of the entire "cluster" (in the PostgreSQL sense of the word: all of the databases on the server). The pg_basebackup utility will copy all of the regular files from the data directory of the primary, which includes some inappropriate items like keytabs and TLS keys. Therefore, it's best to configure and test the replica with the configuration our Puppet class expects before starting replication, to make sure that any provisioning issues are worked out and that GSSAPI and TLS server-mode authentication are both functioning properly.
To set up TLS certificates with acmetool, you will first need to install an acmetool hook script, which installs a copy of the private key readable by the PostgreSQL server process. This isn't automated yet. Install the following script in /usr/local/libexec/acme/hooks/postgresql, substituting the appropriate DNS name into the path in line 5:
#!/bin/sh -e
EVENT_NAME="$1"
[ "$EVENT_NAME" = "live-updated" ] || exit 42
# Update the key in case it was rotated
install -c -o pgsql -g pgsql -m 400 /var/db/acme/live/CANONICAL-DNS-NAME.DOM.AIN/privkey /usr/local/pgsql/data16/server.key
# Signal the server
service postgresql quietrestart 2>/dev/null >/dev/null
and chmod +x it so that acmetool will restart PostgreSQL every time the certificate changes.
This only handles copying the private key; the certificates, being public, need to be symlinked into place in the data directory:
# ln -s /var/db/acme/live/CANONICAL-DNS-NAME.DOM.AIN/cert ~pgsql/data16/server.crt
# ln -s /var/db/acme/live/CANONICAL-DNS-NAME.DOM.AIN/chain ~pgsql/data16/cacert.pem
And of course the keytab also needs to be installed in ~pgsql/data16/keytab, mode 0600, owner pgsql:pgsql.
Finally, in order for the replica to authenticate to the primary server, two environment variables need to be set to tell the Kerberos client libraries to use the server's keytab as the GSSAPI initiator credentials. These have to be set in the startup script's environment, for which FreeBSD provides a convenient mechanism:
# sysrc postgresql_env="KRB5_CLIENT_KTNAME=/usr/local/pgsql/data16/keytab KRB5CCNAME=FILE:/usr/local/pgsql/krb5cc"
(The version of the startup script in the official FreeBSD package breaks this; our package has slightly modified the startup script to unbreak it.)
After doing this, restart the server with service postgresql restart, and verify that you can connect to it using GSSAPI authentication and TLS encryption:
$ psql 'host=CANONICAL-DNS-NAME.DOM.AIN dbname=template1 sslmode=verify-full'
template1=> \conninfo
You are connected to database "template1" as user "YOURNAME" on host "CANONICAL-DNS-NAME.DOM.AIN" (address "192.0.2.123") at port "5432".
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, bits: 256, compression: off)
Before moving on to the next step, verify that the replica server is able to connect to the primary server:
su -l pgsql -c 'KRB5_CLIENT_KTNAME=/usr/local/pgsql/data16/keytab KRB5CCNAME=FILE:/usr/local/pgsql/krb5cc psql -U replication "dbname=template1 host=PRIMARY-NAME.DOM.AIN sslmode=verify-full sslrootcert=/usr/local/share/certs/ca-root-nss.crt"; klist'
If anything shows an error, or if klist does not show a postgres/PRIMARY-NAME.DOM.AIN ticket, then something is wrong and replication will not work.
(It should eventually be possible to remove the sslrootcert option; as of PostgreSQL 14 it was still required in order to verify the server certificate.)
The pg_basebackup utility requires its destination directory to either be empty or not exist, and we will need to replace some of the files transferred from the primary with appropriate local versions, so we will have it create a new directory and fix it up before stopping the PostgreSQL server process and moving the new data directory into place.
su -l pgsql -c 'export KRB5_CLIENT_KTNAME=/usr/local/pgsql/data16/keytab KRB5CCNAME=FILE:/usr/local/pgsql/krb5cc; pg_basebackup -D /usr/local/pgsql/data.replica -R -P -v -h READ-WRITE-SERVER.DOM.AIN -U replication -w -C -S REPLICATION-SLOT-NAME'
Note that the slot name may only contain lower-case letters, numerals, and underscores, so it cannot be a hostname (for a replica named db-replica-1.dom.ain, something like db_replica_1 works).
pg_basebackup will emit warnings from the server about skipping cacert.pem and server.crt, because those are symbolic links on the server side.
Copy the keytab and server.key files from data16 to data.replica, and re-create the symbolic links from server.crt and cacert.pem to the appropriate files in /var/db/acme.
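Concretely, as root (using the same paths as the provisioning steps above):

# cp -p ~pgsql/data16/keytab ~pgsql/data16/server.key ~pgsql/data.replica/
# ln -s /var/db/acme/live/CANONICAL-DNS-NAME.DOM.AIN/cert ~pgsql/data.replica/server.crt
# ln -s /var/db/acme/live/CANONICAL-DNS-NAME.DOM.AIN/chain ~pgsql/data.replica/cacert.pem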
Now switch PostgreSQL on the replica to the replica database state:
# cd ~pgsql
# service postgresql stop
# mv data16 data16.old
# mv data.replica data16
# service postgresql start
The pg_basebackup command above sets things up so that when the server starts it will automatically put itself in standby mode and start pulling database updates from the primary server asynchronously. In standby mode, PostgreSQL will accept read-only queries but refuse all writes, which must be performed on the primary server.
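A quick sanity check that the server really did come up as a standby:

template1=> SELECT pg_is_in_recovery();

This should return t on the replica (and f on the primary).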
PostgreSQL does not support replication between different major
releases.
For minor release upgrades, which do not change the on-disk format
of the database, simply upgrading the replicas first, then upgrading
the read-write site, should be sufficient.
When doing a major upgrade of a replicated cluster, all the replicas must be shut down (or promoted to read-write sites, but I haven't done this sort of shuffle; it's easier to take downtime).
Make sure that the postgresql and puppet services are both stopped and disabled on the replica to ensure they don't prematurely start during the upgrade and resynchronization.
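Assuming the standard FreeBSD rc.d service names, that is:

# service puppet stop
# service postgresql stop
# sysrc puppet_enable=NO postgresql_enable=NO

(Remember to re-enable both services once the replica has been resynchronized and verified.)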
After successfully completing the upgrade on the read-write site, install the new release of the postgresql16-server and postgresql16-contrib packages on the replica, but don't run initdb to create a new database cluster; we're going to use pg_basebackup to copy the cluster from the read-write site. Note that unlike the examples above, we are going to build the new database cluster in its final filesystem path, /usr/local/pgsql/data16 (where 16 is the current major release; this is the default location for the FreeBSD postgresql16-server package).
As root:
cd ~pgsql
su -l pgsql -c 'export KRB5_CLIENT_KTNAME=/usr/local/pgsql/data16/keytab KRB5CCNAME=FILE:/usr/local/pgsql/krb5cc; pg_basebackup -D /usr/local/pgsql/data16 -R -P -v -h READ-WRITE-SERVER.DOM.AIN -U replication -w -C -S REPLICATION-SLOT-NAME'
The pg_basebackup process will copy some inappropriate files from the read-write server, and it will not copy symbolic links; both of these need to be fixed before PostgreSQL can be started on the replica. This is basically restoring the Kerberos and certificate setup that allows clients to connect to the replica and the replica to connect to the read-write site to stream updates.
To fix these files:
cp -p data14/keytab data16/
cp -p data14/server.key data16/
ln -s /var/db/acme/live/REPLICA-NAME.DOM.AIN/chain data16/cacert.pem
ln -s /var/db/acme/live/REPLICA-NAME.DOM.AIN/cert data16/server.crt
(These examples assume an upgrade from PostgreSQL 14 to 16, which is
the context in which this documentation was written.)
Then edit /usr/local/libexec/acme/hooks/postgresql and /etc/rc.conf to change all references to the old data14 directory to data16.
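For example, using BSD sed for the hook script, and reinstating the environment setting from the provisioning section with the new path:

# sed -i '' 's/data14/data16/g' /usr/local/libexec/acme/hooks/postgresql
# sysrc postgresql_env="KRB5_CLIENT_KTNAME=/usr/local/pgsql/data16/keytab KRB5CCNAME=FILE:/usr/local/pgsql/krb5cc"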
If all went well, service postgresql onestart should start the server, and you can confirm that things are working by following the procedure in the next section.
Verify that replication is functioning by connecting to any database on the read-write server and checking the replication status:
psql (16.8)
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, compression: off)
Type "help" for help.
template1=> set role pgsql;
SET
template1=# \x
Expanded display is on.
template1=# select * from pg_stat_replication;
-[ RECORD 1 ]----+------------------------------
pid | 45651
usesysid | 181289
usename | replication
application_name | walreceiver
client_addr | 192.0.2.123
client_hostname |
client_port | 63588
backend_start | 2025-05-09 15:13:30.071815-04
backend_xmin |
state | streaming
sent_lsn | 0/D7000148
write_lsn | 0/D7000148
flush_lsn | 0/D7000148
replay_lsn | 0/D7000148
write_lag |
flush_lag |
replay_lag |
sync_priority | 0
sync_state | async
reply_time | 2025-05-09 15:41:40.67159-04
You can also verify the status on the replica:
psql (16.8)
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, compression: off)
Type "help" for help.
template1=> set role pgsql;
SET
template1=# \x
Expanded display is on.
template1=# select * from pg_stat_wal_receiver ;
-[ RECORD 1 ]---------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
pid | 99793
status | streaming
receive_start_lsn | 0/D7000000
receive_start_tli | 1
written_lsn | 0/D7000148
flushed_lsn | 0/D7000000
received_tli | 1
last_msg_send_time | 2025-05-09 15:43:48.098543-04
last_msg_receipt_time | 2025-05-09 15:43:48.098737-04
latest_end_lsn | 0/D7000148
latest_end_time | 2025-05-09 15:13:30.107658-04
slot_name | REPLICATION-SLOT-NAME
sender_host | READ-WRITE-SERVER.DOM.AIN
sender_port | 5432
conninfo | user=replication passfile=/usr/local/pgsql/.pgpass channel_binding=prefer dbname=replication host=READ-WRITE-SERVER.DOM.AIN port=5432 fallback_application_name=walreceiver sslmode=prefer sslcompression=0 sslcertmode=allow sslsni=1 ssl_min_protocol_version=TLSv1.2 gssencmode=prefer krbsrvname=postgres gssdelegation=0 target_session_attrs=any load_balance_hosts=disable
The connection parameters for replication are added to postgresql.auto.conf by pg_basebackup, and should probably be changed before starting the replica. Of particular interest is sslmode (which should be verify-full). I forgot to do this, and I haven't tested whether any additional parameters need to be added to make this work. (Obviously passfile can be removed, as it is not used.)
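For reference, an untested sketch of what a tightened primary_conninfo line in postgresql.auto.conf might look like (the sslrootcert path mirrors the connection test above; whether the GSSAPI-related parameters need to remain explicit is unverified):

primary_conninfo = 'user=replication host=READ-WRITE-SERVER.DOM.AIN port=5432 sslmode=verify-full sslrootcert=/usr/local/share/certs/ca-root-nss.crt gssencmode=prefer krbsrvname=postgres'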