The following guide was copied from our private internal documentation, but there's nothing private in it; we just don't have anywhere to publish documentation like this. By the time you read it, it will likely be out of date and may not reflect our current practice. Some references to our internal configuration management remain; you'll have to figure those bits out for yourself.
Provisioning of PostgreSQL replicas is complicated. It's more complicated still if we want to use secure authentication, which is why we haven't done it before (despite the best of intentions). This guide will follow my (wollman's) process in setting this up for the first time, hopefully with all the dead ends removed.
There are several different ways of doing replication in PostgreSQL. The official documentation describes all of them. Ignore the official documentation, it will only confuse you. (Trust me.)
We are currently running PostgreSQL 16, which is installed on servers using our custom version of the postgresql16-server package. We override some parameters in the package build for compatibility with older PostgreSQL releases; notably, we use the user pgsql (not postgres) and UID/GID 70. The pgsql user's home directory is /usr/local/pgsql, and the database cluster is installed in versioned data directories, currently /usr/local/pgsql/data16.
We use the acmetool package to automatically provision Let's Encrypt certificates on the servers; since the database servers do not run web servers, the "development" configuration can be used to run a built-in web server only when acmetool is actively trying to obtain a new certificate. (It should be practical to configure acmetool from Puppet to automatically provision certificates in this configuration, but I haven't figured that out yet. Maybe I'll switch to something else if I can find another implementation that doesn't barf Python stack dumps on errors.)
We use the postgresql Puppet module to automate much of the configuration, but we manually install our own pg_hba.conf and pg_ident.conf files because it's clearer to just write what we mean according to the PostgreSQL documentation rather than trying to trick the Puppet module into doing it. The csail::postgresql class handles installing the files in the appropriate data directory, with configuration in the cluster hierarchy.
We will be using "replication slots" below, which are a built-in mechanism in PostgreSQL to keep track of which write-ahead logs have been transferred to each replica, so that they are not deleted prematurely. Sufficient slots (the number of replicas plus one for provisioning) must be configured on the server, but the default configuration has 10 slots, which should be more than enough for our purposes. (The server configuration can be modified on-line using the ALTER SYSTEM superuser command in SQL.)
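For example, to see which slots exist and are in use, and to raise the limit if you ever need more (the value 16 here is arbitrary; note that max_replication_slots is a postmaster-level parameter, so although ALTER SYSTEM can be run on-line, the new value only takes effect after a server restart):

template1=# SELECT slot_name, active FROM pg_replication_slots;
template1=# SHOW max_replication_slots;
template1=# ALTER SYSTEM SET max_replication_slots = 16;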
(If you are upgrading an existing replica, rather than standing up a new replica, then all this has been done already and you should skip to the "Upgrading" section below.)
This document assumes the primary (read-write) server has already been set up and is functioning normally, and that it is configured with TLS and GSSAPI. The replicas will each need their own Kerberos keytabs, for their distinct postgres/CANONICAL-DNS-NAME.DOM.AIN principals.
These can be created in the normal way with kadmin and copied to somewhere safe on the server, but note that pg_basebackup will copy the keytab file from the primary server, which is not the right credential for the replicas. The way we are doing this, the replicas use their server credentials as GSSAPI initiator credentials, so they should ultimately be using the same keytab file in both roles.
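For reference, with an MIT Kerberos kadmin (an assumption; adjust for your KDC, and the keytab path here is just a placeholder), creating a replica's principal and extracting its keytab might look like:

kadmin: addprinc -randkey postgres/REPLICA-NAME.DOM.AIN
kadmin: ktadd -k /root/replica.keytab postgres/REPLICA-NAME.DOM.AIN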
In order to make this work, the primary server must be configured to allow replication connections from the replicas. These connections are used both in initial provisioning of each replica, and also operationally for the replicas to stream updates from the primary server, so the primary has to be properly configured first. This requires three steps:
- creating a database user with the REPLICATION privilege
- mapping the replicas' Kerberos principals to the new database user
- configuring the primary server's access list to allow replication connections as this user from the replicas
Since users are global to a PostgreSQL cluster, it doesn't actually matter which database you connect to, and this can obviously be done locally on the primary rather than over the network. I assume that the user running this has been granted access to a database role that has SUPERUSER privileges (in our case, pgsql).
$ psql -h PRIMARY template1
template1=> SET ROLE pgsql;
SET
template1=# CREATE USER "replication" WITH REPLICATION;
CREATE USER
template1=#
By convention, on our servers only users in the password_users group can connect to the server using password authentication, so this new replication user can only connect with local (UID-based) authentication, or by being mapped from another identity, which we'll set up in the next step.
If your database user doesn't have superuser privileges, you can just run psql locally as the pgsql user, and skip the SET ROLE command.
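Either way, you can confirm that the new role exists and has the right privilege:

template1=> SELECT rolname, rolreplication, rolcanlogin FROM pg_roles WHERE rolname = 'replication';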
All of the PostgreSQL service principals are named postgres/CANONICAL-DNS-NAME.DOM.AIN.
The pg_ident.conf file (configured in Puppet by csail::postgresql::pg_ident_conf, which points to a file served from modules/csail/files/postgresql) configures the mapping from principal names to PostgreSQL users. The file consists of multiple maps, each a series of lines containing whitespace-separated fields. The first field is the name of the map (normally we just use csail); the second field is either a string or a regular expression that is matched against the Kerberos principal; and the third field is the name of the PostgreSQL user (possibly substituting captured subexpressions from the second field). If the second field starts with a /, the remaining characters are treated as an Extended Regular Expression; otherwise it is an exact string match. The map is evaluated in order, and the first match wins.
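For illustration (this entry is hypothetical, not part of our actual map), a regular-expression line that maps any single-component principal to the database user of the same name would look like:

csail /^([a-z][a-z0-9]*)$ \1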
For replication, add lines to the map for the service principal name of each replica, thusly:
csail postgres/CANONICAL-DNS-NAME.DOM.AIN replication
PostgreSQL keeps its server authentication configuration in a file called pg_hba.conf; like pg_ident.conf, we provision this from Puppet using the csail::postgresql module. Currently this file is not generated automatically, but with exported resources it should be possible to automate this step of the provisioning based on the cluster fact.
PostgreSQL replication uses a different protocol from regular client access, but connects on the same port with the same options for access control; access controls for the replication protocol are represented by access rules that specify replication in place of a database name. The database wildcard all does not match replication connections, so they must be explicitly permitted even in the usual case where regular client access is wildcarded.
It is best to make the access rules for replication as specific as possible, limited to just the one specific user and only the IP addresses used by the replicas. (It's probably a good idea to also include the primary server's addresses in the ACL, because the same pg_hba.conf file will be installed across the Puppet cluster.)
Example:
hostssl replication replication 192.0.2.123/32 gss krb_realm=CSAIL.MIT.EDU map=csail include_realm=0
hostssl replication replication 2001:db8::1234:5678/128 gss krb_realm=CSAIL.MIT.EDU map=csail include_realm=0
After the Puppet agent changes pg_hba.conf it will be necessary to restart PostgreSQL on the primary server; Puppet knows how to do a server reload, which is normally what you want, but this is insufficient for changes to pg_hba.conf.
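After the restart, you can confirm that the new rules were parsed and loaded by querying the pg_hba_file_rules view (readable by superusers):

template1=# SELECT line_number, database, user_name, address, auth_method FROM pg_hba_file_rules WHERE 'replication' = ANY(database);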
Standard PostgreSQL replication works at the level of the entire "cluster" (in the PostgreSQL sense of the word: all of the databases on the server). The pg_basebackup utility will copy all of the regular files from the data directory of the primary, which includes some inappropriate items like keytabs and TLS keys. Therefore, it's best to configure and test the replica with the configuration our Puppet class expects before starting replication, to make sure that any provisioning issues are worked out and that GSSAPI and TLS server-mode authentication are both functioning properly.
To set up TLS certificates with acmetool, you will first need to install an acmetool hook script, which installs a copy of the private key readable by the PostgreSQL server process. This isn't automated yet. Install the following script in /usr/local/libexec/acme/hooks/postgresql, substituting the appropriate DNS name into the path in line 5:
#!/bin/sh -e
EVENT_NAME="$1"
[ "$EVENT_NAME" = "live-updated" ] || exit 42
# Update the key in case it was rotated
install -c -o pgsql -g pgsql -m 400 /var/db/acme/live/CANONICAL-DNS-NAME.DOM.AIN/privkey /usr/local/pgsql/data16/server.key
# Signal the server
service postgresql quietrestart 2>/dev/null >/dev/null
and chmod +x it so that acmetool will restart PostgreSQL every time the certificate changes.
This only handles copying the private key; the certificates, being public, need to be symlinked into place in the data directory:
# ln -s /var/db/acme/live/CANONICAL-DNS-NAME.DOM.AIN/cert ~pgsql/data16/server.crt
# ln -s /var/db/acme/live/CANONICAL-DNS-NAME.DOM.AIN/chain ~pgsql/data16/cacert.pem
And of course the keytab also needs to be installed in ~pgsql/data16/keytab, mode 0600, owner pgsql:pgsql.
Finally, in order for the replica to authenticate to the primary server, two environment variables need to be set to tell the Kerberos client libraries to use the server's keytab as the GSSAPI initiator credentials. These have to be set in the startup script's environment, for which FreeBSD provides a convenient mechanism:
# sysrc postgresql_env="KRB5_CLIENT_KTNAME=/usr/local/pgsql/data16/keytab KRB5CCNAME=FILE:/usr/local/pgsql/krb5cc"
(The version of the startup script in the official FreeBSD package breaks this; our package has slightly modified the startup script to unbreak it.)
After doing this, restart the server with service postgresql restart, and verify that you can connect to it using GSSAPI authentication and TLS encryption:
$ psql 'host=CANONICAL-DNS-NAME.DOM.AIN dbname=template1 sslmode=verify-full'
template1=> \conninfo
You are connected to database "template1" as user "YOURNAME" on host "CANONICAL-DNS-NAME.DOM.AIN" (address "192.0.2.123") at port "5432".
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, bits: 256, compression: off)
Before moving on to the next step, verify that the replica server is able to connect to the primary server:
su -l pgsql -c 'KRB5_CLIENT_KTNAME=/usr/local/pgsql/data16/keytab KRB5CCNAME=FILE:/usr/local/pgsql/krb5cc psql -U replication "dbname=template1 host=PRIMARY-NAME.DOM.AIN sslmode=verify-full sslrootcert=/usr/local/share/certs/ca-root-nss.crt"; klist'
If anything shows an error, or if klist does not show a postgres/PRIMARY-NAME.DOM.AIN ticket, then something is wrong and replication will not work.
(It should eventually be possible to remove the sslrootcert option; as of PostgreSQL 14 it was still required in order to verify the server certificate.)
The pg_basebackup utility requires its destination directory to either be empty or not exist, and we will need to replace some of the files transferred from the primary with appropriate local versions, so we will have it create a new directory and fix it up before stopping the PostgreSQL server process and moving the new data directory into place.
su -l pgsql -c 'export KRB5_CLIENT_KTNAME=/usr/local/pgsql/data16/keytab KRB5CCNAME=FILE:/usr/local/pgsql/krb5cc; pg_basebackup -D /usr/local/pgsql/data.replica -R -P -v -h READ-WRITE-SERVER.DOM.AIN -U replication -w -C -S REPLICATION-SLOT-NAME'
Note that the slot name may only contain lower-case letters, numerals, and underscores, so it cannot be a hostname (for a replica named db-replica-1.dom.ain, something like db_replica_1 works).
pg_basebackup will emit warnings from the server about skipping cacert.pem and server.crt, because those are symbolic links on the server side.
Copy the keytab and server.key files from data16 to data.replica, and re-create the symbolic links from server.crt and cacert.pem to the appropriate files in /var/db/acme.
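Concretely, as root (using the same paths as the provisioning steps above):

# cp -p ~pgsql/data16/keytab ~pgsql/data16/server.key ~pgsql/data.replica/
# ln -s /var/db/acme/live/CANONICAL-DNS-NAME.DOM.AIN/cert ~pgsql/data.replica/server.crt
# ln -s /var/db/acme/live/CANONICAL-DNS-NAME.DOM.AIN/chain ~pgsql/data.replica/cacert.pem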
Now switch PostgreSQL on the replica to the replica database state:
# cd ~pgsql
# service postgresql stop
# mv data16 data16.old
# mv data.replica data16
# service postgresql start
The pg_basebackup command above sets things up so that when the server starts it will automatically put itself in standby mode and start pulling database updates from the primary server asynchronously. In standby mode, PostgreSQL will accept read-only queries but refuse all writes, which must be performed on the primary server.
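A quick sanity check that the server really did come up as a standby:

template1=> SELECT pg_is_in_recovery();

This should return t on the replica (and f on the primary).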
PostgreSQL does not support replication between different major
releases.
For minor release upgrades, which do not change the on-disk format
of the database, simply upgrading the replicas first, then upgrading
the read-write site, should be sufficient.
When doing a major upgrade of a replicated cluster, all the replicas must be shut down (or promoted to read-write sites, but I haven't done this sort of shuffle; it's easier to take downtime).
Make sure that the postgresql and puppet services are both stopped and disabled on the replica to ensure they don't prematurely start during the upgrade and resynchronization.
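Assuming the standard FreeBSD rc.d service names, that is:

# service puppet stop
# service postgresql stop
# sysrc puppet_enable=NO postgresql_enable=NO

(Remember to re-enable both services once the replica has been resynchronized and verified.)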
After successfully completing the upgrade on the read-write site, install the new release of the postgresql16-server and postgresql16-contrib packages on the replica, but don't run initdb to create a new database cluster; we're going to use pg_basebackup to copy the cluster from the read-write site. Note that unlike the examples above, we are going to build the new database cluster in its final filesystem path, /usr/local/pgsql/data16 (where 16 is the current major release; this is the default location for the FreeBSD postgresql16-server package).
As root:
cd ~pgsql
su -l pgsql -c 'export KRB5_CLIENT_KTNAME=/usr/local/pgsql/data16/keytab KRB5CCNAME=FILE:/usr/local/pgsql/krb5cc; pg_basebackup -D /usr/local/pgsql/data16 -R -P -v -h READ-WRITE-SERVER.DOM.AIN -U replication -w -C -S REPLICATION-SLOT-NAME'
The pg_basebackup process will copy some inappropriate files from the read-write server, and it will not copy symbolic links; both of these need to be fixed before PostgreSQL can be started on the replica. This is basically restoring the Kerberos and certificate setup that allows clients to connect to the replica and the replica to connect to the read-write site to stream updates.
To fix these files:
cp -p data14/keytab data16/
cp -p data14/server.key data16/
ln -s /var/db/acme/live/REPLICA-NAME.DOM.AIN/chain data16/cacert.pem
ln -s /var/db/acme/live/REPLICA-NAME.DOM.AIN/cert data16/server.crt
(These examples assume an upgrade from PostgreSQL 14 to 16, which is
the context in which this documentation was written.)
Then edit /usr/local/libexec/acme/hooks/postgresql and /etc/rc.conf to change all references to the old data14 directory to data16.
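For example, using BSD sed for the hook script, and reinstating the environment setting from the provisioning section with the new path:

# sed -i '' 's/data14/data16/g' /usr/local/libexec/acme/hooks/postgresql
# sysrc postgresql_env="KRB5_CLIENT_KTNAME=/usr/local/pgsql/data16/keytab KRB5CCNAME=FILE:/usr/local/pgsql/krb5cc"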
If all went well, service postgresql onestart should start the server, and you can confirm that things are working by following the procedure in the next section.
Verify that replication is functioning by connecting to any database on the read-write server and checking the replication status:
psql (16.8)
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, compression: off)
Type "help" for help.
template1=> set role pgsql;
SET
template1=# \x
Expanded display is on.
template1=# select * from pg_stat_replication;
-[ RECORD 1 ]----+------------------------------
pid | 45651
usesysid | 181289
usename | replication
application_name | walreceiver
client_addr | 192.0.2.123
client_hostname |
client_port | 63588
backend_start | 2025-05-09 15:13:30.071815-04
backend_xmin |
state | streaming
sent_lsn | 0/D7000148
write_lsn | 0/D7000148
flush_lsn | 0/D7000148
replay_lsn | 0/D7000148
write_lag |
flush_lag |
replay_lag |
sync_priority | 0
sync_state | async
reply_time | 2025-05-09 15:41:40.67159-04
You can also verify the status on the replica:
psql (16.8)
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, compression: off)
Type "help" for help.
template1=> set role pgsql;
SET
template1=# \x
Expanded display is on.
template1=# select * from pg_stat_wal_receiver ;
-[ RECORD 1 ]---------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
pid | 99793
status | streaming
receive_start_lsn | 0/D7000000
receive_start_tli | 1
written_lsn | 0/D7000148
flushed_lsn | 0/D7000000
received_tli | 1
last_msg_send_time | 2025-05-09 15:43:48.098543-04
last_msg_receipt_time | 2025-05-09 15:43:48.098737-04
latest_end_lsn | 0/D7000148
latest_end_time | 2025-05-09 15:13:30.107658-04
slot_name | REPLICATION-SLOT-NAME
sender_host | READ-WRITE-SERVER.DOM.AIN
sender_port | 5432
conninfo | user=replication passfile=/usr/local/pgsql/.pgpass channel_binding=prefer dbname=replication host=READ-WRITE-SERVER.DOM.AIN port=5432 fallback_application_name=walreceiver sslmode=prefer sslcompression=0 sslcertmode=allow sslsni=1 ssl_min_protocol_version=TLSv1.2 gssencmode=prefer krbsrvname=postgres gssdelegation=0 target_session_attrs=any load_balance_hosts=disable
The connection parameters for replication are added to postgresql.auto.conf by pg_basebackup, and should probably be changed before starting the replica. Of particular interest is sslmode (which should be verify-full). I forgot to do this, and I haven't tested whether any additional parameters need to be added to make this work. (Obviously passfile can be removed, as it is not used.)
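For reference, an untested sketch of what a tightened primary_conninfo line in postgresql.auto.conf might look like (the sslrootcert path mirrors the connection test above; whether the GSSAPI-related parameters need to remain explicit is unverified):

primary_conninfo = 'user=replication host=READ-WRITE-SERVER.DOM.AIN port=5432 sslmode=verify-full sslrootcert=/usr/local/share/certs/ca-root-nss.crt gssencmode=prefer krbsrvname=postgres'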