Skip to content

Instantly share code, notes, and snippets.

@eclecticmiraclecat
Last active September 24, 2020 02:50
Show Gist options
  • Save eclecticmiraclecat/e5ff582039a35365fba5fb84a7daadf8 to your computer and use it in GitHub Desktop.
Save eclecticmiraclecat/e5ff582039a35365fba5fb84a7daadf8 to your computer and use it in GitHub Desktop.

Stack

                192.168.1.109
                +---------+
                | haproxy |
                +---------+

    pg1             pg2             pg3
+----------+    +----------+    +----------+
| patroni  |    | patroni  |    | patroni  |
+----------+    +----------+    +----------+
| postgres |    | postgres |    | postgres |
+----------+    +----------+    +----------+ 

+------------------------------------------+
|                   etcd                   |
+------------------------------------------+

DB servers

  • pg1 192.168.1.109
  • pg2 192.168.1.111
  • pg3 192.168.1.112

Install packages

$ sudo yum group install "Development Tools"
$ sudo yum install wget screen python3 python3-devel epel-release

postgres

Install postgres on all

$ sudo yum install postgresql-server postgresql postgresql-devel

Distributed Configuration Store (DCS)

etcd

Install etcd

$ export RELEASE="3.3.18"
$ wget https://github.com/etcd-io/etcd/releases/download/v${RELEASE}/etcd-v${RELEASE}-linux-amd64.tar.gz
$ tar -xf etcd-v3.3.18-linux-amd64.tar.gz
$ sudo cp etcd-v3.3.18-linux-amd64/etcd* /usr/local/bin

Create storage directory

$ sudo mkdir -p /db/etcd
$ sudo chown postgres:postgres /db/etcd

Create conf file on all the pg servers

pg1

$ cat /etc/etcd.conf
name: pg1
data-dir: /db/etcd 
initial-advertise-peer-urls: http://192.168.1.109:2380
listen-peer-urls: http://0.0.0.0:2380 
listen-client-urls: http://0.0.0.0:2379
advertise-client-urls: http://192.168.1.109:2379
initial-cluster: "pg1=http://192.168.1.109:2380,pg2=http://192.168.1.111:2380,pg3=http://192.168.1.112:2380"

pg2

$ cat /etc/etcd.conf
name: pg2
data-dir: /db/etcd 
initial-advertise-peer-urls: http://192.168.1.111:2380
listen-peer-urls: http://0.0.0.0:2380 
listen-client-urls: http://0.0.0.0:2379
advertise-client-urls: http://192.168.1.111:2379
initial-cluster: "pg1=http://192.168.1.109:2380,pg2=http://192.168.1.111:2380,pg3=http://192.168.1.112:2380"

pg3

$ cat /etc/etcd.conf
name: pg3
data-dir: /db/etcd 
initial-advertise-peer-urls: http://192.168.1.112:2380
listen-peer-urls: http://0.0.0.0:2380 
listen-client-urls: http://0.0.0.0:2379
advertise-client-urls: http://192.168.1.112:2379
initial-cluster: "pg1=http://192.168.1.109:2380,pg2=http://192.168.1.111:2380,pg3=http://192.168.1.112:2380"

Disable selinux and firewall

$ sudo setenforce 0
$ sudo systemctl stop firewalld.service

Start etcd on all pg servers

$ screen -d -m sudo su - postgres -c "etcd --config-file /etc/etcd.conf"

test

$ etcdctl member list
721fe30e90b617a: name=pg2 peerURLs=http://192.168.1.111:2380 clientURLs=http://192.168.1.111:2379 isLeader=false
3bbfd6bd0acc17c6: name=pg1 peerURLs=http://192.168.1.109:2380 clientURLs=http://192.168.1.109:2379 isLeader=false
fcdf3ecbde2fa421: name=pg3 peerURLs=http://192.168.1.112:2380 clientURLs=http://192.168.1.112:2379 isLeader=true

pg2

$ sudo su - postgres -c "ETCDCTL_API=3 etcdctl put ha-cookbook-2 'Hello World'"

pg3

$ sudo su - postgres -c "ETCDCTL_API=3 etcdctl put ha-cookbook-3 'Hello Mars'"

pg1

$ sudo su - postgres -c "ETCDCTL_API=3 etcdctl get ha-cookbook-2"
ha-cookbook-2
Hello World

$ sudo su - postgres -c "ETCDCTL_API=3 etcdctl get ha-cookbook-3"
ha-cookbook-3
Hello Jupyter

$ sudo su - postgres -c "ETCDCTL_API=3 etcdctl get ha-cookbook-1 ha-cookbook-9"
ha-cookbook-2
Hello World
ha-cookbook-3
Hello Jupyter

Convert etcd to service file

$ cat /etc/systemd/system/etcd.service
[Unit] 
Description=etcd key-value store 
Documentation=https://github.com/etcd-io/etcd 
After=network.target 
 
[Service] 
User=postgres 
Type=notify 
ExecStart=/usr/local/bin/etcd --config-file /etc/etcd.conf 
Restart=always 
RestartSec=10s 
LimitNOFILE=40000 
 
[Install] 
WantedBy=multi-user.target 

Enable and start etcd service

$ sudo systemctl enable etcd
$ sudo systemctl start etcd

Patroni

Set permission for data directory

$ sudo chown postgres:postgres /db

Create conf directory

sudo mkdir /etc/patroni
sudo chown postgres:postgres /etc/patroni

Install patroni

$ sudo su - postgres
$ python3 -m venv venv
$ source venv/bin/activate
$ pip install patroni[etcd] psycopg2

Patroni conf file

pg1

$ cat /etc/patroni/batman.yml

scope: batman
#namespace: /service/
name: postgresql0

restapi:
  listen: 192.168.1.109:8008
  connect_address: 192.168.1.109:8008

etcd:
  host: 127.0.0.1:2379

bootstrap:
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576
    postgresql:
      use_pg_rewind: true
      parameters:

  initdb:  # Note: It needs to be a list (some options need values, others are switches)
  - encoding: UTF8
  - data-checksums

  pg_hba:  # Add following lines to pg_hba.conf after running 'initdb'
  - host replication replicator 0.0.0.0/0 md5
  - host all all 0.0.0.0/0 md5

  users:
    admin:
      password: admin
      options:
        - createrole
        - createdb

postgresql:
  listen: 192.168.1.109:5432
  connect_address: 192.168.1.109:5432
  data_dir: /db/pgdata
  pgpass: /tmp/pgpass0
  authentication:
    replication:
      username: replicator
      password: rep-pass
    superuser:
      username: postgres
      password: zalando
    rewind:  # Has no effect on postgres 10 and lower
      username: rewind_user
      password: rewind_password
  parameters:
    unix_socket_directories: '.'

modify name, listen and connect address on other servers

start patroni

pg1

$ patroni /etc/patroni/batman.yml
2020-09-10 12:35:05,849 INFO: Lock owner: postgresql0; I am postgresql0
2020-09-10 12:35:05,879 INFO: no action.  i am the leader with the lock

pg2

$ patroni /etc/patroni/batman.yml
2020-09-10 12:35:55,901 INFO: Lock owner: postgresql0; I am postgresql1
2020-09-10 12:35:55,901 INFO: does not have lock
2020-09-10 12:35:55,922 INFO: no action.  i am a secondary and i am following a leader

pg3

$ patroni /etc/patroni/batman.yml
2020-09-10 12:37:15,929 INFO: Lock owner: postgresql0; I am postgresql3
2020-09-10 12:37:15,929 INFO: does not have lock
2020-09-10 12:37:15,945 INFO: no action.  i am a secondary and i am following a leader

check cluster

$ patronictl -c /etc/patroni/batman.yml list
+ Cluster: batman (6870862611271680902) +---------+----+-----------+
| Member      | Host          | Role    | State   | TL | Lag in MB |
+-------------+---------------+---------+---------+----+-----------+
| postgresql0 | 192.168.1.109 | Leader  | running | 24 |           |
| postgresql1 | 192.168.1.111 | Replica | running | 24 |         0 |
| postgresql3 | 192.168.1.112 | Replica | running | 24 |         0 |
+-------------+---------------+---------+---------+----+-----------+

Convert patroni to service file

$ cat /etc/systemd/system/patroni.service

[Unit]
Description=Patroni PostgreSQL Cluster Service
Documentation=https://github.com/zalando/patroni
After=etcd.service

[Service]
User=postgres
Type=simple
ExecStart=/var/lib/pgsql/venv/bin/patroni /etc/patroni/batman.yml
Restart=always
RestartSec=10s
LimitNOFILE=40000

[Install]
WantedBy=multi-user.target

Enable and start patroni

$ sudo systemctl enable patroni
$ sudo systemctl start patroni

View logs

$ journalctl -u patroni

Haproxy

always write to primary db

install haproxy

 $ sudo yum install haproxy

haproxy config file

 $ cat /etc/haproxy/haproxy.cfg
 
  global
    maxconn 100

defaults
    log global
    mode tcp
    retries 2
    timeout client 30m
    timeout connect 4s
    timeout server 30m
    timeout check 5s

listen stats
    mode http
    bind *:7000
    stats enable
    stats uri /

listen batman
    bind *:5000
    option httpchk
    http-check expect status 200
    default-server inter 3s fall 3 rise 2 on-marked-down shutdown-sessions
    server pg1 192.168.1.109:5432 maxconn 100 check port 8008
    server pg2 192.168.1.111:5432 maxconn 100 check port 8008
    server pg3 192.168.1.112:5432 maxconn 100 check port 8008

start haproxy

$ sudo systemctl restart haproxy

check if primary db is selected

pg1

$ psql -U postgres -h 192.168.1.109 -p 5000 -c "select inet_server_addr();"
 inet_server_addr 
------------------
 192.168.1.109

pg2

$ psql -U postgres -h 192.168.1.111 -p 5000 -c "select inet_server_addr();"
 inet_server_addr 
------------------
 192.168.1.109

pg3

$ psql -U postgres -h 192.168.1.112 -p 5000 -c "select inet_server_addr();"
 inet_server_addr 
------------------
 192.168.1.109

Multi DC (standby_cluster)

Reference

install packages

$ sudo yum group install "Development Tools"
$ sudo yum install wget tmux python3 python3-devel epel-release
$ sudo python3 -m pip install patroni[etcd] psycopg2

stack

two indipendent etcd cluster

dc1 cluster

  • pg0 (patroni + etcd + postgres)
  • pg1 (patroni + etcd + postgres)
  • pg2 (patroni + etcd + postgres)
$ patronictl -c /etc/patroni/robin.yaml list
+ Cluster: robin (6875669140607643544) ------+----+-----------+
| Member | Host          | Role    | State   | TL | Lag in MB |
+--------+---------------+---------+---------+----+-----------+
| pg0    | 192.168.3.200 | Leader  | running |  1 |           |
| pg1    | 192.168.3.201 | Replica | running |  1 |         0 |
| pg2    | 192.168.3.202 | Replica | running |  1 |         0 |
+--------+---------------+---------+---------+----+-----------+

dc2 standby cluster

  • pg3 (patroni + etcd + postgres)
  • pg4 (patroni + etcd + postgres)
  • pg5 (patroni + etcd + postgres)
$ patronictl -c /etc/patroni/robin.yaml list
+ Cluster: robin (6875669140607643544) ---+---------+----+-----------+
| Member | Host          | Role           | State   | TL | Lag in MB |
+--------+---------------+----------------+---------+----+-----------+
| pg3    | 192.168.3.203 | Standby Leader | running |  1 |           |
| pg4    | 192.168.3.204 | Replica        | running |  1 |         0 |
| pg5    | 192.168.3.205 | Replica        | running |  1 |         0 |
+--------+---------------+----------------+---------+----+-----------+

only one leader at a time in the cluster

$ curl  http://127.0.0.1:2379/v2/keys/service/robin/leader?prevExist=false
{"action":"get","node":{"key":"/service/robin/leader","value":"pg0","expiration":"2020-09-23T13:48:17.291844158Z","ttl":30,"modifiedIndex":993,"createdIndex":16}}

update value and will get "key already exists" error

$ curl -X PUT -d 'value="pg1"' http://127.0.0.1:2379/v2/keys/service/robin/leader?prevExist=false
{"errorCode":105,"message":"Key already exists","cause":"/service/robin/leader","index":988}

etcd dc1 cluster

pg0

$ cat /etc/etcd.conf
name: pg0
data-dir: /db/etcd
initial-advertise-peer-urls: http://192.168.3.200:2380
listen-peer-urls: http://0.0.0.0:2380
listen-client-urls: http://0.0.0.0:2379
advertise-client-urls: http://192.168.3.200:2379
initial-cluster: "pg0=http://192.168.3.200:2380,pg1=http://192.168.3.201:2380,pg2=http://192.168.3.202:2380"

patroni dc1 cluster

$ cat /etc/patroni/robin.yaml 
scope: robin
#namespace: /service/
name: pg0

restapi:
  listen: 192.168.3.200:8008
  connect_address: 192.168.3.200:8008

etcd:
  host: 127.0.0.1:2379

#consul:
#  host: 127.0.0.1:8500

bootstrap:
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576
    slots:
      standby_cluster:
        type: physical
    postgresql:
      use_pg_rewind: true
      use_slots: true
      parameters:
        wal_level: logical
        wal_log_hints: "on"

  initdb:  # Note: It needs to be a list (some options need values, others are switches)
  - encoding: UTF8
  - data-checksums

  pg_hba:  # Add following lines to pg_hba.conf after running 'initdb'
  - local all all md5
  - host all all 0.0.0.0/0 md5
  - host replication replicator 127.0.0.1/32 md5
  - host replication replicator 0.0.0.0/0 md5

  users:
    admin:
      password: admin
      options:
        - createrole
        - createdb

postgresql:
  listen: 192.168.3.200:5432
  connect_address: 192.168.3.200:5432
  data_dir: /db/pgdata_robin
  pgpass: /tmp/pgpass0
  authentication:
    replication:
      username: replicator
      password: rep-pass
    superuser:
      username: postgres
      password: zalando
    rewind:  # Has no effect on postgres 10 and lower
      username: rewind_user
      password: rewind_password
  parameters: 
    unix_socket_directories: '.'

etcd dc2 cluster

pg3

name: pg3
data-dir: /db/etcd
initial-advertise-peer-urls: http://192.168.3.203:2380
listen-peer-urls: http://0.0.0.0:2380
listen-client-urls: http://0.0.0.0:2379
advertise-client-urls: http://192.168.3.203:2379
initial-cluster: "pg3=http://192.168.3.203:2380,pg4=http://192.168.3.204:2380,pg5=http://192.168.3.205:2380"

patroni dc2 cluster

scope: robin
#namespace: /service/
name: pg3

restapi:
  listen: 192.168.3.203:8008
  connect_address: 192.168.3.203:8008

etcd:
  host: 127.0.0.1:2379

#consul:
#  host: 127.0.0.1:8500

bootstrap:
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576
    standby_cluster:
      host: 192.168.3.200
      port: 5432
      primary_slot_name: standby_cluster
      create_replica_methods:
        - basebackup
    postgresql:
      use_pg_rewind: true
      use_slots: true
      parameters:
        wal_level: logical
        wal_log_hints: "on"

  initdb:  # Note: It needs to be a list (some options need values, others are switches)
  - encoding: UTF8
  - data-checksums

  pg_hba:  # Add following lines to pg_hba.conf after running 'initdb'
  - local all all md5
  - host all all 0.0.0.0/0 md5
  - host replication replicator 127.0.0.1/32 md5
  - host replication replicator 0.0.0.0/0 md5

  users:
    admin:
      password: admin
      options:
        - createrole
        - createdb

postgresql:
  listen: 192.168.3.203:5432
  connect_address: 192.168.3.203:5432
  data_dir: /db/pgdata_robin
  pgpass: /tmp/pgpass0
  authentication:
    replication:
      username: replicator
      password: rep-pass
    superuser:
      username: postgres
      password: zalando
    rewind:  # Has no effect on postgres 10 and lower
      username: rewind_user
      password: rewind_password
  parameters: 
    unix_socket_directories: '.'

upgrade dc2 standby cluster to be main

$ patronictl --config-file /etc/patroni/robin.yaml edit-config
--- 
+++ 
@@ -7,10 +7,4 @@
   use_pg_rewind: true
   use_slots: true
 retry_timeout: 10
-standby_cluster:
-  create_replica_methods:
-  - basebackup
-  host: 192.168.3.200
-  port: 5432
-  primary_slot_name: standby_cluster
 ttl: 30

new dc2 cluster

$ patronictl --config-file /etc/patroni/robin.yaml list
+ Cluster: robin (6875669140607643544) ------+----+-----------+
| Member | Host          | Role    | State   | TL | Lag in MB |
+--------+---------------+---------+---------+----+-----------+
| pg3    | 192.168.3.203 | Leader  | running |  2 |           |
| pg4    | 192.168.3.204 | Replica | running |  2 |         0 |
| pg5    | 192.168.3.205 | Replica | running |  2 |         0 |
+--------+---------------+---------+---------+----+-----------+

set pg_hba to more secure after test

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment