In this project, our goal is to establish a robust and scalable infrastructure for a PostgreSQL database with high availability, seamless security, and integrated monitoring and alerting systems.
We'll leverage tools like Patroni, Consul, Vault, Prometheus, Grafana, and Cert-Manager to ensure a comprehensive, modern solution. Coolify will act as our orchestration platform, managing various services and simplifying deployments. We aim to not only build a highly available database cluster but also provide a learning experience for interns that demonstrates best practices in DevOps, security, and observability.
The backbone of our infrastructure is a distributed, high-availability PostgreSQL cluster. To ensure reliability, we'll introduce Patroni for automated failover, Consul for service coordination, and Vault for managing sensitive information. Monitoring will be handled by Prometheus and visualized using Grafana, while Cert-Manager will manage SSL certificates, reinforcing our focus on security. We also aim to introduce the concept of ephemeral design in this project, which emphasizes solutions that avoid persistent-state issues, highlighting a preventative approach to infrastructure management.
The technology stack we'll introduce in this project includes:
- Coolify v4.0.0-beta.355: Our management platform for deploying, monitoring, and maintaining all services and applications. Coolify streamlines orchestration by offering an easy-to-use UI while maintaining flexibility in deploying services. It's well suited for teaching new developers how to manage a full-stack application from database to front-end.
- PostgreSQL with Patroni: The database at the heart of our project, PostgreSQL is renowned for its reliability and performance. Patroni is a framework that allows us to manage a highly available PostgreSQL cluster, automatically handling leader elections and failover when necessary.
- pgBackRest: An advanced backup and restore tool for PostgreSQL. We'll use it to ensure our data is safely stored and easily retrievable in case of disaster. By creating automated backup schedules with daily incremental backups, we ensure efficient storage and performance while minimizing the impact on the cluster.
- Consul: A distributed service mesh that provides service discovery and coordination. In this project, Consul will be used to manage our PostgreSQL cluster, ensuring high availability by assisting with Patroni's leader election process and coordinating between nodes.
- Vault: HashiCorp Vault is a secret management solution that allows for secure storage and controlled access to secrets. Vault will handle all credentials and sensitive information related to our PostgreSQL cluster, as well as secure tokens for OAuth2.
- Cert-Manager: This tool automates the issuance and management of SSL certificates, allowing us to focus on keeping our infrastructure ephemeral and secure. SSL certificate management is a complex process, and Cert-Manager ensures that certificates are renewed automatically, reducing the risk of downtime due to expired certificates.
- Prometheus and Grafana: Prometheus will handle monitoring and metrics collection across all services, while Grafana will provide visualization of these metrics. By tracking PostgreSQL performance, disk usage, and system health, we ensure early detection of issues. Grafana Loki will assist with log aggregation, allowing us to monitor logs and troubleshoot efficiently.
- AlertManager: Part of the Prometheus ecosystem, AlertManager will notify us when certain thresholds are reached, enabling proactive infrastructure management.
- OAuth2 and Drizzle ORM: For future applications, we'll integrate OAuth2 for authentication and Drizzle ORM for database interaction. OAuth2 will manage authentication for applications connecting to the PostgreSQL cluster, while Drizzle ORM will serve as the interface for querying the database from TypeScript-based applications.
In any database-driven system, ensuring uptime and security are the top priorities. However, traditional database management is often susceptible to issues such as single points of failure, inefficient backup processes, and poor secrets management. This is where our approach with a distributed, highly available PostgreSQL cluster shines.
High Availability means ensuring that our system is resilient to hardware and software failures. In the case of PostgreSQL, this involves creating a cluster where multiple nodes (servers) work together to ensure that even if one node fails, others can continue to operate. Patroni and Consul work hand-in-hand to achieve this. Patroni acts as the orchestrator, monitoring the health of the database nodes and deciding when a failover (a change in the database leader) is necessary. Consul coordinates this by keeping track of the state of each node, ensuring consistent and accurate information on the cluster’s health.
By distributing the database across multiple nodes, we create an infrastructure that can handle node failures without downtime. This ensures ephemeral recovery—failures are quickly rectified through automated processes that prevent prolonged downtime, reducing the need for manual intervention and offering a "soft" security layer that emphasizes prevention of large-scale issues.
Consul is a service mesh that coordinates between services, providing a consistent state of the system across multiple nodes. In our case, Consul will track the state of each PostgreSQL instance and assist Patroni in leader election—ensuring that at any given time, one node is the primary writer while others are replicas, ready to take over if needed.
Patroni is the heart of our high availability strategy for PostgreSQL. It allows us to define and maintain a cluster of PostgreSQL nodes, ensuring they are all synchronized and can seamlessly take over the leader role if the current leader fails. It manages PostgreSQL’s built-in streaming replication mechanism and automates failover, making sure the database is always available.
Backups are crucial to any database system. We are using pgBackRest because it allows us to efficiently manage backups and restores, especially in large, distributed environments like ours. Our strategy involves daily incremental backups to ensure that data is always safe without overwhelming our storage system. The design is to perform daily backups and manage them through Coolify’s UI.
A core part of our strategy is secrets management. Storing sensitive information such as passwords, OAuth tokens, and API keys in plain text is a significant security risk. Vault mitigates this by storing secrets securely and only exposing them to services when absolutely necessary. For example, PostgreSQL credentials, OAuth tokens, and any other critical secrets will be dynamically managed through Vault, ensuring that no sensitive data is ever hard-coded or exposed.
Vault also provides an additional layer of ephemeral security. Instead of permanent credentials that are vulnerable if exposed, Vault generates dynamic, time-bound credentials that expire, minimizing the risk window. If a secret is compromised, it is rendered useless after its TTL expires, reinforcing the idea of prevention over remediation.
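As an illustration of how such dynamic credentials can be wired up (not a step we perform yet), Vault's database secrets engine can mint short-lived PostgreSQL users on demand. A minimal sketch, assuming a reachable PostgreSQL instance and an admin account Vault may use; the names dbExocracy, app-role, and vault_admin are placeholders:

```bash
# Enable the database secrets engine (illustrative values throughout).
vault secrets enable database

# Tell Vault how to reach PostgreSQL and which roles may use this connection.
vault write database/config/dbExocracy \
    plugin_name=postgresql-database-plugin \
    allowed_roles="app-role" \
    connection_url="postgresql://{{username}}:{{password}}@127.0.0.1:5432/dbExocracy" \
    username="vault_admin" \
    password="<admin-password>"

# Define a role whose generated credentials expire after one hour.
vault write database/roles/app-role \
    db_name=dbExocracy \
    creation_statements="CREATE ROLE \"{{name}}\" WITH LOGIN PASSWORD '{{password}}' VALID UNTIL '{{expiration}}';" \
    default_ttl="1h" \
    max_ttl="24h"

# Each read returns a fresh username/password pair that Vault revokes at expiry.
vault read database/creds/app-role
```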
While Vault handles secrets, we rely on Cert-Manager specifically for managing SSL certificates. SSL certificates ensure that our communication channels are encrypted, protecting data in transit. Cert-Manager simplifies this process by automating the creation, distribution, and renewal of SSL certificates. For example, Let’s Encrypt will be our certificate authority, and Cert-Manager will handle renewals automatically, reducing the need for manual intervention.
SSL certificate management is an ephemeral process, where certificates are automatically replaced before they expire. By integrating Cert-Manager, we introduce a "soft" layer of security that prevents issues like expired certificates from becoming a concern.
Monitoring is essential to ensuring the health of our system. Prometheus is a powerful metrics collection tool that will gather real-time data from all components, including PostgreSQL, Consul, and our nodes. This data will be visualized using Grafana, enabling us to create dashboards that display critical metrics like CPU usage, memory consumption, disk I/O, and query performance.
In addition to monitoring, Grafana Loki will be deployed to aggregate logs, allowing us to track and troubleshoot issues efficiently. AlertManager will notify us whenever certain thresholds are reached, ensuring we are always aware of any potential issues before they escalate.
This monitoring system is vital to the ephemeral nature of our infrastructure. By continuously collecting and analyzing data, we can proactively address issues before they cause downtime, avoiding unnecessary risks and maintaining uptime.
The Prerequisites section focuses solely on what's necessary to set up the proper environment for installing and configuring the PostgreSQL cluster and its supporting services:
Before proceeding to Step 1, ensure the following components are in place:
- Download the Vault Binary:
  - Use wget to download the latest version of Vault, replacing VERSION with the latest version number (check the latest version on the HashiCorp releases page).
  wget https://releases.hashicorp.com/vault/VERSION/vault_VERSION_linux_amd64.zip
- Unzip the Downloaded File:
  - Install the unzip package if it's not already installed.
  sudo apt update
  sudo apt install unzip
  - Unzip the downloaded file.
  unzip vault_VERSION_linux_amd64.zip
- Move the Binary to a Directory in Your PATH:
  sudo mv vault /usr/local/bin/
- Verify the Installation:
  - Check that Vault is installed correctly by running:
  vault -version
- Create a Configuration File:
  - Create a configuration file for Vault at /etc/vault.hcl.
  sudo nano /etc/vault.hcl
  - Add the following configuration:
  storage "file" {
    path = "/opt/vault/data"
  }
  listener "tcp" {
    address     = "0.0.0.0:8200"
    tls_disable = 1  # Set to 1 for development; use TLS in production
  }
  api_addr = "http://127.0.0.1:8200"
- Create the Data Directory:
  sudo mkdir -p /opt/vault/data
  sudo chown -R $(whoami):$(whoami) /opt/vault/data
- Start Vault Server:
  - Start Vault using the configuration file. Run it in the background (for example with a trailing &) or, better, under a process manager such as systemd.
  vault server -config=/etc/vault.hcl
- Set the Environment Variable for Vault:
  export VAULT_ADDR='http://127.0.0.1:8200'
- Initialize Vault:
  - Run the following command to initialize Vault. This command will output the unseal keys and the initial root token. Store these keys securely.
  vault operator init
- Unseal Vault:
  - Use the unseal keys from the initialization step to unseal Vault. You will need to enter at least three unseal keys (depending on the threshold set during initialization).
  vault operator unseal <unseal_key_1>
  vault operator unseal <unseal_key_2>
  vault operator unseal <unseal_key_3>
- Log in to Vault:
  - Log in using the root token generated during initialization. In HashiCorp Vault, while you can paste the root token directly into the vault login command, it's generally recommended to avoid hardcoding or directly pasting sensitive tokens into scripts or commands. Instead, best practice is to use environment variables or other secure methods to manage sensitive information. Here's how you can handle it:
  - Using an Environment Variable: Store the root token in an environment variable and then use it in the vault login command. For example:
  export VAULT_TOKEN=<root_token>
  vault login $VAULT_TOKEN
  - Using a File: Store the token in a secure file with restricted permissions and use it when needed:
  echo "<root_token>" > ~/vault_token.txt
  chmod 600 ~/vault_token.txt
  vault login $(cat ~/vault_token.txt)
  - Prompting for Token: You can also have the script prompt you for the token, which avoids showing it in command history:
  read -s -p "Enter Vault Token: " VAULT_TOKEN
  vault login $VAULT_TOKEN
AppRole is one of the authentication methods offered by HashiCorp Vault, designed for machines or applications to authenticate and obtain secrets from Vault securely. It is particularly useful in scenarios where human interaction is minimal or nonexistent (e.g., automation pipelines, services, or applications running in the background).
AppRole allows clients to authenticate by combining two key elements:
- Role ID: A static identifier (like a username) assigned to the application or service.
- Secret ID: A dynamic, time-bound identifier (like a password) that provides an extra layer of security. It can be generated for each authentication request and has time-to-live (TTL) and usage limits.
Together, these two elements allow an application or machine to securely authenticate with Vault, obtain a token, and then use that token to access secrets or perform other operations based on the associated policies.
AppRole is ideal for:
- Non-interactive authentication: For services, scripts, or automated processes needing access to Vault without human interaction.
- Secure secret distribution: Used when there is a need to distribute dynamic credentials for short-lived access.
- Granular access control: AppRole allows you to assign specific policies that limit what secrets the authenticated entity can access.
- Role ID: This is tied to a specific role in Vault and defines what policies or permissions an AppRole has. It’s static, meaning it doesn’t change and is known to the application that wants to authenticate.
- Secret ID: This is dynamic and can be generated on demand. It can be time-bound (TTL) and restricted by usage count, adding a layer of security to the authentication process.
- Policies: Vault policies define what permissions (read, write, delete, etc.) are granted once the AppRole successfully authenticates.
- Enable AppRole Authentication Method:
  vault auth enable approle
- Create an AppRole with Specific Policies: Define a role that ties an application to specific policies and operations:
  vault write auth/approle/role/my-role \
      token_policies="my-policy" \
      secret_id_ttl="24h" \
      token_ttl="1h" \
      token_max_ttl="4h"
- Retrieve the Role ID: The role ID will be needed by the application:
  vault read auth/approle/role/my-role/role-id
- Generate a Secret ID: The secret ID is generated dynamically and can have a TTL or usage limit:
  vault write -f auth/approle/role/my-role/secret-id
- Authenticate with Role ID and Secret ID: The application will use both the role ID and the secret ID to authenticate and get a Vault token:
  vault write auth/approle/login \
      role_id="<your_role_id>" \
      secret_id="<your_secret_id>"
- Vault Issues a Token: Upon successful authentication, Vault returns a token that the application can use to access secrets:
  Key                Value
  ---                -----
  token              s.ZqWvxGadf9lf....
  token_accessor     Bdswvhsl....
  token_duration     1h
  token_renewable    true
- Limit Secret ID TTL: Keep the secret ID short-lived to reduce the risk of exposure (a sketch combining these settings follows this list).
- Restrict Secret ID Use: Limit the number of uses for a secret ID (e.g., a single-use secret ID).
- Token Renewal: Ensure the token obtained by the AppRole is renewable so that long-running applications don’t need to re-authenticate as frequently.
- Monitoring and Auditing: Enable auditing to track when roles, secrets, and tokens are used, ensuring no unauthorized access.
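A minimal sketch combining these hardening options on the role created above; the parameter names are Vault's standard AppRole options and the values are illustrative:

```bash
# Short-lived, single-use secret IDs plus bounded, renewable tokens.
vault write auth/approle/role/my-role \
    token_policies="my-policy" \
    secret_id_ttl="15m" \
    secret_id_num_uses=1 \
    token_ttl="1h" \
    token_max_ttl="4h"

# A long-running application renews its token instead of re-authenticating.
vault token renew
```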
- Automated Systems: AppRole is perfect for non-human systems, such as microservices, scripts, or CI/CD pipelines, where there is no user to provide credentials interactively.
- Fine-Grained Control: It offers precise control over permissions and authentication.
- Security: By combining a static identifier (role ID) with a dynamic secret (secret ID), AppRole adds an extra layer of security, making it harder to misuse.
AppRole, in essence, makes Vault access secure and manageable for non-interactive systems, ensuring secrets are distributed safely and with fine-tuned control over what resources each authenticated entity can access.
At this point, we have:
- Installed Vault and configured it to run.
- Initialized Vault, received the unseal keys, and unsealed it for use.
- Noted that while pasting the root token directly is possible, it is not ideal due to security risks; using environment variables or prompting for input is a more secure practice.

Next, we need to:
- Configure AppRole for internal authentication.
- Set up the necessary environment variables in Coolify (a sketch of keeping these values in Vault's KV store follows this list):
  - POSTGRES_USERNAME: Username for PostgreSQL.
  - POSTGRES_PASSWORD: Password for PostgreSQL.
  - VAULT_TOKEN: Token for accessing Vault.
  - VAULT_ADDR: URL of the Vault server.
  - OAuth2_Client_ID: Client ID for OAuth2.
  - OAuth2_Client_Secret: Client Secret for OAuth2.
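Rather than typing these values into Coolify by hand, they can be kept in Vault's key/value store and pulled when the environment is configured. A minimal sketch, assuming a KV v2 mount at secret/ and a path of our choosing (secret/exocracy/app is illustrative):

```bash
# Enable a KV v2 store if one is not already mounted at secret/.
vault secrets enable -path=secret kv-v2

# Store the application secrets in one place.
vault kv put secret/exocracy/app \
    POSTGRES_USERNAME="saExoDBA" \
    POSTGRES_PASSWORD="<generated-password>" \
    OAuth2_Client_ID="<client-id>" \
    OAuth2_Client_Secret="<client-secret>"

# Read back a single field when populating Coolify's environment.
vault kv get -field=POSTGRES_PASSWORD secret/exocracy/app
```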
Patroni does not strictly depend on Consul. However, it does rely on a Distributed Configuration Store (DCS) to coordinate leader election and cluster state. Consul is just one of several supported options for the DCS in Patroni. Other options include etcd, ZooKeeper, and Kubernetes.
Consul, however, is the best option for us, considering the following factors in our environment:
Since we already have a preference for HashiCorp products, and we are using Vault for secrets management, Consul aligns naturally with this stack. Both Consul and Vault are designed to work well together, sharing similar security, scaling, and configuration mechanisms. HashiCorp's toolset is known for its powerful and reliable orchestration, which suits our needs.
Consul offers native service discovery capabilities, which can be useful not only for Patroni, but also for Traefik (our load balancer) and other services in the future. This means it centralizes service discovery, leader election, and cluster state management across our infrastructure, making it more efficient to maintain.
- Consul can easily scale with our system, which may become important as our PostgreSQL cluster grows or if we start managing additional services.
- It also offers flexibility, as Consul is capable of being a DNS provider, so it integrates well with dynamic environments, especially ones using containers (even though we're not using Kubernetes in this setup).
Since we're operating on a VPS with limited resources (8GB RAM), Consul is relatively lightweight compared to other options like Kubernetes, which would likely introduce unnecessary complexity and resource overhead.
With Coolify managing our infrastructure, Consul provides a well-integrated solution for managing high availability and leader election in Patroni, without needing a separate orchestration system like Kubernetes. This streamlines the setup while providing robust DCS functionality.
- etcd: While popular with Patroni and Kubernetes users, etcd is more commonly deployed in Kubernetes environments and doesn't provide the same level of integration with Vault or service discovery capabilities that Consul offers.
- ZooKeeper: This is an older option, but generally considered more complex to manage compared to Consul or etcd. It also doesn't provide the same service discovery capabilities.
- Kubernetes: While powerful, Kubernetes introduces significant complexity and overhead for a relatively small setup like ours. It’s best suited for large-scale containerized environments, which isn't necessary for our project.
Given our focus on using a cohesive, efficient, and scalable solution with minimal resource overhead, Consul is the best fit for managing the DCS in Patroni. It integrates well with the existing HashiCorp tools (Vault), provides essential service discovery features for Traefik, and fits our goal of maintaining a lightweight yet reliable infrastructure. Installing Consul is crucial, not only for PostgreSQL high availability but also because it tightly integrates with Vault, providing essential service discovery and coordination for secrets management.
We will configure Consul through Coolify's management interface to serve as our distributed coordination and service discovery tool. This will ensure that Consul can integrate with Vault and Patroni efficiently.
- Navigate to Coolify's dashboard.
- From the Services tab, select the Add a New Service option.
- Choose Consul from the available services.
- Configure the following options:
  - Name: vsExoConsul
  - Port: 8500 (the default port)
  - Nodes: set the number of nodes based on our cluster (at least 3 for production-grade reliability).
We need to configure Consul to act as the service discovery backend for both Patroni and Vault. Here is a sample Consul configuration:
{
"datacenter": "dc1",
"data_dir": "/var/lib/consul",
"log_level": "INFO",
"node_name": "consul-node-1",
"server": true,
"bootstrap_expect": 3,
"ui": true,
"client_addr": "0.0.0.0",
"bind_addr": "153.92.208.222",
"retry_join": ["153.92.208.222"],
"encrypt": "your_consul_encryption_key",
"performance": {
"raft_multiplier": 1
}
}
- The encrypt key is automatically managed by Coolify if we choose encrypted gossip.
- The retry_join IP is the address of our VPS.
- bootstrap_expect should match the number of Consul server nodes we actually run; on a single-VPS setup it should be 1, or the cluster will wait for peers that never arrive.
Once the configuration is completed, we start the Consul service:
- Go to the Services section in Coolify.
- Start the vsExoConsul service.
- Ensure that the service is accessible by opening the Consul Web UI (http://153.92.208.222:8500) to check the health and status of the Consul cluster, or verify from the command line as sketched below.
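If we also have shell access to the VPS, the cluster state can be confirmed from the Consul CLI; a quick check, assuming the consul binary is on the PATH:

```bash
# List cluster members and their status (our server node should show as "alive").
consul members

# Show the current Raft peers and which node is the leader.
consul operator raft list-peers
```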
Now that Consul is installed and running, we need to configure Vault to use Consul as its storage backend.
In the Vault configuration (which we installed earlier):
storage "consul" {
address = "http://153.92.208.222:8500"
path = "vault/"
}
listener "tcp" {
address = "0.0.0.0:8200"
tls_disable = 1
}
ui = true
- This tells Vault to store all of its data in Consul, ensuring redundancy and coordination across nodes.
Once the Consul service is running, restart Vault with the new configuration. Note that data from the old file backend is not migrated automatically; either re-initialize and unseal Vault on the Consul backend or migrate the existing data with vault operator migrate. Then:
- Verify Vault is correctly using Consul by running:
vault status
- Ensure that Consul shows Vault as a healthy node in the Consul Web UI.
Now that we have Consul and Vault set up, the next step is configuring pgBackRest for efficient backup and recovery of our PostgreSQL database. Since pgBackRest is a powerful, open-source backup and restore solution designed for PostgreSQL, it will be critical for our high-availability setup.
We will install pgBackRest on each of the PostgreSQL nodes to manage our backups.
On each node of the PostgreSQL cluster, we will install pgBackRest:
sudo apt-get install -y pgbackrest
On the main PostgreSQL node, we need to create a dedicated directory for storing backups. We'll use the recommended industry-standard path:
sudo mkdir -p /var/lib/pgbackrest
sudo chown postgres:postgres /var/lib/pgbackrest
sudo chmod 700 /var/lib/pgbackrest
This path is where pgBackRest will store all of our backup files.
We need to configure pgBackRest on each PostgreSQL node, specifying the backup path, repository settings, and connection details.
Here's a sample configuration file for pgBackRest (/etc/pgbackrest.conf):
[global]
repo1-path=/var/lib/pgbackrest
repo1-retention-full=2
[global:archive-push]
compress-level=3
[stanza]
pg1-path=/var/lib/postgresql/16/main
pg1-host=153.92.208.222
pg1-user=postgres
pg1-host-user=postgres
pg1-host-config=/etc/postgresql/16/main/postgresql.conf
- repo1-path: Specifies the path where backups will be stored.
- pg1-path: Path to the PostgreSQL data directory.
- pg1-host: IP address of our main node (153.92.208.222).
- compress-level=3 sets compression for efficient storage usage.
- The section name ([stanza]) is the stanza name and must match the --stanza option used in the commands below.
We can replicate this configuration to the other nodes, changing pg1-host to the respective IP address of each node in the cluster.
To ensure we have continuous backups of Write-Ahead Log (WAL) files, we need to configure PostgreSQL to archive WAL files using pgBackRest.
In /etc/postgresql/16/main/postgresql.conf, modify or add the following parameters:
archive_mode = on
archive_command = 'pgbackrest --stanza=stanza archive-push %p'
archive_timeout = 300
This configures PostgreSQL to use pgBackRest for WAL archiving.
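One easy-to-miss detail: archive_mode only takes effect after a restart. A quick way to apply and confirm the change, assuming a systemd-managed service named postgresql:

```bash
# archive_mode requires a full restart (a reload is not enough).
sudo systemctl restart postgresql

# Confirm the archiving settings are active.
sudo -u postgres psql -c "SHOW archive_mode;"
sudo -u postgres psql -c "SHOW archive_command;"
```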
A stanza in pgBackRest is a named configuration section for managing backups of a PostgreSQL database. We will create a new stanza to manage our backups.
sudo -u postgres pgbackrest --stanza=stanza --log-level-console=info stanza-create
After creating the stanza, verify that the configuration is correct by checking the status:
sudo -u postgres pgbackrest --stanza=stanza --log-level-console=info check
Ensure that the output shows a healthy configuration.
Now, we will perform our first full backup using pgBackRest.
sudo -u postgres pgbackrest --stanza=stanza --type=full --log-level-console=info backup
This command will initiate a full backup of our PostgreSQL database to the previously specified path (/var/lib/pgbackrest).
We will schedule regular backups using cron to automate the backup process.
- Open the postgres user's cron table (so the backup runs as the postgres user without needing sudo in the entry):
  sudo crontab -u postgres -e
- Add the following entry to schedule daily full backups at midnight:
  0 0 * * * pgbackrest --stanza=stanza --type=full --log-level-console=info backup
This ensures that our PostgreSQL database is backed up daily.
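Since our stated strategy is daily incremental backups with less frequent full backups, a schedule along those lines could look like the following sketch (times are arbitrary; entries go in the postgres user's crontab as above):

```bash
# Weekly full backup on Sunday, incremental backups the other days.
0 0 * * 0 pgbackrest --stanza=stanza --type=full --log-level-console=info backup
0 0 * * 1-6 pgbackrest --stanza=stanza --type=incr --log-level-console=info backup
```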
We can use pgBackRest's built-in tools to monitor our backups. To check the status of backups and validate that everything is running as expected, run:
sudo -u postgres pgbackrest info
This will provide details on the latest backups, their types (full, incremental), and the size of each. Now that pgBackRest is configured and running, we have a robust backup and restore solution in place, safeguarding our data.
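For completeness, restoring is the mirror operation. A minimal sketch of restoring the latest backup onto a stopped node, using the same stanza and a systemd-managed postgresql service:

```bash
# Stop PostgreSQL before restoring over its data directory.
sudo systemctl stop postgresql

# Restore the latest backup; --delta rewrites only files that differ, which is faster.
sudo -u postgres pgbackrest --stanza=stanza --delta --log-level-console=info restore

# Start PostgreSQL again; it replays archived WAL as needed.
sudo systemctl start postgresql
```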
Following all the installations and configurations, we'll configure the VPS firewall (UFW) to allow traffic on the necessary ports for the services (a sketch of the corresponding UFW commands follows this list):
- PostgreSQL: 5432
- Consul: 8500
- Vault: 8200
- Any additional ports required for Traefik (if used) or other services.
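A minimal UFW sketch for these ports; where possible, restrict the sources to trusted addresses rather than opening the ports to the world:

```bash
# Allow the core service ports through the firewall.
sudo ufw allow 5432/tcp   # PostgreSQL
sudo ufw allow 8500/tcp   # Consul HTTP API / UI
sudo ufw allow 8200/tcp   # Vault API
sudo ufw allow 80/tcp     # Traefik HTTP (if used)
sudo ufw allow 443/tcp    # Traefik HTTPS (if used)

# Enable UFW and review the resulting rules.
sudo ufw enable
sudo ufw status verbose
```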
By completing these prerequisites, we establish a functioning foundation for the implementation of the PostgreSQL cluster and its supporting services.
Now, let's proceed step by step to implement the plan introduced above.
In this step, we will deploy a high-availability PostgreSQL cluster using Patroni for automatic failover and cluster management. Consul will handle the Distributed Configuration Store (DCS) and leader election for the PostgreSQL nodes. Additionally, Vault will be used to manage secrets such as database credentials, and Cert-Manager will be responsible for handling SSL certificates to secure the communication within our services.
For any production-grade database, downtime can lead to data loss, broken user experience, and loss of revenue. This is why ensuring that our PostgreSQL database remains available and reliable even during failures is essential. Patroni achieves this by managing PostgreSQL clusters and orchestrating automatic failover in case of a node failure, minimizing downtime. Consul serves as the backend for the cluster, ensuring that one and only one node assumes the leader role for write operations. This setup also supports our ephemeral design, where recovery from failures is automated and downtime is minimized.
- Create the PostgreSQL Cluster:
  - Using Coolify v4.0.0-beta.355, navigate to the "Applications" section and deploy a new PostgreSQL service.
  - We will configure the PostgreSQL cluster with Patroni for high availability. Ensure that we allocate enough resources for the database cluster (based on the 8GB of RAM we have available).
- Configure Patroni with Consul:
  - Patroni will act as our PostgreSQL cluster manager. Coolify simplifies this process by allowing us to integrate Consul for cluster coordination.
  - Patroni needs to be set up to use Consul as its Distributed Configuration Store (DCS). This is crucial because Consul will handle leader election, ensuring that only one node handles write operations at a time.
In this configuration, we will define the roles and behavior for Patroni, Consul, and PostgreSQL. This will be applied to each node in our cluster. To ensure security and avoid hardcoding sensitive information like passwords in the configuration file, we should reference the environment variables directly in the YAML file. By doing this, we allow Coolify or any environment management system to inject those values securely at runtime.
scope: exo_pg_cluster
namespace: /patroni/
name: postgresql_node_1

restapi:
  listen: 0.0.0.0:8008
  connect_address: 153.92.208.222:8008  # Replace with actual IP of the node

consul:
  host: consul.service.consul:8500  # Consul coordinates the nodes
  scheme: http

bootstrap:
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576
    postgresql:
      use_pg_rewind: true
      parameters:
        wal_level: replica
        hot_standby: "on"
        max_wal_senders: 5
        max_replication_slots: 5
        wal_keep_size: 64MB

postgresql:
  listen: 0.0.0.0:5432
  connect_address: 153.92.208.222:5432  # Replace with actual IP of the node
  data_dir: /var/lib/postgresql/data  # PostgreSQL data directory
  authentication:
    replication:
      username: {{ POSTGRES_USERNAME }}  # Reference to environment variable
      password: {{ POSTGRES_PASSWORD }}  # Reference to environment variable
    superuser:
      username: {{ POSTGRES_USERNAME }}  # Reference to environment variable
      password: {{ POSTGRES_PASSWORD }}  # Reference to environment variable
  parameters:
    max_connections: 100
    shared_buffers: "512MB"
    wal_keep_size: 64MB
This Patroni configuration sets up the necessary parameters for PostgreSQL replication, including using Consul for cluster coordination, and defines how failovers are handled. With this approach, we maintain security and best practices in managing sensitive data, while ensuring that Coolify dynamically injects the required values into the runtime environment. In this YAML file, {{ POSTGRES_USERNAME }} and {{ POSTGRES_PASSWORD }} are placeholders for environment variables; Coolify (or whichever templating step renders the file) substitutes them at deploy time, since Patroni does not expand such placeholders itself. Alternatively, Patroni can read credentials directly from its own PATRONI_* environment variables, which override the corresponding YAML values. Either way, we no longer hardcode sensitive credentials, keeping the configuration secure and flexible.
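If we prefer the environment-variable route, a sketch of what Coolify would inject; the variable names are Patroni's standard environment overrides, and the values reference the Coolify variables defined in the prerequisites:

```bash
# Patroni reads these at startup; they override the corresponding YAML values.
export PATRONI_SUPERUSER_USERNAME="$POSTGRES_USERNAME"
export PATRONI_SUPERUSER_PASSWORD="$POSTGRES_PASSWORD"
export PATRONI_REPLICATION_USERNAME="$POSTGRES_USERNAME"
export PATRONI_REPLICATION_PASSWORD="$POSTGRES_PASSWORD"
```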
In Coolify's service environment settings:
- Define Consul as the service backend by entering its service address and port (default is 8500).
- Use Coolify's environment variable section to configure Patroni's access to Consul and PostgreSQL credentials:
  - PATRONI_SCOPE: exo_pg_cluster
  - PATRONI_NAME: postgresql_node_1
  - PATRONI_CONSUL_HOST: consul.service.consul:8500
  - POSTGRES_USERNAME: saExoDBA
  - POSTGRES_PASSWORD: securepassword123 (a placeholder; the real value should come from Vault)
By leveraging Coolify’s UI and environment variable management, the integration of Patroni, PostgreSQL, and Consul is simplified. We will use Consul to ensure that leader elections are consistently handled without manual intervention, contributing to the ephemeral nature of our infrastructure.
Given that our current system has 8GB of RAM, we will carefully configure PostgreSQL to balance between memory consumption and performance. We can start with 512MB of shared buffers (as shown in the Patroni configuration above) to limit the memory usage, and max_connections set to 100 to prevent overloading the system.
If needed, we can scale up the RAM based on observed performance metrics from Prometheus (introduced later in the monitoring steps) or add additional nodes to our cluster to balance the load and further improve availability.
We will now focus on setting up the monitoring stack using Prometheus, Grafana, and Node Exporter so that we can monitor our PostgreSQL cluster, the system metrics of the VPS, and other performance indicators. This will be integrated with Coolify's service management. Prometheus itself stores metrics in its local time-series database; to additionally persist metrics in our existing PostgreSQL cluster, as planned below, a remote-write adapter is required, which fits the ephemeral nature of our deployment plan.
- In Coolify’s UI, create a new Service and select Prometheus.
- Configure the PostgreSQL connection used to persist metrics (shown below as a conceptual datasource block; the exact keys depend on the remote-write adapter, as Prometheus has no native PostgreSQL storage backend):
datasource:
  postgresql:
    host: 153.92.208.222
    port: 5432
    user: saExoDBA
    password: <Vault-stored-password>
    database: dbExocracy
    sslmode: require
- Here, Prometheus will store metrics in our existing PostgreSQL cluster, which is managed by Patroni. The ephemeral nature of this setup ensures that, in case of a node failure, Prometheus can easily reconnect and store metrics in the failover node without interruptions.
- Modify the Prometheus configuration (via Coolify’s config management) to scrape PostgreSQL metrics. We will use pg_stat_statements and node_exporter to collect the relevant metrics:
scrape_configs:
  - job_name: 'postgresql'
    static_configs:
      - targets: ['153.92.208.222:9187']
    metrics_path: '/metrics'
    scheme: 'https'
    basic_auth:
      username: saExoDBA
      password: <Vault-stored-password>
- The pg_stat_statements extension provides detailed query statistics inside the database for query optimization and debugging; enable it as sketched below.
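pg_stat_statements has to be preloaded and then created in the database before its statistics are available. A minimal sketch using the same paths and database name as above (in a Patroni-managed cluster, shared_preload_libraries is better set through Patroni's configuration than by editing postgresql.conf directly):

```bash
# Preload the extension library (requires a PostgreSQL restart).
echo "shared_preload_libraries = 'pg_stat_statements'" | sudo tee -a /etc/postgresql/16/main/postgresql.conf
sudo systemctl restart postgresql

# Create the extension in the target database.
sudo -u postgres psql -d dbExocracy -c "CREATE EXTENSION IF NOT EXISTS pg_stat_statements;"
```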
- In Coolify, create another Service and select Grafana.
- Once Grafana is deployed, add Prometheus as a data source:
  - URL: http://prometheus:9090 (assuming Coolify routes between internal services correctly).
- Set up dashboards for PostgreSQL using prebuilt Grafana dashboards, such as the PostgreSQL dashboard, which includes detailed performance metrics.
- Configure Grafana Loki for logging and integrate it into Grafana's data sources to centralize logs for better visibility.
- In Coolify, create another Service and select Node Exporter.
- This will expose system-level metrics like CPU, memory, disk usage, and network performance from the VPS.
- Configure Prometheus to scrape Node Exporter data:
scrape_configs:
  - job_name: 'node_exporter'
    static_configs:
      - targets: ['153.92.208.222:9100']
- This allows us to monitor the health of the VPS itself, providing detailed system metrics, which is crucial for performance monitoring and resource allocation. We will be using this data to evaluate our current RAM usage and assess the need for future upgrades.
- Prometheus can potentially create a large number of database connections, especially when scraping frequently. We will mitigate this by using pgBouncer as a connection pooler.
- Modify the Prometheus configuration to use pgBouncer as the intermediary to the PostgreSQL cluster.
Example config to direct Prometheus queries through pgBouncer:
datasource:
  postgresql:
    host: 153.92.208.222
    port: 6432  # pgBouncer's port
    user: saExoDBA
    password: <Vault-stored-password>
    database: dbExocracy
    sslmode: require
- pgBouncer will efficiently manage and reduce the number of open connections, preventing database performance degradation caused by high connection counts. A minimal pgBouncer configuration sketch follows.
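A minimal pgBouncer configuration sketch, written as a heredoc for convenience; the pool sizes are assumptions to tune, and /etc/pgbouncer/userlist.txt must list the database credentials:

```bash
# Sketch of /etc/pgbouncer/pgbouncer.ini pointing at the cluster's primary.
sudo tee /etc/pgbouncer/pgbouncer.ini > /dev/null <<'EOF'
[databases]
dbExocracy = host=153.92.208.222 port=5432 dbname=dbExocracy

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction
max_client_conn = 200
default_pool_size = 20
EOF

# Restart pgBouncer so it picks up the new configuration.
sudo systemctl restart pgbouncer
```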
In this step, we deployed and configured the core monitoring tools:
- Prometheus collects PostgreSQL and system metrics and stores them in our existing PostgreSQL cluster, leveraging an ephemeral configuration for resilience.
- Grafana is used to visualize metrics from Prometheus and logs from Loki, ensuring that we have clear insights into the performance and health of our services.
- Node Exporter collects system metrics from the VPS to monitor its resource consumption, which will help guide us when planning for potential upgrades.
Next, we will proceed to configure OAuth2 authentication for secure access, followed by secrets management with Vault, and finally, complete the system with SSL certificates handled by Cert-Manager.
This step now ties together the Prometheus, Grafana, and Node Exporter deployments under Coolify while ensuring that the PostgreSQL cluster is efficiently monitored and Prometheus data is securely stored.
Vault Configuration with Consul Storage Backend

What Changes When You Install Consul

When Consul is installed and configured to work with Vault, the Vault configuration file we created earlier (/etc/vault.hcl) has to be modified: the file storage backend is replaced with the Consul storage backend, so the Vault server stores its data in Consul, and Vault communicates with Consul through its listener and storage settings. The updated configuration is the one shown in the Consul integration step above.

Key Changes

- Storage backend: switched from file (local disk) to consul. Vault now stores its data in Consul under the specified path (vault/ in this case).
- Consul integration: Vault relies on Consul as its storage backend, so it is essential that Consul is running before starting Vault to avoid any issues with storage.
- Listener configuration: in production, the listener should use HTTPS with certificates (tls.crt and tls.key) for secure communication instead of tls_disable = 1.

This setup enables Vault to use Consul for storage, providing high availability, replication, and a more resilient storage backend. A sketch of the fully updated file follows.