Nomad generally uses less CPU and RAM than K3s for the same basic cluster functionality. Nomad is widely regarded as one of the most efficient and lightweight orchestrators available.
For your 5-VPS setup, Nomad's single-binary, minimal approach will result in lower resource utilization and operational overhead compared to K3s, which, despite being "lightweight," still runs the full set of Kubernetes control plane components.
⚖️ Nomad vs. K3s Resource Footprint
The difference in resource consumption stems from the architectural complexity of the two systems.
| Resource | Nomad Agent (Server/Client) | K3s Server/Agent |
| --- | --- | --- |
| CPU | Lower baseline. Nomad's scheduler is fast and efficient, leading to minimal CPU usage during idle periods and faster scheduling under load. | Higher baseline. K3s runs the Kubernetes API server, controller manager, and scheduler, all of which require a higher CPU baseline, even in their lightweight K3s form. |
| RAM | Lower. Nomad's core agent footprint is smaller. Its Raft consensus is built into the agent and is generally less memory-hungry than a full-featured Kubernetes control plane. | Higher. K3s servers still require sufficient RAM (e.g., 2 GB minimum) to run the Kubernetes control plane and its datastore (defaulting to SQLite or embedded etcd). |
| Storage (data) | Low. Nomad stores its state in memory and uses disk primarily for Raft snapshots and logging. | Medium. K3s defaults to an embedded SQLite database, which is simple but still requires constant, fast disk I/O, making disk performance a key factor. |
| Bandwidth (gossip) | Lower and more efficient. Nomad uses the Serf gossip protocol for cluster membership, which is known for being extremely bandwidth-efficient. | Higher. K3s uses the standard Kubernetes API model for internal communication, which can generate more network traffic, especially for large clusters. |
Why Nomad is Leaner:
- Single Binary: Nomad is a single binary that handles everything: scheduling, client execution, and Raft consensus. This eliminates the inter-process communication overhead that Kubernetes components inherently require.
- Focus on Scheduling: Nomad's core purpose is placing jobs efficiently. Its core binary doesn't include Ingress controllers, CNI, or CSI drivers (those are handled by plugins), keeping the agent very minimal.
- Bin Packing: Nomad's scheduling algorithm is optimized for bin packing, placing as many jobs as possible onto each node, to maximize infrastructure utilization rather than merely minimize its own overhead.
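Bin packing is also a tunable, cluster-wide setting. As a sketch, recent Nomad versions let you inspect or change the scheduler algorithm from the CLI (binpack is the default; spread trades density for even distribution):

```bash
# Show the current scheduler configuration (binpack is the default)
nomad operator scheduler get-config

# Optionally trade density for even distribution across nodes
nomad operator scheduler set-config -scheduler-algorithm=spread
```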
For your small cluster of 5 VPS, the resource difference will be noticeable: Nomad needs fewer resources in total to keep the orchestration layer running, leaving more for your Rust containers.
Estimating daily usage for Nomad on a 5-VPS cluster requires breaking down the resource consumption of the orchestration layer itself (Nomad agents) and the application workloads.
Here's an estimate of the resources consumed by the Nomad orchestration layer running on your five VPS:
📊 Estimated Resource Usage for Nomad Agents (Orchestration Layer Only)
Nomad is highly efficient. The vast majority of resource usage will come from your applications (your Rust binary), not the orchestrator.
1. CPU Usage (Per Agent)
- Idle Servers (3 VPS): 50 MHz to 200 MHz (0.05 to 0.2 CPU cores). The CPU is mainly consumed by Raft consensus and the scheduler loop, and scales slightly with the number of jobs being scheduled.
- Idle Clients (2 VPS): <50 MHz (less than 0.05 CPU cores). The client is mostly waiting for instructions and periodically sending heartbeats.
2. RAM Usage (Per Agent)
- Servers (3 VPS): 256 MB to 512 MB. RAM is the most critical resource for servers, as they hold the entire cluster state (Raft log, job specifications, allocation data) in memory. For a small 5-node cluster, 256 MB is usually sufficient.
- Clients (2 VPS): 64 MB to 128 MB. Very lightweight, primarily used for tracking local allocations and communicating with Docker.
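If you want to guarantee this headroom for the agent and the OS, Nomad's client stanza supports a reserved block. A minimal sketch using round numbers in the ranges above (the values are illustrative, not recommendations):

```hcl
client {
  enabled = true

  # Resources the scheduler must never hand to workloads,
  # kept free for the OS and the Nomad agent itself
  reserved {
    cpu    = 200 # MHz
    memory = 256 # MB
  }
}
```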
3. Network Bandwidth (Total Daily Traffic)
The bandwidth is primarily event-driven, not constant.
- Baseline (Idle Traffic): <10 GB per day in total across all 5 VPS. This baseline is generated by the Serf gossip protocol (membership and health checks) and periodic client heartbeats, and is extremely minimal.
- Event-Driven (Bursty Traffic): Highly variable.
  - Job Deployment: Submitting a job (e.g., the HCL file for your Rust binary) and receiving the allocation status involves minimal data.
  - Container Pulls: The single largest bandwidth event is pulling the container image (my-rust-app:latest), which happens whenever you deploy a new version or add a node. Usage depends entirely on the size of your binary's Docker image.
💡 Estimating Total VPS Usage
To estimate the actual resource usage for your VPS, you need to add your application requirements.
Example Estimation for Your Rust Binary
Assume you deploy 3 replicas of your Rust service across the 5 VPS:
| Resource | Nomad Orchestration Layer | Application (3 Replicas) | Total Cluster Usage (Estimate) |
| --- | --- | --- | --- |
| CPU (Peak MHz) | ≈ 600 MHz (all 5 agents) | 3 × 200 MHz (example) | ≈ 1200 MHz (1.2 cores) |
| RAM (Peak MB) | ≈ 1000 MB (all 5 agents) | 3 × 512 MB (example) | ≈ 2536 MB (≈ 2.5 GB) |
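In a job file, the per-replica example figures translate directly into a resources stanza, which is what the scheduler actually uses for placement. A sketch (the task name is hypothetical; the image name is the example used earlier):

```hcl
task "rust-api" {
  driver = "docker"

  config {
    image = "my-rust-app:latest"
  }

  # Per-replica reservation matching the example column above
  resources {
    cpu    = 200 # MHz
    memory = 512 # MB
  }
}
```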
Conclusion
Nomad's low overhead means you can confidently provision VPS sized for your application's resource needs, knowing that the orchestrator itself will consume less than one full CPU core and about 1 GB of RAM across the cluster.
This is the final, consolidated guide for setting up a highly available (HA) Nomad cluster on your 5 VPS, using IPs 10.0.0.1 through 10.0.0.5, with essential security enabled.
The cluster will be composed of 3 Nomad Servers (10.0.0.1, 10.0.0.2, 10.0.0.3) and 2 Nomad Clients (10.0.0.4, 10.0.0.5).
🛠️ Step 1: Preparation (All 5 VPS)
Perform these actions on all five servers:
Install Nomad: Download the single binary and make it executable.
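A minimal sketch for a Linux x86_64 VPS; the version number is illustrative, so check https://releases.hashicorp.com/nomad/ for the current release. The last command generates the gossip encryption key referenced as <GOSSIP_KEY> in the configs below:

```bash
# Download and install the Nomad binary (version is an example)
NOMAD_VERSION=1.7.5
curl -fsSL -o nomad.zip \
  "https://releases.hashicorp.com/nomad/${NOMAD_VERSION}/nomad_${NOMAD_VERSION}_linux_amd64.zip"
unzip nomad.zip
sudo install -m 0755 nomad /usr/local/bin/nomad
nomad version  # verify the install

# Generate the gossip encryption key used as <GOSSIP_KEY> below.
# Run this once on one machine and reuse the same key on all five VPS.
nomad operator keygen
```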
⚙️ Step 3: Configure and Start Server Agents (10.0.0.1 - 10.0.0.3)
Create /etc/nomad.d/config.hcl on the three Server VPS (10.0.0.1, 10.0.0.2, 10.0.0.3). Replace <GOSSIP_KEY> with the key from Step 1.
/etc/nomad.d/config.hcl (Server Agents)
datacenter="dc1"data_dir="/opt/nomad/data"bind_addr="10.0.0.1"# IMPORTANT: Change this to the specific IP of the VPSserver {
enabled=truebootstrap_expect=3
}
client {
enabled=true# All servers should also be clients to run jobsservers=["10.0.0.1:4647", "10.0.0.2:4647", "10.0.0.3:4647"]
}
# --- Security Configuration ---encrypt_key="<GOSSIP_KEY>"tls {
rpc_upgrade_mode=trueca_file="/etc/nomad.d/certs/ca.pem"cert_file="/etc/nomad.d/certs/cli.pem"key_file="/etc/nomad.d/certs/cli-key.pem"verify_server_hostname=trueverify_https=true
}
Create /etc/nomad.d/config.hcl on the two Client VPS (10.0.0.4, 10.0.0.5).
/etc/nomad.d/config.hcl (Client Agents)
datacenter="dc1"data_dir="/opt/nomad/data"bind_addr="10.0.0.4"# IMPORTANT: Change this to the specific IP of the VPSclient {
enabled=trueservers=["10.0.0.1:4647", "10.0.0.2:4647", "10.0.0.3:4647"]
# Enable the Docker driveroptions={
"docker.privileged.enabled"="true"# Required for some container setups
}
}
server {
enabled=false
}
# --- Security Configuration (Same as Servers) ---encrypt_key="<GOSSIP_KEY>"tls {
rpc_upgrade_mode=trueca_file="/etc/nomad.d/certs/ca.pem"cert_file="/etc/nomad.d/certs/cli.pem"key_file="/etc/nomad.d/certs/cli-key.pem"verify_server_hostname=trueverify_https=true
}
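The step that starts the agents and verifies cluster membership is not shown in the source (see the systemd setup below for starting Nomad). Assuming the agents are running and your CLI environment is configured as in Step 5, a typical verification looks like this:

```bash
# The three servers should be listed as "alive", with one elected leader
nomad server members

# Since every agent also runs in client mode, all five nodes should appear
nomad node status
```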
You should see all 5 IPs listed, with a status of ready.
🎯 Step 5: Run Your First Job
You can now submit a workload to your secure cluster from any machine that has the Nomad CLI and access to the cluster network.
Set the environment variable to target one of the server IPs (e.g., 10.0.0.1); requests are forwarded to the leader automatically.
```bash
export NOMAD_ADDR=https://10.0.0.1:4646

# You may also need to set the TLS CA path if running remotely
export NOMAD_CACERT=/etc/nomad.d/certs/ca.pem
```
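Because the TLS stanza above sets verify_https_client = true, the CLI must also present a client certificate. A sketch reusing the certificate paths from this guide:

```bash
export NOMAD_CLIENT_CERT=/etc/nomad.d/certs/cli.pem
export NOMAD_CLIENT_KEY=/etc/nomad.d/certs/cli-key.pem
```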
Run the Job (using the HCL file you created previously):
```bash
nomad job run my-web-app.hcl
```
You now have a secure, highly available container orchestrator that is much simpler to manage than Kubernetes!
Configure System Service
To make Nomad run reliably on your VPS and automatically restart after reboots or crashes, you need a systemd unit file.
You will create one single file on all 5 VPS that tells the operating system how to manage the Nomad agent. The configuration file (/etc/nomad.d/config.hcl) you created previously determines if it runs as a server, client, or both.
📄 Essential File: nomad.service
Create this file on all 5 VPS at the location: /etc/systemd/system/nomad.service.
```ini
[Unit]
Description=Nomad
Documentation=https://www.nomadproject.io/docs/
# Waits for the network to be fully configured before starting Nomad
Wants=network-online.target
After=network-online.target
# If you decide to integrate Consul later, uncomment these lines:
# Wants=consul.service
# After=consul.service

[Service]
# Execute Nomad as the 'nomad' user (recommended for security)
# Note: You'll need to create this user first and adjust ownership of /opt/nomad/data
# User=nomad
# Group=nomad

# Set the binary path and tell it to load ALL configuration files
# from the /etc/nomad.d directory.
ExecStart=/usr/local/bin/nomad agent -config /etc/nomad.d

# Reload the configuration without stopping the running agent
ExecReload=/bin/kill -HUP $MAINPID

# Restart the agent automatically if it exits with an error
Restart=on-failure
RestartSec=10s

# Kill only the main process (Nomad handles its child processes)
KillMode=process
KillSignal=SIGINT

# Increase limits for file descriptors and processes (essential for high concurrency)
LimitNOFILE=65536
LimitNPROC=infinity

[Install]
# This line ensures Nomad starts automatically when the system boots up
WantedBy=multi-user.target
```
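If you opt into the commented-out User=/Group= lines above, the user and data-directory ownership must exist first. A sketch, assuming the data_dir used throughout this guide:

```bash
# Create a dedicated, non-login system user for the Nomad agent
sudo useradd --system --home /etc/nomad.d --shell /bin/false nomad

# Give it ownership of the data directory used by the configs above
sudo mkdir -p /opt/nomad/data
sudo chown -R nomad:nomad /opt/nomad/data
```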
Post-Creation Steps (All 5 VPS)
After creating or updating the /etc/systemd/system/nomad.service file:
Reload systemd: You must tell systemd to load the new service definition.

```bash
sudo systemctl daemon-reload
```

Enable the service: This creates the necessary symlink to start Nomad at boot.

```bash
sudo systemctl enable nomad
```

Start the service (if not already running, or to restart with the new config):

```bash
sudo systemctl start nomad
```
That's a great approach to understanding Nomad. The three common job types are differentiated by the value of the type stanza, which defines the scheduler's behavior regarding failure, scaling, and placement.
Here are the most straightforward demos and first principles for the three common job types:
🥇 1. Service Jobs (Long-Lived Applications)
First Principles
The primary goal is High Availability (HA). The service must run indefinitely, and if it fails, it must be restarted immediately and automatically. Scaling is manual (or driven by an autoscaler such as the Nomad Autoscaler).
The Demo: A Highly Available Web Server (Your Rust API)
| Job Feature | Explanation |
| --- | --- |
| `type = "service"` | Tells Nomad that this is a long-running application. |
| `count = 3` | Specifies the desired state: always keep three instances running across the cluster. |
| `restart` stanza | Defines the failure tolerance (e.g., attempt 3 restarts within 30 minutes). |
HCL Snippet
This job ensures three replicas of an HTTP echo server are constantly available, automatically restarting or relocating them upon failure.
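The snippet itself is missing from the source; below is a minimal sketch matching the description. The job name, the http-echo image, and the port are illustrative, and the restart stanza mirrors the "3 restarts within 30 minutes" example above:

```hcl
job "web-echo" {
  type        = "service"
  datacenters = ["dc1"]

  group "echo" {
    count = 3 # Desired state: always keep three instances running

    # Failure tolerance: up to 3 restarts within a 30-minute window
    restart {
      attempts = 3
      interval = "30m"
      delay    = "15s"
      mode     = "fail"
    }

    network {
      port "http" {
        to = 5678
      }
    }

    task "server" {
      driver = "docker"

      config {
        image = "hashicorp/http-echo:latest"
        args  = ["-listen", ":5678", "-text", "hello from Nomad"]
        ports = ["http"]
      }
    }
  }
}
```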
🥈 2. Batch Jobs (Run-to-Completion Tasks)
First Principles
The primary goal is Execution and Completion. The task should run, finish its work, and then be removed. Restarting is usually undesirable unless the task is idempotent.
The Demo: A Database Migration Script
| Job Feature | Explanation |
| --- | --- |
| `type = "batch"` | Tells Nomad to run the job until it reaches a terminal state (success or failure). |
| `count = 1` | Typically runs only once. |
| `restart` stanza | Often set to `mode = "fail"` or omitted, since a failed batch job usually requires human review. |
HCL Snippet
This job runs a script once to simulate a database migration.
job"db-migration" {
type="batch"datacenters=["dc1"]
priority=100# Batch jobs often have high priority for immediate executiongroup"migration" {
count=1# Do NOT automatically restart if it fails, as a migration should be reviewedrestart { mode="fail" }
task"migrate" {
driver="docker"config {
image="my-repo/migration-tool:latest"# Command to run the migration and then exitcommand="/usr/local/bin/migrate"
}
}
}
}
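Submitting it and watching it run to completion uses the same CLI as any other job; with attempts = 0 and mode = "fail", a failed allocation simply stays failed for review:

```bash
nomad job run db-migration.hcl
nomad job status db-migration   # the allocation should end as "complete"
```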
🥉 3. System Jobs (Host-Specific Agents)
First Principles
The primary goal is Guaranteed Placement. The task must run exactly once on every eligible Nomad client, regardless of resource constraints (within reason).
The Demo: A Logging Agent (e.g., Fluent Bit)
| Job Feature | Explanation |
| --- | --- |
| `type = "system"` | Tells Nomad to run the job on every client and to schedule it automatically when new clients join the cluster. |
| `count` stanza | Ignored; the count is determined by the number of eligible clients. |
| `constraint` stanza | Often used to limit placement (e.g., only on nodes tagged as "Linux" or "Storage"). |
HCL Snippet
This job ensures a logging agent runs on every machine to collect host logs.
job"node-logger" {
type="system"# <--- IMPORTANT: This forces placement on all clientsdatacenters=["dc1"]
group"agent" {
# No 'count' stanza is neededtask"fluent-bit" {
driver="docker"config {
image="fluent/fluent-bit:latest"# Requires mounting the host log directoryvolumes=["/var/log:/var/log"]
}
# Use a constraint to target specific OS types if neededconstraint {
attribute="${attr.kernel.name}"value="linux"
}
}
}
}
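After submitting, the job should produce exactly one allocation per eligible client. On this 5-VPS cluster, where every agent also runs in client mode, that means five allocations:

```bash
nomad job run node-logger.hcl
nomad job status node-logger   # expect one running allocation per client node
```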
You can find more detailed examples and demonstrations of how Nomad handles different workloads, including these three job types, in the video From Zero to WOW! with Nomad.