MetalLB is a load-balancer implementation for bare metal Kubernetes clusters. This document compares its two main operational modes: Layer 2 and BGP, helping you choose the right one for your environment.
📦 Cluster Setup
Node Name | Role | IP |
---|---|---|
node1 (VM1) | worker | 192.168.1.101 |
node2 (VM2) | worker | 192.168.1.102 |
node3 (VM3) | worker | 192.168.1.103 |
You have a MetalLB `IPAddressPool` that assigns `192.168.1.250`.
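As a rough sketch, the Layer 2 setup above could be declared like this. The resource names `l2-pool` and `l2-adv` are made up for illustration, and the example assumes a MetalLB release that uses CRD-based configuration (the `metallb.io/v1beta1` API). In those releases a pool is only announced once an advertisement resource is associated with it, which is why the `L2Advertisement` is included.

```yaml
# Sketch only: resource names are hypothetical.
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: l2-pool              # hypothetical name
  namespace: metallb-system
spec:
  addresses:
    - 192.168.1.250/32       # the single address used in this walkthrough
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: l2-adv               # hypothetical name
  namespace: metallb-system
spec:
  ipAddressPools:
    - l2-pool                # announce only this pool in Layer 2 mode
```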
Now suppose you expose a Service via MetalLB. Here's what happens in Layer 2 mode:
- MetalLB elects (e.g.) `node2` to "own" `192.168.1.250`.
- Only `node2` replies to ARP requests for `192.168.1.250` on the network.
- So traffic for that IP will be sent to `node2`, and then forwarded internally to the correct pod (even if it's on `node1` or `node3`).
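For reference, a Service that triggers the behavior above might look like the following. This is a minimal sketch: the name `demo-app`, its label selector, and the ports are all assumptions, and it relies on the pool being available for automatic assignment.

```yaml
# Sketch only: the app name, selector, and ports are hypothetical.
apiVersion: v1
kind: Service
metadata:
  name: demo-app
spec:
  type: LoadBalancer         # asks the cluster for an external IP; MetalLB supplies it
  selector:
    app: demo-app            # pods backing the Service, on any node
  ports:
    - protocol: TCP
      port: 80               # port exposed on the external IP
      targetPort: 8080       # port the pods listen on
```

Once MetalLB assigns the address, it should appear in the `EXTERNAL-IP` column of `kubectl get service demo-app`.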
If `node2` goes down, MetalLB will:
- Elect a new node (say `node1`),
- Send gratuitous ARP to update switches/clients,
- And continue serving traffic.
Key characteristics of Layer 2 mode:
- Single Node Responsibility: Only one node "owns" the IP at any time
- Traffic Flow: All traffic goes through the elected node, which may forward it internally
- Failover: A new node is elected and announces itself via gratuitous ARP (takes a few seconds)
- Best For: Simple setups, small clusters, or when BGP isn't an option
Client → Switch → Node (elected) → Pod (on any node)
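Limitations:
- No true load balancing (a single node handles all traffic)
- Failover time depends on how quickly clients update their ARP caches; clients that ignore gratuitous ARP wait for the cached entry to expire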
BGP (Border Gateway Protocol) mode allows all nodes to advertise the same service IPs, enabling true load balancing and faster failover.
📦 Cluster Setup
Node Name | Role | IP |
---|---|---|
node1 (VM1) | worker | 192.168.1.101 |
node2 (VM2) | worker | 192.168.1.102 |
node3 (VM3) | worker | 192.168.1.103 |
And you have a router in your network:
Device | Role | IP |
---|---|---|
router1 | BGP speaker (e.g., MikroTik, FRR, etc.) | 192.168.1.1 |
MetalLB has a CR like this:
```yaml
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: bgp-pool
  namespace: metallb-system
spec:
  addresses:
    - 192.168.1.250/32
```
You expose a LoadBalancer service using this pool, and MetalLB picks `192.168.1.250`.
- Every node (node1, node2, node3) runs a MetalLB `speaker` pod.
- Each `speaker` establishes a BGP session with the router (`192.168.1.1`).
- All nodes announce that they can handle traffic for `192.168.1.250`.
- The router learns multiple paths to reach `192.168.1.250`, one through each node.
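The peering and announcement steps above could be configured roughly as follows. Treat this as a sketch under stated assumptions: the AS numbers `64512` and `64513` are placeholders from the private ASN range, and the resource names `router1` and `bgp-adv` are hypothetical. Only the router address `192.168.1.1` and the pool name `bgp-pool` come from this walkthrough.

```yaml
# Sketch only: AS numbers and resource names are assumptions.
apiVersion: metallb.io/v1beta2
kind: BGPPeer
metadata:
  name: router1              # hypothetical name
  namespace: metallb-system
spec:
  myASN: 64512               # assumed private ASN for the cluster
  peerASN: 64513             # assumed private ASN for the router
  peerAddress: 192.168.1.1   # router1 from the table above
---
apiVersion: metallb.io/v1beta1
kind: BGPAdvertisement
metadata:
  name: bgp-adv              # hypothetical name
  namespace: metallb-system
spec:
  ipAddressPools:
    - bgp-pool               # announce addresses from the pool defined above
```

The router needs a matching peer entry for each node, and it must have ECMP (multipath) enabled to actually spread traffic across the paths it learns.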
Key characteristics of BGP mode:
- Distributed Responsibility: All nodes advertise the service IPs
- Traffic Flow: Incoming traffic is load balanced across all nodes by the router (ECMP)
- Failover: Near-instantaneous (sub-second) once a route is withdrawn; detecting a crashed node depends on the BGP hold timer, or on BFD if configured
- Best For: Production environments, high-availability requirements
Client → Router (ECMP) → Any Node → Pod (on any node)
BGP mode requires:
- BGP-capable router (or software router like FRR/Bird)
- Network infrastructure supporting BGP
- Proper BGP peering configuration

Choose Layer 2 mode when:
- You need a simple setup
- You don't have BGP-capable network equipment
- Your traffic volume is low to medium
- You can tolerate brief failover times (5-10 seconds)

Choose BGP mode when:
- You need high availability and true load balancing
- Your network supports BGP
- You have high traffic volumes
- You need sub-second failover
Aspect | Layer 2 | BGP Mode |
---|---|---|
Node Responsibility | Single node owns the IP | All nodes can own the IP |
Traffic Distribution | Through one node (bottleneck) | Distributed across all nodes (ECMP) |
Failover Time | 5-10 seconds (ARP) | < 1 second on BGP withdrawal (crash detection depends on hold timer or BFD) |
Load Balancing | None (active-passive) | Yes (active-active) |
Network Requirements | Basic L2 switch | BGP-capable router |
Complexity | Simple to set up | Requires BGP knowledge |
Use Case | Development/Small clusters | Production/High-availability |
Performance | Limited by single node | Scales with cluster size |
Implementation | Works out-of-the-box | Requires BGP configuration |
Best For | Simple deployments | Enterprise/production environments |