MetalLB: Layer 2 vs BGP Mode Comparison

🎯 Introduction

MetalLB is a load-balancer implementation for bare-metal Kubernetes clusters. This document compares its two operating modes, Layer 2 and BGP, to help you choose the right one for your environment.

🔁 Layer 2 Mode

🏗️ Example Cluster Setup

Node Name	Role	IP
`node1` (VM1)	worker	`192.168.1.101`
`node2` (VM2)	worker	`192.168.1.102`
`node3` (VM3)	worker	`192.168.1.103`

You have a MetalLB IPAddressPool that contains a single address:

192.168.1.250
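As a rough sketch, the pool above could be declared as follows, assuming a MetalLB release that is configured through custom resources. The names `l2-pool` and `l2-adv` are hypothetical, and the `metallb-system` namespace is assumed to match your installation. The L2Advertisement is what tells MetalLB to announce the pool in Layer 2 mode.

```yaml
# Pool holding the single address from the example above.
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: l2-pool              # hypothetical name
  namespace: metallb-system  # assumed install namespace
spec:
  addresses:
    - 192.168.1.250/32
---
# Announces addresses from that pool using Layer 2 (ARP/NDP).
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: l2-adv               # hypothetical name
  namespace: metallb-system
spec:
  ipAddressPools:
    - l2-pool
```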

Now suppose you expose a Service via MetalLB. Here's what happens in Layer 2 mode:

MetalLB elects one node, say node2, to "own" 192.168.1.250.
Only node2 replies to ARP requests for 192.168.1.250 on the network.
Traffic for that IP is therefore sent to node2, where the cluster's service proxy (kube-proxy by default) forwards it to a backing pod, even if that pod runs on node1 or node3.
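For illustration, the Service exposed in this walkthrough might look like the manifest below. The name `web`, the `app: web` label, and the port numbers are all hypothetical; the only part MetalLB cares about is `type: LoadBalancer`.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: web            # hypothetical Service name
spec:
  type: LoadBalancer   # MetalLB assigns the external IP for this type
  selector:
    app: web           # hypothetical pod label
  ports:
    - port: 80         # port exposed on 192.168.1.250
      targetPort: 8080 # assumed container port
```

Once the Service is created, its external IP should show up in the `EXTERNAL-IP` column of `kubectl get svc`.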

If node2 goes down, MetalLB will:

Elect a new node (say node1).
Send gratuitous ARP messages so that switches and clients update their caches.
Resume serving traffic through the new node.

📊 Layer 2 Mode: Key Characteristics

Single Node Responsibility: Only one node "owns" the IP at any time
Traffic Flow: All traffic goes through the elected node, which may forward it internally
Failover: A new owner is elected and announces the IP with gratuitous ARP (typically takes a few seconds)
Best For: Simple setups, small clusters, or when BGP isn't an option

🔄 Layer 2 Traffic Flow

Client → Switch → Node (elected) → Pod (on any node)

⚠️ Limitations

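No true load balancing across nodes (a single node receives all inbound traffic for an IP, although it can still be spread across pods from there)
Failover time depends on clients honoring gratuitous ARP; clients that ignore it keep sending to the old node until their ARP cache entry expires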

🌐 BGP Mode

🏗️ How It Works

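BGP (Border Gateway Protocol) mode lets every node advertise the same service IPs to a BGP router. When the router supports ECMP (equal-cost multipath), this gives true load balancing across nodes, and failover is handled by the router removing the failed node's route.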

📦 Cluster Setup

Node Name	Role	IP
`node1` (VM1)	worker	`192.168.1.101`
`node2` (VM2)	worker	`192.168.1.102`
`node3` (VM3)	worker	`192.168.1.103`

And you have a router in your network:

Device	Role	IP
`router1`	BGP router (e.g., MikroTik or FRR)	`192.168.1.1`

🧠 IP Pool

MetalLB is configured with a custom resource (CR) like this:

apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: bgp-pool
  namespace: metallb-system
spec:
  addresses:
    - 192.168.1.250/32

You expose a LoadBalancer Service using this pool, and MetalLB assigns it 192.168.1.250.
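If more than one pool exists, a Service can ask for an address from `bgp-pool` explicitly. The sketch below uses the `metallb.universe.tf/address-pool` annotation; newer MetalLB releases document a `metallb.io/` prefix instead, so check which one your version expects. The Service name, label, and ports are hypothetical.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: api                                      # hypothetical Service name
  annotations:
    # Request an address from the pool defined above.
    # Annotation prefix may differ by MetalLB version (assumption).
    metallb.universe.tf/address-pool: bgp-pool
spec:
  type: LoadBalancer
  selector:
    app: api                                     # hypothetical pod label
  ports:
    - port: 443
      targetPort: 8443                           # assumed container port
```

With only a single pool configured, the annotation is unnecessary and MetalLB allocates from that pool automatically.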

🔄 What Happens in BGP Mode

Every node (node1, node2, node3) runs a MetalLB speaker pod.
Each speaker establishes a BGP session with the router (192.168.1.1).
All nodes announce that they can handle traffic for 192.168.1.250.
The router learns multiple paths to 192.168.1.250, one through each node, and with ECMP enabled it spreads connections across them.
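The peering described above could be configured roughly as follows. This is a minimal sketch: the AS numbers are assumptions picked from the private range, the resource names are hypothetical, and the router needs a matching neighbor configuration for each node on its side.

```yaml
# Tells every speaker to open a BGP session with router1.
apiVersion: metallb.io/v1beta2
kind: BGPPeer
metadata:
  name: router1              # hypothetical name
  namespace: metallb-system
spec:
  myASN: 64512               # assumed private ASN for the cluster
  peerASN: 64513             # assumed private ASN for router1
  peerAddress: 192.168.1.1   # router IP from the table above
---
# Announces addresses from bgp-pool over the BGP sessions.
apiVersion: metallb.io/v1beta1
kind: BGPAdvertisement
metadata:
  name: bgp-adv              # hypothetical name
  namespace: metallb-system
spec:
  ipAddressPools:
    - bgp-pool
```

Without a BGPAdvertisement that references the pool, the sessions come up but no route for 192.168.1.250 is announced.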

📊 BGP Mode: Key Characteristics

Distributed Responsibility: All nodes advertise the service IPs
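Traffic Flow: Incoming traffic is load balanced across all advertising nodes, per connection, provided the router uses ECMP
Failover: Fast when a route is withdrawn cleanly; after a node crash the router drops the path only once the BGP hold timer expires (commonly 90 seconds by default), unless BFD is configured for sub-second detection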
Best For: Production environments, high-availability requirements

🔄 BGP Traffic Flow

Client → Router (ECMP) → Any Node → Pod (on any node)

⚠️ Requirements

BGP-capable router (or a software router such as FRR or BIRD)
Network infrastructure supporting BGP
Proper BGP peering configuration

🏆 When to Use Which Mode?

Choose Layer 2 When:

You need a simple setup
You don't have BGP-capable network equipment
Your traffic volume is low to medium
You can tolerate brief failover times (5-10s)

Choose BGP When:

You need high availability and true load balancing
Your network supports BGP
You have high traffic volumes
You need fast failover and can tune BGP timers or use BFD to get it

🖼️ Comparison Table

Aspect	Layer 2	BGP Mode
Node Responsibility	Single node owns the IP	All nodes advertise the IP
Traffic Distribution	Through one node (bottleneck)	Distributed across all nodes (ECMP)
Failover Time	5-10 seconds (gratuitous ARP)	Fast on clean withdrawal; bounded by the BGP hold timer after a node crash unless BFD is used
Load Balancing	None (active-passive)	Yes (active-active)
Network Requirements	Basic L2 switch	BGP-capable router
Complexity	Simple to set up	Requires BGP knowledge
Use Case	Development/Small clusters	Production/High-availability
Performance	Limited by single node	Scales with cluster size
Implementation	Works out-of-the-box	Requires BGP configuration
Best For	Simple deployments	Enterprise/production environments

MahdadGhasemian/metallb-loadbalancer-layer2-vs-bgp-mode.md