This document provides a comparison of MetalLB's Layer 2 and BGP modes.

MetalLB: Layer 2 vs BGP Mode Comparison

🎯 Introduction

MetalLB is a load-balancer implementation for bare-metal Kubernetes clusters. This document compares its two main operating modes, Layer 2 and BGP, to help you choose the right one for your environment.

🔍 Layer 2 Mode

🏗️ Example Cluster Setup

📦 Cluster Setup

| Node Name   | Role   | IP            |
|-------------|--------|---------------|
| node1 (VM1) | worker | 192.168.1.101 |
| node2 (VM2) | worker | 192.168.1.102 |
| node3 (VM3) | worker | 192.168.1.103 |

You have a MetalLB IPAddressPool that contains a single address:

192.168.1.250
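
As a rough sketch, a pool holding this address could be declared together with an L2Advertisement, which is the resource that tells MetalLB to announce the pool's addresses over ARP. The resource names `l2-pool` and `l2-adv` are illustrative assumptions, not taken from a real cluster:

```yaml
# Hypothetical names; adjust to your cluster.
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: l2-pool
  namespace: metallb-system
spec:
  addresses:
    - 192.168.1.250/32   # the single address used in this example
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: l2-adv
  namespace: metallb-system
spec:
  ipAddressPools:
    - l2-pool            # announce only this pool in Layer 2 mode
```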

Now suppose you expose a Service via MetalLB. Here's what happens in Layer 2 mode:

  • MetalLB elects one node (say node2) to "own" 192.168.1.250.
  • Only node2 replies to ARP requests for 192.168.1.250 on the network.
  • Traffic for that IP therefore arrives at node2, where kube-proxy (or its equivalent) forwards it to a backing pod, even if that pod runs on node1 or node3.
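
For illustration, a Service that would receive this address might look like the following. The name `web`, the `app: web` selector, and the port numbers are assumptions made up for this sketch. With a single auto-assigning pool, no explicit pool selection should be needed:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: web              # hypothetical Service name
spec:
  type: LoadBalancer     # asks MetalLB to allocate an external IP
  selector:
    app: web             # hypothetical pod label
  ports:
    - port: 80           # port exposed on 192.168.1.250
      targetPort: 8080   # assumed container port
```

Once applied, `kubectl get service web` should list 192.168.1.250 under EXTERNAL-IP.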

If node2 goes down, MetalLB will:

  • Elect a new node (say node1),
  • Send gratuitous ARP messages so that switches and clients update their caches,
  • And continue serving traffic from the new node.

📊 Layer 2 Mode: Key Characteristics

  • Single Node Responsibility: Only one node "owns" the IP at any time
  • Traffic Flow: All traffic goes through the elected node, which may forward it internally
  • Failover: A new owner is elected and announced with gratuitous ARP (takes a few seconds)
  • Best For: Simple setups, small clusters, or when BGP isn't an option

🔄 Layer 2 Traffic Flow

Client → Switch → Node (elected) → Pod (on any node)

⚠️ Limitations

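  • No load balancing across nodes (a single node receives all traffic for a given service IP)
  • Failover time depends on how quickly the failed node is detected and on clients honoring gratuitous ARP; clients that ignore it keep sending to the old node until their ARP cache entry expires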

🌐 BGP Mode

🏗️ How It Works

BGP (Border Gateway Protocol) mode lets every node advertise the same service IPs to an upstream router, which can then spread traffic across the nodes and reroute quickly when a node's route is withdrawn.

📦 Cluster Setup

| Node Name   | Role   | IP            |
|-------------|--------|---------------|
| node1 (VM1) | worker | 192.168.1.101 |
| node2 (VM2) | worker | 192.168.1.102 |
| node3 (VM3) | worker | 192.168.1.103 |

And you have a router in your network:

| Device  | Role                                    | IP          |
|---------|-----------------------------------------|-------------|
| router1 | BGP speaker (e.g., MikroTik, FRR, etc.) | 192.168.1.1 |

🧠 IP Pool

MetalLB is configured with a custom resource (CR) like this:

apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: bgp-pool
  namespace: metallb-system
spec:
  addresses:
    - 192.168.1.250/32

When you expose a LoadBalancer Service using this pool, MetalLB assigns it 192.168.1.250.

🔄 What Happens in BGP Mode

  1. Every node (node1, node2, node3) runs a MetalLB speaker pod.
  2. Each speaker establishes a BGP session with the router (192.168.1.1).
  3. All nodes announce that they can handle traffic for 192.168.1.250.
  4. The router learns multiple paths to reach 192.168.1.250, one through each node.
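
The peering and announcement steps above could be expressed roughly as follows. The AS numbers (taken from the private range) and the name `bgp-adv` are assumptions for this sketch; `bgp-pool` and the router address come from the example setup:

```yaml
apiVersion: metallb.io/v1beta2
kind: BGPPeer
metadata:
  name: router1
  namespace: metallb-system
spec:
  myASN: 64513             # assumed private ASN for the cluster
  peerASN: 64512           # assumed private ASN for router1
  peerAddress: 192.168.1.1 # router1 from the table above
---
apiVersion: metallb.io/v1beta1
kind: BGPAdvertisement
metadata:
  name: bgp-adv            # hypothetical name
  namespace: metallb-system
spec:
  ipAddressPools:
    - bgp-pool             # announce addresses from this pool over BGP
```

The router needs matching configuration on its side: a BGP neighbor entry for each node and multipath enabled, otherwise it will typically install only one of the learned paths.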

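📊 BGP Mode: Key Characteristics

  • Distributed Responsibility: All nodes advertise the service IPs
  • Traffic Flow: The router balances incoming traffic across the advertising nodes (ECMP), typically per connection
  • Failover: Sub-second when a route is withdrawn gracefully or when BFD is enabled; after an abrupt node failure without BFD, the router keeps the route until the BGP hold timer expires
  • Best For: Production environments, high-availability requirements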
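
How quickly the router stops sending traffic to a node that fails abruptly depends on how fast it notices the dead BGP session. One option, assuming MetalLB runs its FRR-based BGP implementation and the router also supports BFD, is to attach a BFD profile to the peer. The profile name `fast-detect`, the timer values, and the AS numbers below are illustrative assumptions:

```yaml
apiVersion: metallb.io/v1beta1
kind: BFDProfile
metadata:
  name: fast-detect        # hypothetical profile name
  namespace: metallb-system
spec:
  receiveInterval: 300     # milliseconds between expected BFD packets
  transmitInterval: 300    # milliseconds between sent BFD packets
  detectMultiplier: 3      # missed packets before the peer is declared down
---
apiVersion: metallb.io/v1beta2
kind: BGPPeer
metadata:
  name: router1
  namespace: metallb-system
spec:
  myASN: 64513             # assumed private ASN for the cluster
  peerASN: 64512           # assumed private ASN for router1
  peerAddress: 192.168.1.1
  bfdProfile: fast-detect  # tie session liveness to the BFD profile above
```

With these values a dead node would be detected in roughly 900 ms (3 × 300 ms), provided the router is configured with a compatible BFD session.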

🔄 BGP Traffic Flow

Client → Router (ECMP) → Any Node → Pod (on any node)

⚠️ Requirements

  • BGP-capable router (or a software router such as FRR or BIRD)
  • Network infrastructure supporting BGP
  • Proper BGP peering configuration

🏆 When to Use Which Mode?

Choose Layer 2 When:

  • You need a simple setup
  • You don't have BGP-capable network equipment
  • Your traffic volume is low to medium
  • You can tolerate brief failover times (5-10s)

Choose BGP When:

  • You need high availability and true load balancing
  • Your network supports BGP
  • You have high traffic volumes
  • You need fast failover (sub-second detection of a failed node typically requires BFD)

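🖼️ Comparison Table

| Aspect               | Layer 2 Mode                                                   | BGP Mode                                                                                 |
|----------------------|----------------------------------------------------------------|------------------------------------------------------------------------------------------|
| Node Responsibility  | Single node owns each IP                                       | All nodes advertise the IP                                                               |
| Traffic Distribution | Through one node per service IP (bottleneck)                   | Distributed across nodes by the router (ECMP)                                            |
| Failover Time        | Typically 5-10 seconds (failure detection plus gratuitous ARP) | Sub-second on graceful withdrawal or with BFD; otherwise bounded by the BGP hold timer  |
| Load Balancing       | None across nodes (active-passive)                             | Yes (active-active)                                                                      |
| Network Requirements | Basic L2 switch                                                | BGP-capable router                                                                       |
| Complexity           | Simple to set up                                               | Requires BGP knowledge                                                                   |
| Use Case             | Development/small clusters                                     | Production/high availability                                                             |
| Performance          | Limited by single node                                         | Scales with cluster size (up to the router's ECMP path limit)                            |
| Implementation       | Needs only an address pool and an L2 advertisement             | Requires BGP peering configuration                                                       |
| Best For             | Simple deployments                                             | Enterprise/production environments                                                       |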