How traffic reaches a host is outside the scope of this document.
docker service create --name testswarm --replicas 1 --publish 8080:80 nginx /bin/bash -c "hostname > /usr/share/nginx/html/hostname; nginx -g \"daemon off;\""
Here port 8080 is published and maps to port 80 inside the container. Any traffic that hits the node on port 8080 will make its way into (one of) the nginx containers on port 80.
The container has two network interfaces:
- eth0 which is the ingress interface
- eth1 which is the egress interface
All requests originating from the load balancer come in through the ingress interface. Also note the lowered MTU on the ingress interface (as it is linked to the overlay network).
Keep in mind that when using swarm, Docker creates a network namespace that actually handles the load-balanced ingress traffic. That namespace is the ingress_sbox.
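The lowered MTU follows from the VXLAN encapsulation used by the overlay network. A minimal sketch of the arithmetic, assuming a 1500-byte underlay MTU and VXLAN over IPv4 without VLAN tags (both are assumptions; check `ip link` on your own host for the real values):

```shell
#!/bin/sh
# Assumed underlay MTU (typical Ethernet); not taken from the setup above.
underlay_mtu=1500
# Per-packet VXLAN overhead over IPv4:
# outer Ethernet (14) + outer IPv4 (20) + UDP (8) + VXLAN header (8).
vxlan_overhead=$((14 + 20 + 8 + 8))
overlay_mtu=$((underlay_mtu - vxlan_overhead))
echo "overlay MTU: $overlay_mtu"
```

Under these assumptions the overlay interface ends up with 50 bytes less room per packet than the underlay interface.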
Configuration:
- Container Ingress IP: 10.255.0.4
- Container Ingress VIP: 10.255.0.2
- Gateway Bridge IP: 172.18.0.1
- Ingress Sbox Ingress IP: 10.255.0.3 (talks to Container Ingress IP)
- Ingress Sbox gateway IP: 172.18.0.2 (talks to sbox gateway network)
               overlay (ingress) network, vxlan
      +---------------------------------------------+
      |                                             |
+-----+--------------+                 +------------+----------+
| ingress_sbox       |                 | nginx                 |
|   10.255.0.3       |                 |   10.255.0.4 (IP)     |
|                    |                 |   10.255.0.2 (VIP)    |
|   172.18.0.2       |                 |   172.18.0.3          |
+-----+--------------+                 +------------+----------+
      |                                             |
      +----------------------+----------------------+
                             |
                 +-----------+------------+
                 | docker_gwbridge        |
                 |   172.18.0.1           |
                 +------------------------+
localhost -> docker_gwbridge (172.18.0.1) -> ingress_sbox (172.18.0.2) -> marked traffic, ip_vs (10.255.0.2) -> container (10.255.0.4)
Host iptables that matter:
-A PREROUTING -m addrtype --dst-type LOCAL -j DOCKER-INGRESS
-A DOCKER-INGRESS -p tcp -m tcp --dport 8080 -j DNAT --to-destination 172.18.0.2:8080
-A DOCKER-INGRESS -p tcp -m state --state RELATED,ESTABLISHED -m tcp --sport 8080 -j ACCEPT
nsenter --net=/var/run/docker/netns/ingress_sbox
-A PREROUTING -p tcp -m tcp --dport 8080 -j MARK --set-xmark 0x100/0xffffffff
-A POSTROUTING -d 10.255.0.0/16 -m ipvs --ipvs -j SNAT --to-source 10.255.0.3
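The POSTROUTING rule above only rewrites the source address for IPVS-handled packets whose destination lies in 10.255.0.0/16. As a rough illustration of that destination match (not of the ipvs match itself), the sketch below checks whether an address falls inside a CIDR block. The helper names `ip_to_int` and `in_cidr` are hypothetical and exist only for this example.

```shell
#!/bin/sh
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the address into its four octets
    IFS=$old_ifs
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Succeed if address $1 is inside CIDR block $2 (for example 10.255.0.0/16).
in_cidr() {
    addr=$(ip_to_int "$1")
    net=$(ip_to_int "${2%/*}")
    bits=${2#*/}
    mask=$(( (0xffffffff << (32 - bits)) & 0xffffffff ))
    [ $(( addr & mask )) -eq $(( net & mask )) ]
}

for ip in 10.255.0.4 172.18.0.2; do
    if in_cidr "$ip" 10.255.0.0/16; then
        echo "$ip: inside 10.255.0.0/16, the SNAT rule can apply"
    else
        echo "$ip: outside 10.255.0.0/16, the SNAT rule does not match"
    fi
done
```

The container's ingress IP is inside the block, so load-balanced traffic arrives at the container with 10.255.0.3 as its source, while addresses on the gateway bridge network are not affected by this rule.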
IPVS setup in ingress_sbox. This sends incoming traffic to 10.255.0.4:
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
FWM  256 rr
  -> 10.255.0.4:0                 Masq    1      0          0
ipvsadm-save
-A -f 256 -s rr
-a -f 256 -r 10.255.0.4:0 -m -w 1
Note: 0x100 == 256, so the mark set by the iptables MARK rule is the firewall mark (FWM 256) that IPVS matches on :)
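The hex and decimal forms are easy to confuse when comparing iptables output with ipvsadm output. A small sketch of the conversion, using only the mark value shown above:

```shell
#!/bin/sh
# iptables prints the mark in hex; ipvsadm prints the firewall mark in decimal.
mark_hex=0x100
mark_dec=$(printf '%d' "$mark_hex")
echo "iptables mark $mark_hex = IPVS fwmark $mark_dec"
```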
Connecting to the container from the ingress_sbox (both 80 and 8080 work)
curl http://10.255.0.4:80/hostname
1f99d9cf0236
curl http://10.255.0.4:8080/hostname
1f99d9cf0236
nsenter --net=/var/run/docker/netns/02d51fa13d84
-A PREROUTING -d 10.255.0.4/32 -p tcp -m tcp --dport 8080 -j REDIRECT --to-ports 80
-A OUTPUT -d 127.0.0.11/32 -j DOCKER_OUTPUT
-A POSTROUTING -d 127.0.0.11/32 -j DOCKER_POSTROUTING
-A DOCKER_OUTPUT -d 127.0.0.11/32 -p tcp -m tcp --dport 53 -j DNAT --to-destination 127.0.0.11:41343
-A DOCKER_OUTPUT -d 127.0.0.11/32 -p udp -m udp --dport 53 -j DNAT --to-destination 127.0.0.11:43411
-A DOCKER_POSTROUTING -s 127.0.0.11/32 -p tcp -m tcp --sport 41343 -j SNAT --to-source :53
-A DOCKER_POSTROUTING -s 127.0.0.11/32 -p udp -m udp --sport 43411 -j SNAT --to-source :53
COMMIT
# Completed on Thu Feb 2 21:57:42 2017
# Generated by iptables-save v1.6.0 on Thu Feb 2 21:57:42 2017
*filter
:INPUT ACCEPT [67:5781]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [75:5765]
-A INPUT -d 10.255.0.4/32 -p tcp -m tcp --dport 80 -m conntrack --ctstate NEW,ESTABLISHED -j ACCEPT
-A INPUT -d 10.255.0.4/32 -p udp -j DROP
-A INPUT -d 10.255.0.4/32 -p tcp -j DROP
-A OUTPUT -s 10.255.0.4/32 -p tcp -m tcp --sport 80 -m conntrack --ctstate ESTABLISHED -j ACCEPT
-A OUTPUT -s 10.255.0.4/32 -p udp -j DROP
-A OUTPUT -s 10.255.0.4/32 -p tcp -j DROP
Docker swarm has an internal DNS-based load balancer that round-robins DNS requests to spread load. It is served by a process on the host, listening on a localhost address and bound to a port specific to the container. https://github.com/docker/libnetwork/blob/5ac04367ae7b0b12c33bed5f5b395bd4c104fff9/sandbox.go#L815
Rules in the container namespace implement the Docker DNS load balancer/resolver: 127.0.0.11:53 is mapped to the specific port on which the corresponding resolver is running.
-A DOCKER_OUTPUT -d 127.0.0.11/32 -p tcp -m tcp --dport 53 -j DNAT --to-destination 127.0.0.11:41343
-A DOCKER_OUTPUT -d 127.0.0.11/32 -p udp -m udp --dport 53 -j DNAT --to-destination 127.0.0.11:43411
-A DOCKER_POSTROUTING -s 127.0.0.11/32 -p tcp -m tcp --sport 41343 -j SNAT --to-source :53
-A DOCKER_POSTROUTING -s 127.0.0.11/32 -p udp -m udp --sport 43411 -j SNAT --to-source :53
The resolver is dockerd itself:
netstat -plunt
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 127.0.0.11:41343        0.0.0.0:*               LISTEN      14447/dockerd
udp        0      0 127.0.0.11:43411        0.0.0.0:*                           14447/dockerd
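To tie the DNAT rule to the listener, the port in the rule's `--to-destination` argument should be the port dockerd is listening on. A minimal sketch of that cross-check, using the TCP rule and netstat line from above as literal strings; `dnat_port` is a hypothetical helper written for this example, not a Docker or iptables tool:

```shell
#!/bin/sh
# Print the port from the --to-destination argument of a DNAT rule.
dnat_port() {
    printf '%s\n' "$1" |
        sed -n 's/.*--to-destination [0-9.]*:\([0-9][0-9]*\).*/\1/p'
}

# Sample lines copied from the dumps above.
rule='-A DOCKER_OUTPUT -d 127.0.0.11/32 -p tcp -m tcp --dport 53 -j DNAT --to-destination 127.0.0.11:41343'
listener='tcp 0 0 127.0.0.11:41343 0.0.0.0:* LISTEN 14447/dockerd'

port=$(dnat_port "$rule")
case "$listener" in
    *"127.0.0.11:$port "*) echo "resolver listens on DNAT target port $port" ;;
    *) echo "no listener found on port $port" ;;
esac
```

On a live system the two strings would come from `iptables-save` and `netstat -plunt` run inside the container's network namespace; the ports are chosen per container, so they will differ from the ones shown here.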
- The port forwarding was in the ingress sbox
- The DNS remap was in the network ns
ingress-sbox
# Generated by iptables-save v1.6.0 on Thu Feb 9 21:24:56 2017
*mangle
:PREROUTING ACCEPT [36:4125]
:INPUT ACCEPT [3:180]
:FORWARD ACCEPT [33:3945]
:OUTPUT ACCEPT [3:180]
:POSTROUTING ACCEPT [36:4125]
-A PREROUTING -p tcp -m tcp --dport 8080 -j MARK --set-xmark 0x100/0xffffffff
-A OUTPUT -d 10.255.0.4/32 -j MARK --set-xmark 0x100/0xffffffff
COMMIT
# Completed on Thu Feb 9 21:24:56 2017
# Generated by iptables-save v1.6.0 on Thu Feb 9 21:24:56 2017
*nat
:PREROUTING ACCEPT [0:0]
:INPUT ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
:POSTROUTING ACCEPT [0:0]
:DOCKER_OUTPUT - [0:0]
:DOCKER_POSTROUTING - [0:0]
-A PREROUTING -p tcp -m tcp --dport 8080 -j REDIRECT --to-ports 80
-A OUTPUT -d 127.0.0.11/32 -j DOCKER_OUTPUT
-A POSTROUTING -d 127.0.0.11/32 -j DOCKER_POSTROUTING
-A POSTROUTING -d 10.255.0.0/16 -m ipvs --ipvs -j SNAT --to-source 10.255.0.3
-A DOCKER_OUTPUT -d 127.0.0.11/32 -p tcp -m tcp --dport 53 -j DNAT --to-destination 127.0.0.11:45190
-A DOCKER_OUTPUT -d 127.0.0.11/32 -p udp -m udp --dport 53 -j DNAT --to-destination 127.0.0.11:40332
-A DOCKER_POSTROUTING -s 127.0.0.11/32 -p tcp -m tcp --sport 45190 -j SNAT --to-source :53
-A DOCKER_POSTROUTING -s 127.0.0.11/32 -p udp -m udp --sport 40332 -j SNAT --to-source :53
COMMIT
# Completed on Thu Feb 9 21:24:56 2017
# Generated by iptables-save v1.6.0 on Thu Feb 9 21:24:56 2017
*filter
:INPUT ACCEPT [3:180]
:FORWARD ACCEPT [33:3945]
:OUTPUT ACCEPT [3:180]
COMMIT
In container
# Generated by iptables-save v1.6.0 on Thu Feb 9 21:32:37 2017
*mangle
:PREROUTING ACCEPT [21:1358]
:INPUT ACCEPT [21:1358]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [15:2767]
:POSTROUTING ACCEPT [15:2767]
COMMIT
# Completed on Thu Feb 9 21:32:37 2017
# Generated by iptables-save v1.6.0 on Thu Feb 9 21:32:37 2017
*nat
:PREROUTING ACCEPT [3:180]
:INPUT ACCEPT [3:180]
:OUTPUT ACCEPT [0:0]
:POSTROUTING ACCEPT [0:0]
:DOCKER_OUTPUT - [0:0]
:DOCKER_POSTROUTING - [0:0]
-A OUTPUT -d 127.0.0.11/32 -j DOCKER_OUTPUT
-A POSTROUTING -d 127.0.0.11/32 -j DOCKER_POSTROUTING
-A DOCKER_OUTPUT -d 127.0.0.11/32 -p tcp -m tcp --dport 53 -j DNAT --to-destination 127.0.0.11:39295
-A DOCKER_OUTPUT -d 127.0.0.11/32 -p udp -m udp --dport 53 -j DNAT --to-destination 127.0.0.11:44854
-A DOCKER_POSTROUTING -s 127.0.0.11/32 -p tcp -m tcp --sport 39295 -j SNAT --to-source :53
-A DOCKER_POSTROUTING -s 127.0.0.11/32 -p udp -m udp --sport 44854 -j SNAT --to-source :53
COMMIT
# Completed on Thu Feb 9 21:32:37 2017
# Generated by iptables-save v1.6.0 on Thu Feb 9 21:32:37 2017
*filter
:INPUT ACCEPT [21:1358]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [15:2767]
COMMIT
# Completed on Thu Feb 9 21:32:37 2017
# With Clear Containers
+---------------------------------+ +--------------------------------+
| ingress sbox | | |
| + | | + |
| +-----------------------------------------------------+ |
| I IP | | | +--------------+
| +----+ | | +-----------+ |
| + | | | over|ay box | |
| | | | | |
+---------------------------------+ +--------------------------------+
| |
| |
| | host container ns
| +--------------------------------------------+
| | +-+ | |
| | | +-----+ +-----------------+ |
| | | | | IP | |
| | | +--------------+ VIP | |
| Resolver-----+ +-+ | | |
docker_gw_bridge | 127.0.0.11 | | | |
+ | | +-+R IP | | |
+----------+ | | +---------------+ HIP | |
H GW IP +--------------------------------------+ | | | |
| | +-+ +-----------------+ |
+ default gw | /etc/resolv.conf (127..)|
+--------------------------------------------+
/etc/resolv.conf
@mcastelino Can you update the intermediate solution that you used for the dns-proxy issue?