In the Linux kernel, support for networking hardware and the interfaces to interact with these devices are standardized by the socket API:
                  +----------------+
                  |   Socket API   |
                  +-------+--------+
                          |
 User space               |
+-------------------------+-----------------------+
 Kernel space             |
                          |
                  +-------+--------+
                  |  Raw Ethernet  |
                  +-------+--------+
                          |
                  +-------+--------+
                  |  Network Stack |
                  +-------+--------+
                          |
                      +---+----+
                      |  eth0  |
                      +---+----+
                          |
                      +---+----+
                      |  NIC   |
                      +--------+
In order to support new kinds of computational workloads, different deployment scenarios, and a better use of hardware resources, the Linux OS supports virtualization of different computing resources: CPU, memory, storage, and networking. Virtual networking capabilities are indeed used as a basis for hosting VMs and containers.
A general overview of virtual networking components available in Linux is described in this article from the IBM developerworks web site.
- Bridge: A Linux bridge behaves like a network switch. It forwards packets between interfaces that are connected to it. It's usually used for forwarding packets on routers, on gateways, or between VMs and network namespaces on a host. It also supports STP, VLAN filter, and multicast snooping.
- TUN: TUN (network TUNnel) devices work at the IP level, or layer three, of the network stack and are usually point-to-point connections. A typical use for a TUN device is establishing VPN connections, since it gives the VPN software a chance to encrypt the data before it gets put on the wire. Since a TUN device works at layer three it can only accept IP packets, and in some cases only IPv4. If you need to run any other protocol over a TUN device you're out of luck. Additionally, because TUN devices work at layer three, they can't be used in bridges and don't typically support broadcasting.
- TAP: TAP (terminal access point) devices, in contrast, work at the Ethernet level or layer two and therefore behave very much like a real network adaptor. Since they are running at layer two they can transport any layer three protocol and aren’t limited to point-to-point connections. TAP devices can be part of a bridge and are commonly used in virtualization systems to provide virtual network adaptors to multiple guest machines. Since TAP devices work at layer two they will forward broadcast traffic which normally makes them a poor choice for VPN connections as the VPN link is typically much narrower than a LAN network (and usually more expensive).
- VETH: Virtual Ethernet interfaces are essentially a virtual equivalent of a patch cable, what goes in one end comes out the other. When either device is down, the link state of the pair is down.
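As a quick sketch of the TUN/TAP distinction described above (assuming the `ip tuntap` subcommand from iproute2 and root privileges; the device names are arbitrary):

```shell
# Create a layer-three TUN device (carries IP packets only); requires root
ip tuntap add dev tun0 mode tun

# Create a layer-two TAP device (carries Ethernet frames, can join a bridge)
ip tuntap add dev tap0 mode tap

# Remove them again
ip tuntap del dev tun0 mode tun
ip tuntap del dev tap0 mode tap
```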
An example of creating a bridge:
>>> ip link add br0 type bridge
Enslaving a network interface to a bridge:
>>> ip link set eth0 master br0
An example of creating two virtual ethernet interfaces (ep1,ep2) and linking them together:
>>> ip link add ep1 type veth peer name ep2
veth interfaces can also be linked to a bridge:
>>> ip link set ep1 master br0
>>> ip link set ep2 master br0
It is also possible to add IP addresses to the interfaces, for example:
>>> ip addr add 10.0.0.10/24 dev ep1
>>> ip addr add 10.0.0.11/24 dev ep2
All the available network interfaces can be listed with: ip address show
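To undo the example above (assuming the same names), the interfaces can simply be deleted; note that removing one end of a veth pair removes its peer as well:

```shell
# Deleting the bridge tears down its configuration (requires root)
ip link del br0

# Deleting one veth endpoint also removes its peer (here ep2)
ip link del ep1
```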
Many other types of virtual network interfaces are available, as described in this post from the RedHat developers blog.
Namespaces are a feature of the Linux kernel used as a basis for many software technologies such as Linux Containers (LXC), Docker, and software-defined networking (SDN) solutions. They basically allow defining and using multiple virtual instances of the resources available on a host.
Linux namespaces include (additional references are available in the man page):
- Cgroup
- IPC
- Mount
- PID
- User
- UTS
- Network
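The namespaces a given process belongs to can be inspected directly under /proc, which is a quick way to see this list in practice:

```shell
# Each entry under /proc/<pid>/ns is a symlink naming a namespace type
# and the inode number identifying that namespace instance
ls -l /proc/$$/ns
```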
In particular, network namespaces allow individual containers to have exclusive access to virtual network resources, while each container can be assigned a separate network stack.
Network namespaces allow different processes to have different views of the network, and different aspects of networking can be isolated between processes:
- Interfaces: different processes can connect to addresses on different interfaces.
- Routes: since processes can see different addresses from different namespaces, they also need different routes to connect to networks on those interfaces.
- Firewall rules: since these are dependent on the source or target interfaces, you may need different firewall rules in different network namespaces.
Handling of network namespaces is done with the ip command, which is part of the iproute2 package.
NOTE: all the commands in the following examples have to be executed directly by root or with root privileges (e.g. with sudo).
Create, list and delete a network namespace:
>>> ip netns add ns1
>>> ip netns list
>>> ip netns del ns1
ns1 is a network NS which is completely separated from the default one (which is always available after every Linux boot).
Distinct network namespaces can be connected together using veth interfaces:
>>> ip netns add ns1
>>> ip netns add ns2
>>> ip link add veth1 netns ns1 type veth peer name veth2 netns ns2
Each virtual ethernet interface can be assigned an IP address inside the same subnet:
>>> ip netns exec ns1 ip addr add "10.0.0.1/24" dev veth1
>>> ip netns exec ns2 ip addr add "10.0.0.2/24" dev veth2
Once the IPs are assigned, the veth interfaces have to be brought into the UP state:
>>> ip netns exec ns1 ip link set veth1 up
>>> ip netns exec ns2 ip link set veth2 up
An example of running a ping command between the two different namespaces through the veth interfaces:
>>> ip netns exec ns1 ping -c 2 10.0.0.2
PING 10.0.0.2 (10.0.0.2) 56(84) bytes of data.
64 bytes from 10.0.0.2: icmp_seq=1 ttl=64 time=0.047 ms
64 bytes from 10.0.0.2: icmp_seq=2 ttl=64 time=0.052 ms
--- 10.0.0.2 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 999ms
rtt min/avg/max/mdev = 0.047/0.049/0.052/0.007 ms
>>> ip netns exec ns2 ping -c 2 10.0.0.1
PING 10.0.0.1 (10.0.0.1) 56(84) bytes of data.
64 bytes from 10.0.0.1: icmp_seq=1 ttl=64 time=0.043 ms
64 bytes from 10.0.0.1: icmp_seq=2 ttl=64 time=0.055 ms
--- 10.0.0.1 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 999ms
rtt min/avg/max/mdev = 0.043/0.049/0.055/0.006 ms
A network namespace can have its own network interface assigned to it, for example the loopback interface (which is by default always present on a new network NS but in DOWN state):
>>> ip netns exec ns1 ip link set lo up
It can also have a separated routing table (note that when the network namespace is initially set, the routing table is empty):
>>> ip netns exec ns1 ip route show
Once a network NS is created, it shows up in multiple places:
>>> mount | grep netns
tmpfs on /run/netns type tmpfs (rw,nosuid,noexec,relatime,size=2468812k,mode=755)
nsfs on /run/netns/ns1 type nsfs (rw)
>>> ls -l /var/run/netns
[...]
-r--r--r-- 1 root root 0 Jan 15 11:53 ns1
Considering the following properties:
- network NS can have their own network routes;
- virtual ethernet interfaces come in pairs;
- it's possible to assign a network interface to a different network NS;
it is then possible to build an example of multiple network NSs connected together through a Linux bridge and routing rules inside the same physical host. A bridge device gives us the virtual equivalent of a network switch, allowing us to connect multiple interfaces (virtual or not) and have them communicate with each other.
The following is a conceptual schema:
          br-veth1        veth1 +-------------+
     +--------------------------+ namespace 1 |
     |                          +-------------+
+----+---+
|        |
| bridge |
|        |
+----+---+
     |                    veth2 +-------------+
     +--------------------------+ namespace 2 |
          br-veth2              +-------------+
- br-veth{1,2}: veth attached to the bridge
- veth{1,2}: veth part of their respective network NS
First, two network NS will be created:
>>> ip netns add ns1
>>> ip netns add ns2
Then two pairs of veth will be created:
>>> ip link add veth1 type veth peer name br-veth1
>>> ip link add veth2 type veth peer name br-veth2
Now two of the new veths will be attached to the network NSs (br-veth is just a convenient naming convention; the name itself does not make a veth connected to a bridge).
>>> ip link set veth1 netns ns1
>>> ip link set veth2 netns ns2
The two veth{1,2} will be shown only in their respective network NSs:
>>> ip netns exec ns1 ip link list
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN mode DEFAULT group default qlen 1
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
32: veth1@if31: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default qlen 1000
link/ether 4e:8c:92:72:f5:cd brd ff:ff:ff:ff:ff:ff link-netnsid 0
Note: veth1 is marked as DOWN. The same goes for veth2.
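The example as written never brings these interfaces up, which appears to be one of the missing steps. Assuming the namespace and interface names used above, they can be brought up from inside their namespaces:

```shell
# Bring the namespace-side veth endpoints up (requires root)
ip netns exec ns1 ip link set veth1 up
ip netns exec ns2 ip link set veth2 up
```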
Assign the IP address 192.168.1.11 with netmask 255.255.255.0 to veth1:
>>> ip netns exec ns1 ip addr add 192.168.1.11/24 dev veth1
An IP address in the same subnet will also be assigned to veth2:
>>> ip netns exec ns2 ip addr add 192.168.1.12/24 dev veth2
Even when the two veths have IP addresses assigned, they cannot communicate with each other: the reason is that there is no configured interface in the default network namespace which can carry the traffic between the two veth interfaces. Adding a bridge is the way to go further:
>>> ip link add name br1 type bridge
>>> ip link set br1 up
It can be verified that the bridge is available:
>>> ip link | grep br1
35: br1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
It's now time to bring up the other two veth interfaces (br-veth{1,2}) and attach them to the bridge:
>>> ip link set br-veth1 up
>>> ip link set br-veth2 up
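The commands above only bring the interfaces up; the step that actually attaches them to the bridge appears to be missing from the original example. Assuming the bridge is named br1 as above, it would be:

```shell
# Attach the bridge-side veth endpoints to br1 (requires root)
ip link set br-veth1 master br1
ip link set br-veth2 master br1
```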
In order to reach the veth interfaces through the routing table of the host itself, the bridge needs an IP address:
>>> ip addr add 192.168.1.10/24 brd + dev br1
The brd + argument sets the broadcast address automatically derived from the network prefix (here 192.168.1.255).
The routing table can be checked in this way:
>>> ip route
default via 10.1.1.1 dev enp5s0
10.1.1.0/24 dev enp5s0 proto kernel scope link src 10.1.1.10
192.168.1.0/24 dev br1 proto kernel scope link src 192.168.1.10
From the global network NS it's possible to reach both IP addresses (192.168.1.{11,12}) with a simple ping.
It's also possible to reach ns2 from ns1, once the proper routing is defined:
>>> ip netns exec ns1 ip route add 192.168.1.0/24 dev veth1 proto kernel scope link src 192.168.1.11
And reaching ns2 can be tested in the following way:
>>> ip netns exec ns1 ping 192.168.1.12
If the setup stops at this point, both network NSs are basically isolated from the outside world: they can ping each other (provided the internal routes are configured) but cannot reach any other IP outside the 192.168.1.0/24 space.
To let them reach external networks we can use NAT (Network Address Translation) through iptables:
>>> iptables \
-t nat \
-A POSTROUTING \
-s 192.168.1.0/24 \
-j MASQUERADE
The previous command specifies that on the nat table we are appending (-A) a rule to the POSTROUTING chain for the given source addresses (-s), and the action taken is MASQUERADE.
Last but not least, the IP forwarding has to be enabled on the networking stack of the host:
>>> sysctl -w net.ipv4.ip_forward=1
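One more step appears to be needed before the test below can work: each namespace still lacks a default route, so it has no way to send packets to addresses outside 192.168.1.0/24. Assuming the bridge address used above (192.168.1.10), the default route can point at the bridge:

```shell
# Inside each namespace, route non-local traffic via the bridge (requires root)
ip netns exec ns1 ip route add default via 192.168.1.10
ip netns exec ns2 ip route add default via 192.168.1.10
```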
A small test: send some packets to 8.8.8.8:
>>> ip netns exec ns1 ping 8.8.8.8
PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
64 bytes from 8.8.8.8: icmp_seq=1 ttl=61 time=43 ms
64 bytes from 8.8.8.8: icmp_seq=2 ttl=61 time=35 ms
[...]
I could not follow along with the example described here: I think some of the commands required to set up this network are missing. However, I found an article with the exact same example but with step-by-step instructions that you can follow along: https://ops.tips/blog/using-network-namespaces-and-bridge-to-isolate-servers/