VPP incomplete neighbor + MAC change problem
From 01348a8d53c78112063474a2b3cd26162428d248 Mon Sep 17 00:00:00 2001
From: Ivan Shvedunov <[email protected]>
Date: Wed, 11 Aug 2021 15:53:49 +0300
Subject: [PATCH] ip-neighbor: update incomplete nbr adj entries upon MAC
change
The rewrite data from the incomplete nbr adj entries is used, for
example, for ARP requests, so it needs to be updated when the
interface MAC address changes.
Type: fix
Signed-off-by: Ivan Shvedunov <[email protected]>
Change-Id: Ifabbbc5074d25b87620bded66bddd48a6aaa59b8
---
src/vnet/adj/adj_nbr.c | 3 +++
src/vnet/ip-neighbor/ip_neighbor.c | 21 +++++++++++++++++++++
2 files changed, 24 insertions(+)
diff --git a/src/vnet/adj/adj_nbr.c b/src/vnet/adj/adj_nbr.c
index 3344d6e47..41fee9095 100644
--- a/src/vnet/adj/adj_nbr.c
+++ b/src/vnet/adj/adj_nbr.c
@@ -1031,6 +1031,9 @@ format_adj_nbr_incomplete (u8* s, va_list *ap)
s = format (s, " %U",
format_vnet_sw_if_index_name,
vnm, adj->rewrite_header.sw_if_index);
+ s = format (s, " %U",
+ format_vnet_rewrite,
+ &adj->rewrite_header, sizeof (adj->rewrite_data), 0);
return (s);
}
diff --git a/src/vnet/ip-neighbor/ip_neighbor.c b/src/vnet/ip-neighbor/ip_neighbor.c
index 8637e16fd..989cb66be 100644
--- a/src/vnet/ip-neighbor/ip_neighbor.c
+++ b/src/vnet/ip-neighbor/ip_neighbor.c
@@ -386,6 +386,18 @@ ip_neighbor_mk_incomplete_walk (adj_index_t ai, void *ctx)
return (ADJ_WALK_RC_CONTINUE);
}
+static adj_walk_rc_t
+ip_neighbor_update_incomplete_walk (adj_index_t ai, void *ctx)
+{
+ ip_adjacency_t *adj;
+
+ adj = adj_get (ai);
+ if (adj->lookup_next_index == IP_LOOKUP_NEXT_ARP)
+ ip_neighbor_mk_incomplete (ai);
+
+ return (ADJ_WALK_RC_CONTINUE);
+}
+
static void
ip_neighbor_destroy (ip_neighbor_t * ipn)
{
@@ -1147,6 +1159,7 @@ ip_neighbor_ethernet_change_mac (ethernet_main_t * em,
u32 sw_if_index, uword opaque)
{
ip_neighbor_t *ipn;
+ fib_protocol_t proto;
IP_NEIGHBOR_DBG ("mac-change: %U",
format_vnet_sw_if_index_name, vnet_get_main (),
@@ -1165,6 +1178,14 @@ ip_neighbor_ethernet_change_mac (ethernet_main_t * em,
/* *INDENT-ON* */
adj_glean_update_rewrite_itf (sw_if_index);
+ /*
+ * There may be incomplete neighbor adj entries that contain the old
+ * MAC address in their rewrite headers, so need to update them
+ */
+ FOR_EACH_FIB_IP_PROTOCOL (proto)
+ {
+ adj_nbr_walk (sw_if_index, proto, ip_neighbor_update_incomplete_walk, NULL);
+ }
}
void
--
2.30.2

I've stumbled upon a problem in the neighbor adj code that handles incomplete entries. These entries are pre-created under some circumstances, such as the creation of a VXLAN tunnel, and are used to build Ethernet headers for outbound ARP requests. The problem is that when the MAC address of the interface changes, these entries are never updated.
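The failure mode can be illustrated with a toy model in C. This is only a sketch under stated assumptions: `toy_adj_t`, `toy_adj_build_rewrite()`, `toy_adj_on_mac_change()` and `toy_adj_src_is()` are hypothetical names invented for illustration, not VPP data structures or APIs. Each entry caches a 14-byte Ethernet header, and the `include_incomplete` flag switches between refreshing only the glean entries and also refreshing the incomplete ones.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { MAC_LEN = 6, REWRITE_LEN = 14 };

/* Hypothetical, simplified adjacency kinds (not VPP's types). */
typedef enum { TOY_ADJ_GLEAN, TOY_ADJ_INCOMPLETE } toy_adj_kind_t;

typedef struct
{
  toy_adj_kind_t kind;
  uint8_t rewrite[REWRITE_LEN]; /* cached header: dst MAC, src MAC, ethertype */
} toy_adj_t;

/* Fill the cached Ethernet header of one entry. */
static void
toy_adj_build_rewrite (toy_adj_t *adj, const uint8_t dst[MAC_LEN],
                       const uint8_t src[MAC_LEN], uint16_t ethertype)
{
  assert (adj != NULL);
  memcpy (adj->rewrite, dst, MAC_LEN);
  memcpy (adj->rewrite + MAC_LEN, src, MAC_LEN);
  adj->rewrite[12] = (uint8_t) (ethertype >> 8);
  adj->rewrite[13] = (uint8_t) (ethertype & 0xff);
}

/*
 * Refresh the cached source MAC after an interface MAC change.
 * include_incomplete == false skips incomplete entries, which leaves
 * them with a stale source MAC; true refreshes them as well.
 * Returns the number of entries that were refreshed.
 */
static size_t
toy_adj_on_mac_change (toy_adj_t *adjs, size_t n,
                       const uint8_t new_mac[MAC_LEN],
                       bool include_incomplete)
{
  size_t updated = 0;

  for (size_t i = 0; i < n; i++)
    {
      if (adjs[i].kind == TOY_ADJ_INCOMPLETE && !include_incomplete)
        continue;
      memcpy (adjs[i].rewrite + MAC_LEN, new_mac, MAC_LEN);
      updated++;
    }
  return updated;
}

/* True if the cached header of the entry carries the given source MAC. */
static bool
toy_adj_src_is (const toy_adj_t *adj, const uint8_t mac[MAC_LEN])
{
  return memcmp (adj->rewrite + MAC_LEN, mac, MAC_LEN) == 0;
}
```

With one glean and one incomplete entry, calling `toy_adj_on_mac_change (adjs, 2, new_mac, false)` leaves the incomplete entry carrying the old source MAC, which is the situation the reproduction steps demonstrate on a real VPP.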

It is possible to reproduce the problem with 2 VPP instances connected together using memif. Let's start 2 VPP instances and configure them.

On VPP 1:

DBGvpp# ip table add 500
DBGvpp# create memif socket id 1 filename /run/vpp/memif.sock
DBGvpp# create interface memif id 0 socket-id 1 master
DBGvpp# set interface state memif1/0 up
DBGvpp# set interface ip table memif1/0 500
DBGvpp# set interface ip address memif1/0 10.0.0.1/24
DBGvpp# create vxlan tunnel src 10.0.0.1 dst 10.0.0.2 vni 1 encap-vrf-id 500
vxlan_tunnel0
DBGvpp# sh adj
[@0] ipv4-glean: [src:0.0.0.0/0] memif1/0: mtu:9000 next:1 flags:[] ffffffffffff02abcd0102030806
[@1] ipv4-glean: [src:10.0.0.0/24] memif1/0: mtu:9000 next:1 flags:[] ffffffffffff02abcd0102030806
[@2] arp-ipv4: via 10.0.0.2 memif1/0

Note the arp-ipv4 entry. It is there even before any ARP requests are sent, because create vxlan tunnel invokes fib_entry_track() for the other tunnel endpoint IP, which causes the entry to be created. It also has a rewrite_header, which sh adj does not show but which is used to generate the ARP packets.

Now let's change the MAC address of the memif:

DBGvpp# set interface mac address memif1/0 02:ab:cd:01:02:03

At this point, the arp-ipv4 entry is NOT updated: it still contains the old MAC address, which is wrong.

Let's configure VPP 2 now and enable packet tracing:

DBGvpp# create memif socket id 1 filename /run/vpp/memif.sock
DBGvpp# create interface memif id 0 socket-id 1 slave
DBGvpp# set interface state memif1/0 up
DBGvpp# set interface ip address memif1/0 10.0.0.2/24
DBGvpp# trace add memif-input 10

And now let's try to run ping on VPP 1:

DBGvpp# ping 10.0.0.2 source memif1/0

Statistics: 5 sent, 0 received, 100% packet loss

As we can see, the packets don't pass. sh error on VPP 1 shows ARP requests being sent:

DBGvpp# sh error
   Count                  Node                              Reason               Severity
         5              ip4-arp                       ARP requests sent            error
         1          memif1/0-output                   interface is down            error

On VPP 2:

DBGvpp# sh error
   Count                  Node                              Reason               Severity
         5             arp-reply             ARP hw addr does not match L2 frame   error
DBGvpp# sh trace
------------------- Start of thread 0 vpp_main -------------------
Packet 1

00:00:50:293029: memif-input
  memif: hw_if_index 1 next-index 4
    slot: ring 0
00:00:50:293061: ethernet-input
  ARP: 02:fe:cd:1e:1f:4b -> ff:ff:ff:ff:ff:ff
00:00:50:293087: arp-input
  request, type ethernet/IP4, address size 6/4
  02:ab:cd:01:02:03/10.0.0.1 -> 00:00:00:00:00:00/10.0.0.2
00:00:50:293098: arp-reply
  request, type ethernet/IP4, address size 6/4
  02:ab:cd:01:02:03/10.0.0.1 -> 00:00:00:00:00:00/10.0.0.2
00:00:50:293121: error-drop
  rx:memif1/0
00:00:50:293130: drop
  arp-reply: ARP hw addr does not match L2 frame src addr

Here we can see that while the ARP packet has the correct MAC address inside it (02:ab:cd:01:02:03), the Ethernet header contains the wrong, old one for memif1/0 on VPP 1 (02:fe:cd:1e:1f:4b). The Linux kernel apparently ignores this mismatch, so I was not able to reproduce the lack of connectivity using a tap interface, but when the peer is another VPP the problem is there.
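The kind of consistency check that produces this drop can be sketched in a few lines of C. This is an illustration based on the error text and the trace above, not VPP's actual arp-reply code; `arp_sender_matches_l2_src()` and `build_arp_request()` are hypothetical helpers, and the frame is assumed to be an untagged Ethernet/IPv4 ARP packet.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum
{
  MAC_LEN = 6,
  ETH_HDR_LEN = 14,               /* dst MAC + src MAC + ethertype */
  ETH_SRC_OFF = 6,                /* source MAC inside the Ethernet header */
  ARP_SHA_OFF = ETH_HDR_LEN + 8,  /* sender hardware address inside ARP */
  ARP_FRAME_LEN = ETH_HDR_LEN + 28
};

/* Hypothetical check: does the L2 source match the ARP sender hw address? */
static bool
arp_sender_matches_l2_src (const uint8_t *frame, size_t len)
{
  assert (frame != NULL);
  if (len < ARP_FRAME_LEN)
    return false; /* too short to hold an Ethernet/IPv4 ARP packet */
  return memcmp (frame + ETH_SRC_OFF, frame + ARP_SHA_OFF, MAC_LEN) == 0;
}

/* Build a broadcast ARP request; l2_src and sha may deliberately differ. */
static void
build_arp_request (uint8_t frame[ARP_FRAME_LEN], const uint8_t l2_src[MAC_LEN],
                   const uint8_t sha[MAC_LEN], const uint8_t spa[4],
                   const uint8_t tpa[4])
{
  static const uint8_t arp_fixed[8] = {
    0x00, 0x01, /* hardware type: Ethernet */
    0x08, 0x00, /* protocol type: IPv4 */
    0x06, 0x04, /* address sizes 6/4 */
    0x00, 0x01  /* opcode: request */
  };
  uint8_t *arp = frame + ETH_HDR_LEN;

  memset (frame, 0xff, MAC_LEN);                /* broadcast destination */
  memcpy (frame + ETH_SRC_OFF, l2_src, MAC_LEN);
  frame[12] = 0x08;                             /* ethertype 0x0806 (ARP) */
  frame[13] = 0x06;
  memcpy (arp, arp_fixed, sizeof (arp_fixed));
  memcpy (arp + 8, sha, MAC_LEN);               /* sender hardware address */
  memcpy (arp + 14, spa, 4);                    /* sender protocol address */
  memset (arp + 18, 0, MAC_LEN);                /* target hw address unknown */
  memcpy (arp + 24, tpa, 4);                    /* target protocol address */
}
```

A frame built with the old MAC as `l2_src` and the new MAC as `sha` mirrors the packet in the trace and fails the check; a frame built with the new MAC in both places passes it.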

With the patch applied, the rewrite data are visible for the incomplete entries (before ping) on VPP 1:

DBGvpp# sh adj
[@0] ipv4-glean: [src:0.0.0.0/0] memif1/0: mtu:9000 next:1 flags:[] ffffffffffff02abcd0102030806
[@1] ipv4-glean: [src:10.0.0.0/24] memif1/0: mtu:9000 next:1 flags:[] ffffffffffff02abcd0102030806
[@2] arp-ipv4: via 10.0.0.2 memif1/0 memif1/0: mtu:9000 next:1 flags:[] ffffffffffff02abcd0102030806
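The hex string at the end of each line is the cached Ethernet header: destination MAC, source MAC, then ethertype. As a reading aid, here is a small C sketch that splits such a 28-digit string into its fields. `eth_rewrite_t` and `parse_eth_rewrite()` are hypothetical names made up for this example, and the parser assumes a plain untagged 14-byte header like the ones shown above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical decoded form of a 14-byte Ethernet rewrite string. */
typedef struct
{
  uint8_t dst[6];
  uint8_t src[6];
  uint16_t ethertype;
} eth_rewrite_t;

/* Convert one hex digit to its value, or return -1 if it is not hex. */
static int
hex_nibble (char c)
{
  if (c >= '0' && c <= '9')
    return c - '0';
  if (c >= 'a' && c <= 'f')
    return c - 'a' + 10;
  if (c >= 'A' && c <= 'F')
    return c - 'A' + 10;
  return -1;
}

/* Parse exactly 28 hex digits (14 bytes) into dst, src and ethertype. */
static bool
parse_eth_rewrite (const char *hex, eth_rewrite_t *out)
{
  uint8_t raw[14];

  assert (out != NULL);
  if (hex == NULL || strlen (hex) != 2 * sizeof (raw))
    return false;

  for (size_t i = 0; i < sizeof (raw); i++)
    {
      int hi = hex_nibble (hex[2 * i]);
      int lo = hex_nibble (hex[2 * i + 1]);
      if (hi < 0 || lo < 0)
        return false;
      raw[i] = (uint8_t) ((hi << 4) | lo);
    }

  memcpy (out->dst, raw, 6);
  memcpy (out->src, raw + 6, 6);
  out->ethertype = (uint16_t) ((raw[12] << 8) | raw[13]);
  return true;
}
```

Applied to `ffffffffffff02abcd0102030806`, this gives a broadcast destination, source 02:ab:cd:01:02:03 and ethertype 0x0806 (ARP), so the incomplete entry now carries the new MAC.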

And ping works too (the first packet is lost due to glean):

DBGvpp# ping 10.0.0.2 source memif1/0
116 bytes from 10.0.0.2: icmp_seq=2 ttl=64 time=32.0279 ms
116 bytes from 10.0.0.2: icmp_seq=3 ttl=64 time=20.0323 ms
116 bytes from 10.0.0.2: icmp_seq=4 ttl=64 time=28.0757 ms
116 bytes from 10.0.0.2: icmp_seq=5 ttl=64 time=32.0262 ms

Statistics: 5 sent, 4 received, 20% packet loss