We've discovered issues with UDP traffic going to our bootnodes via the Digital Ocean Floating IPs. According to DO, a "Floating IP" is:
Floating IP is an IP address that can be instantly moved from one Droplet to another Droplet in the same datacenter.
- Anchor IP - Default internal IP given to a Droplet in 10.0.0.0/8 subnets.
- Droplet IP - Default public IP given to a Droplet on creation.
- Floating IP - Movable public IP.
Initially we identified an issue where statusd peers would throw the following errors when trying to connect to the DO bootnodes:
<-net.timeout
msg="--- (3) pongTimeout for [email protected]:30404: verifyinit -> unknown (ok)"
<-net.timeout
msg="--- (2) pongTimeout for [email protected]:30404: verifyinit -> unknown (ok)"
Checking with tcpdump indicated that the bootnodes receive the UDP traffic and send back a response:
15:06:51.472000 IP 109.86.198.208.30303 > 10.18.0.37.30404: UDP, length 120
15:06:51.473628 IP 174.138.7.182.30404 > 109.86.198.208.30303: UDP, length 174
15:06:51.473934 IP 174.138.7.182.30404 > 109.86.198.208.30303: UDP, length 132
Despite that, the issue was still present.
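One detail worth checking in a capture like the one above is whether a reply leaves from the same address the peer originally sent to. The sketch below, in Python, parses tcpdump's default one-line UDP summaries and lists the replies whose source IP differs from the address the request was sent to. The names parse_line and mismatched_replies are hypothetical helpers, and the regular expression assumes numeric addresses and ports. In the three captured lines the request is addressed to 10.18.0.37 while both replies leave from 174.138.7.182; whether that mismatch explains the timeouts is not established here.

```python
import re

# Matches tcpdump's default one-line IPv4/UDP summary with numeric ports, e.g.
# "15:06:51.472000 IP 1.2.3.4.30303 > 5.6.7.8.30404: UDP, length 120"
LINE_RE = re.compile(
    r"^\S+ IP (?P<src>\d+\.\d+\.\d+\.\d+)\.(?P<sport>\d+) > "
    r"(?P<dst>\d+\.\d+\.\d+\.\d+)\.(?P<dport>\d+): UDP, length (?P<length>\d+)$"
)

# The three lines from the capture above.
CAPTURE = [
    "15:06:51.472000 IP 109.86.198.208.30303 > 10.18.0.37.30404: UDP, length 120",
    "15:06:51.473628 IP 174.138.7.182.30404 > 109.86.198.208.30303: UDP, length 174",
    "15:06:51.473934 IP 174.138.7.182.30404 > 109.86.198.208.30303: UDP, length 132",
]


def parse_line(line):
    """Return the packet fields as a dict, or None if the line does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return {
        "src": m["src"], "sport": int(m["sport"]),
        "dst": m["dst"], "dport": int(m["dport"]),
        "length": int(m["length"]),
    }


def mismatched_replies(lines):
    """Return replies whose source IP is not the IP the peer addressed."""
    addressed = {}  # (peer IP, peer port, local port) -> local IP the peer sent to
    flagged = []
    for packet in filter(None, map(parse_line, lines)):
        reply_key = (packet["dst"], packet["dport"], packet["sport"])
        if reply_key in addressed:
            # Outgoing packet to a known peer: compare its source address.
            if addressed[reply_key] != packet["src"]:
                flagged.append(packet)
        else:
            # First packet seen for this flow: remember the addressed IP.
            key = (packet["src"], packet["sport"], packet["dport"])
            addressed[key] = packet["dst"]
    return flagged
```

Calling mismatched_replies(CAPTURE) flags both outgoing packets, since they leave from 174.138.7.182 rather than 10.18.0.37.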
After doing some tests with netcat, an issue was identified where some packets would not arrive at the destination when the destination was the Floating IP. When testing with two hosts using the nc utility, a listening instance would be started on one:
nc -l -u -p 1234
And a sending instance on another:
nc -u ${DROPPLET_IP} 1234
And messages would be sent interactively. Three behaviours were identified:
- When addressing the Droplet IP, all packets would be received and sent back and forth fine.
- When addressing the Floating IP, only the first packet would be received.
- When addressing the Floating IP, responding back to the UDP "connection" did not work.
This behaviour can be viewed here: https://asciinema.org/a/Aw9ucl29OtVxnm6gtSrnpMJBN
NOTE: This behaviour was tested in different regions and using different OSs.
The next step was to identify the difference. The packets were examined using tcpdump.
Droplet IP Target
09:16:05.240305 f4:a7:39:d7:8a:7d > 8a:05:1e:bd:cb:01, ethertype IPv4 (0x0800), length 60: 167.99.219.77.35520 > 188.166.68.46.search-agent: UDP, length 2
09:16:05.791998 f4:a7:39:d7:8a:7d > 8a:05:1e:bd:cb:01, ethertype IPv4 (0x0800), length 60: 167.99.219.77.35520 > 188.166.68.46.search-agent: UDP, length 2
Floating IP Target
09:16:09.904829 00:00:5e:00:01:6e > 8a:05:1e:bd:cb:01, ethertype IPv4 (0x0800), length 60: 167.99.219.77.53014 > 10.18.0.26.search-agent: UDP, length 2
09:16:10.418658 00:00:5e:00:01:6e > 8a:05:1e:bd:cb:01, ethertype IPv4 (0x0800), length 60: 167.99.219.77.53014 > 10.18.0.26.search-agent: UDP, length 2
The only three differences (apart from checksums) appear to be:
- The destination IP
  - Droplet IP used when sending to the Droplet IP
  - Anchor IP used when sending to the Floating IP
- No GeoIP info for destination
  - Anchor IP has no GeoIP info in the packet.
- Source Device MAC Address
  - When sending to the Droplet IP, the packet comes from a Juniper device (JuniperN_d7:82:7d)
  - When sending to the Floating IP, the packet comes from (probably) an HP device (IETF-VRRP-VRID_6e)
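The comparison can be repeated mechanically. The sketch below, in Python, parses link-level tcpdump lines like the two samples above and reports which fields differ; parse_ether_line, differing_fields and vrrp_vrid are hypothetical helper names. The vrrp_vrid helper relies on the fact that 00:00:5e:00:01:XX is the MAC range reserved for IPv4 VRRP virtual routers, with the last octet carrying the virtual router ID, so it identifies the address type only and says nothing about the vendor of the device behind it. GeoIP data is not part of the tcpdump text, so that difference is not covered.

```python
import re

# Matches "tcpdump -e" style lines for IPv4/UDP, as in the samples above.
ETHER_RE = re.compile(
    r"^\S+ (?P<src_mac>[0-9a-f:]{17}) > (?P<dst_mac>[0-9a-f:]{17}), "
    r"ethertype IPv4 \(0x0800\), length \d+: "
    r"(?P<src>\S+) > (?P<dst>\S+): UDP, length (?P<udp_len>\d+)$"
)

DROPLET_LINE = (
    "09:16:05.240305 f4:a7:39:d7:8a:7d > 8a:05:1e:bd:cb:01, ethertype IPv4 "
    "(0x0800), length 60: 167.99.219.77.35520 > 188.166.68.46.search-agent: "
    "UDP, length 2"
)
FLOATING_LINE = (
    "09:16:09.904829 00:00:5e:00:01:6e > 8a:05:1e:bd:cb:01, ethertype IPv4 "
    "(0x0800), length 60: 167.99.219.77.53014 > 10.18.0.26.search-agent: "
    "UDP, length 2"
)


def parse_ether_line(line):
    """Return MACs, IPs, ports and UDP length as a dict, or None on no match."""
    m = ETHER_RE.match(line.strip())
    if m is None:
        return None
    # The port (number or service name) is everything after the last dot.
    src_ip, _, src_port = m["src"].rpartition(".")
    dst_ip, _, dst_port = m["dst"].rpartition(".")
    return {
        "src_mac": m["src_mac"], "dst_mac": m["dst_mac"],
        "src_ip": src_ip, "src_port": src_port,
        "dst_ip": dst_ip, "dst_port": dst_port,
        "udp_len": int(m["udp_len"]),
    }


def differing_fields(a, b, ignore=("src_port",)):
    """List field names whose values differ, skipping the ephemeral source port."""
    return sorted(k for k in a if k not in ignore and a[k] != b[k])


def vrrp_vrid(mac):
    """Return the virtual router ID for an IPv4 VRRP virtual MAC, else None."""
    octets = mac.lower().split(":")
    if len(octets) == 6 and octets[:5] == ["00", "00", "5e", "00", "01"]:
        return int(octets[5], 16)
    return None
```

For the two sample lines, differing_fields reports only dst_ip and src_mac, and vrrp_vrid returns 110 for the Floating IP packet's source MAC (0x6e) and None for the Droplet IP packet's.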
Using a different program than netcat, namely socat, a working "solution" was identified.
Using socat with the fork option appears to at least let us receive packets (although not respond):
# socat -v - udp4-listen:1234
< 2018/06/22 09:05:36.252411 length=5 from=0 to=4 # delivered
test
test
Without fork, only the first packet would be received when addressed to the Floating IP.
# socat -v - udp4-listen:1234,fork
< 2018/06/22 09:05:36.252411 length=5 from=0 to=4 # delivered
test
test
< 2018/06/22 09:05:36.866158 length=5 from=5 to=9 # delivered
test
test
resp
> 2018/06/22 09:41:41.190379 length=5 from=0 to=4 # NOT delivered
resp
With fork, we would receive the packets following the first, but we could not respond.
With higher verbosity the difference was identified. The fork option would have no effect when connecting via the Droplet IP:
[root@udp-test-01 ~]# socat -d -d -d - udp4-listen:1234,fork
...
2018/06/22 09:44:25 socat[10714] I setting option "fork" to 1
2018/06/22 09:44:25 socat[10714] I socket(2, 2, 17) -> 5
2018/06/22 09:44:25 socat[10714] N listening on UDP AF=2 0.0.0.0:1234
2018/06/22 09:44:28 socat[10714] N accepting UDP connection from AF=2 167.99.219.77:54879
2018/06/22 09:44:28 socat[10714] I permitting UDP connection from AF=2 167.99.219.77:54879
2018/06/22 09:44:28 socat[10714] N forked off child process 10715
2018/06/22 09:44:28 socat[10714] I close(5)
2018/06/22 09:44:28 socat[10714] I still listening
2018/06/22 09:44:28 socat[10714] I socket(2, 2, 17) -> 5
2018/06/22 09:44:28 socat[10714] N listening on UDP AF=2 0.0.0.0:1234
2018/06/22 09:44:28 socat[10715] I just born: child process 10715
2018/06/22 09:44:28 socat[10715] I resolved and opened all sock addresses
2018/06/22 09:44:28 socat[10715] N starting data transfer loop with FDs [0,1] and [5,5]
test
2018/06/22 09:44:28 socat[10715] I transferred 5 bytes from 5 to 1
test
2018/06/22 09:44:30 socat[10715] I transferred 5 bytes from 5 to 1
test
2018/06/22 09:44:30 socat[10715] I transferred 5 bytes from 5 to 1
resp
2018/06/22 09:44:33 socat[10715] I transferred 5 bytes from 0 to 5
resp
But when packets were sent via the Floating IP, socat would fork a new child process for every packet received:
2018/06/22 09:44:34 socat[10715] I transferred 5 bytes from 0 to 5
2018/06/22 09:46:52 socat[10714] N accepting UDP connection from AF=2 167.99.219.77:43743
2018/06/22 09:46:52 socat[10714] I permitting UDP connection from AF=2 167.99.219.77:43743
2018/06/22 09:46:52 socat[10714] N forked off child process 10730
2018/06/22 09:46:52 socat[10714] I close(5)
2018/06/22 09:46:52 socat[10714] I still listening
2018/06/22 09:46:52 socat[10714] I socket(2, 2, 17) -> 5
2018/06/22 09:46:52 socat[10714] N listening on UDP AF=2 0.0.0.0:1234
2018/06/22 09:46:52 socat[10730] I just born: child process 10730
2018/06/22 09:46:52 socat[10730] I resolved and opened all sock addresses
2018/06/22 09:46:52 socat[10730] N starting data transfer loop with FDs [0,1] and [5,5]
test
2018/06/22 09:46:52 socat[10730] I transferred 5 bytes from 5 to 1
2018/06/22 09:46:52 socat[10714] N accepting UDP connection from AF=2 167.99.219.77:43743
2018/06/22 09:46:52 socat[10714] I permitting UDP connection from AF=2 167.99.219.77:43743
2018/06/22 09:46:52 socat[10714] N forked off child process 10731
2018/06/22 09:46:52 socat[10714] I close(5)
2018/06/22 09:46:52 socat[10714] I still listening
2018/06/22 09:46:52 socat[10714] I socket(2, 2, 17) -> 5
2018/06/22 09:46:52 socat[10714] N listening on UDP AF=2 0.0.0.0:1234
2018/06/22 09:46:52 socat[10731] I just born: child process 10731
2018/06/22 09:46:52 socat[10731] I resolved and opened all sock addresses
2018/06/22 09:46:52 socat[10731] N starting data transfer loop with FDs [0,1] and [5,5]
test
2018/06/22 09:46:52 socat[10731] I transferred 5 bytes from 5 to 1
The cause of this difference can be seen in the code here: xio-listen.c#L279. This does allow for receiving all packets, but it does not solve the issue of not being able to respond to them.