@briandant
Created October 17, 2019 00:13
debugging session 16 oct 2019

user___97983658

What does the consul kv store show now that the container is stopped?

The container is indeed stopped:

$ swarm ps -a | grep user___97983658
a8c520d42a7f        idev_110_2019.1:v4                   "supervisord -c /etc…"    27 hours ago        Exited (0) 2 hours ago                                                                                                                                                    prod-itential-wharfhouse-4/user___97983658

The consul kv store is indeed empty:

root at prod-itential-wharf-0 in /home/briandant
$ consul kv get -recurse networking/docker/network/v1.0/endpoint | grep 97983



What IP is assigned to this container?

root at prod-itential-wharf-0 in /home/briandant
$ swarm inspect user___97983658 | grep -i ip
            "IP": "10.128.0.7",
                        "HostIp": "",
                        "HostIp": "",
                        "HostIp": "",
                        "HostIp": "",
            "IpcMode": "shareable",
                "LOCALIP=192.168.1.157",
            "LinkLocalIPv6Address": "",
            "LinkLocalIPv6PrefixLen": 0,
            "SecondaryIPAddresses": null,
            "SecondaryIPv6Addresses": null,
            "GlobalIPv6Address": "",
            "GlobalIPv6PrefixLen": 0,
            "IPAddress": "",
            "IPPrefixLen": 0,
            "IPv6Gateway": "",
                    "IPAMConfig": null,
                    "IPAddress": "",
                    "IPPrefixLen": 0,
                    "IPv6Gateway": "",
                    "GlobalIPv6Address": "",
                    "GlobalIPv6PrefixLen": 0,
                    "IPAMConfig": null,
                    "IPAddress": "",
                    "IPPrefixLen": 0,
                    "IPv6Gateway": "",
                    "GlobalIPv6Address": "",
                    "GlobalIPv6PrefixLen": 0,

Okay, so the container has no IP assigned (every IPAddress is empty; 10.128.0.7 is the node's IP). Let's restart the container (through the Django admin).

The container is indeed up:

$ swarm ps | grep user___97983658
a8c520d42a7f        idev_110_2019.1:v4                   "supervisord -c /etc…"    27 hours ago        Up 20 seconds         3443/tcp, 10050/tcp, 10.128.0.7:23491->22/tcp, 10.128.0.7:18449->3000/tcp, 10.128.0.7:22922->6161/tcp, 10.128.0.7:21090->8181/tcp          prod-itential-wharfhouse-4/user___97983658

What's the IP?

$ swarm inspect user___97983658 | grep -i ip
            "IP": "10.128.0.7",
                        "HostIp": "",
                        "HostIp": "",
                        "HostIp": "",
                        "HostIp": "",
            "IpcMode": "shareable",
                "LOCALIP=192.168.1.157",
            "LinkLocalIPv6Address": "",
            "LinkLocalIPv6PrefixLen": 0,
                        "HostIp": "10.128.0.7",
                        "HostIp": "10.128.0.7",
                        "HostIp": "10.128.0.7",
                        "HostIp": "10.128.0.7",
            "SecondaryIPAddresses": null,
            "SecondaryIPv6Addresses": null,
            "GlobalIPv6Address": "",
            "GlobalIPv6PrefixLen": 0,
            "IPAddress": "172.17.0.6",
            "IPPrefixLen": 16,
            "IPv6Gateway": "",
                    "IPAMConfig": null,
                    "IPAddress": "172.17.0.6",
                    "IPPrefixLen": 16,
                    "IPv6Gateway": "",
                    "GlobalIPv6Address": "",
                    "GlobalIPv6PrefixLen": 0,
                    "IPAMConfig": null,
                    "IPAddress": "172.24.0.6",
                    "IPPrefixLen": 16,
                    "IPv6Gateway": "",
                    "GlobalIPv6Address": "",
                    "GlobalIPv6PrefixLen": 0,
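
The grep -i ip filter pulls in a lot of unrelated keys (HostIp, IpcMode, the LOCALIP env var). A minimal sketch of a narrower filter, assuming the inspect output is pretty-printed JSON with one field per line as shown above. assigned_ips is a hypothetical helper name, not part of Docker or Swarm:

```shell
# Hypothetical helper: print the distinct non-empty "IPAddress" values
# found in pretty-printed inspect JSON read from stdin.
assigned_ips() {
  # Split each line on double quotes: field 2 is the key, field 4 the value.
  awk -F'"' '$2 == "IPAddress" && $4 != "" { print $4 }' | sort -u
}
```

Usage would be something like swarm inspect user___97983658 | assigned_ips. For a stopped container it should print nothing; for the running one it should print only the bridge and wharf addresses.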

Nothing jumps out as odd.

  1. HostIp is now populated with the internal IP of the node

Let's confirm that ping still fails. Welp, the container is fixed! It can ping all the provisioning containers:

root at prod-itential-wharf-0 in /home/briandant
$ ./pingone.sh user___97983658
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
a8c520d42a7f: prod-itential-wharfhouse-4/user___97983658
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
PING ansible-provisioner (172.24.0.56) 56(84) bytes of data.
64 bytes from ansible-provisioner.wharf (172.24.0.56): icmp_seq=1 ttl=64 time=0.384 ms
64 bytes from ansible-provisioner.wharf (172.24.0.56): icmp_seq=2 ttl=64 time=0.381 ms
64 bytes from ansible-provisioner.wharf (172.24.0.56): icmp_seq=3 ttl=64 time=0.292 ms
64 bytes from ansible-provisioner.wharf (172.24.0.56): icmp_seq=4 ttl=64 time=0.385 ms

--- ansible-provisioner ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3077ms
rtt min/avg/max/mdev = 0.292/0.360/0.385/0.043 ms
PING ansible_container (172.24.0.77) 56(84) bytes of data.
64 bytes from ansible_container.wharf (172.24.0.77): icmp_seq=1 ttl=64 time=0.581 ms
64 bytes from ansible_container.wharf (172.24.0.77): icmp_seq=2 ttl=64 time=0.372 ms
64 bytes from ansible_container.wharf (172.24.0.77): icmp_seq=3 ttl=64 time=0.335 ms
64 bytes from ansible_container.wharf (172.24.0.77): icmp_seq=4 ttl=64 time=0.319 ms

--- ansible_container ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3042ms
rtt min/avg/max/mdev = 0.319/0.401/0.581/0.108 ms
PING mongo_auth (172.24.0.78) 56(84) bytes of data.
64 bytes from mongo_auth.wharf (172.24.0.78): icmp_seq=1 ttl=64 time=0.469 ms
64 bytes from mongo_auth.wharf (172.24.0.78): icmp_seq=2 ttl=64 time=0.332 ms
64 bytes from mongo_auth.wharf (172.24.0.78): icmp_seq=3 ttl=64 time=0.286 ms
64 bytes from mongo_auth.wharf (172.24.0.78): icmp_seq=4 ttl=64 time=0.369 ms

--- mongo_auth ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3074ms
rtt min/avg/max/mdev = 0.286/0.364/0.469/0.067 ms
PING itential_academy_ldap_persist (172.24.0.79) 56(84) bytes of data.
64 bytes from itential_academy_ldap_persist.wharf (172.24.0.79): icmp_seq=1 ttl=64 time=0.627 ms
64 bytes from itential_academy_ldap_persist.wharf (172.24.0.79): icmp_seq=2 ttl=64 time=0.378 ms
64 bytes from itential_academy_ldap_persist.wharf (172.24.0.79): icmp_seq=3 ttl=64 time=0.282 ms
64 bytes from itential_academy_ldap_persist.wharf (172.24.0.79): icmp_seq=4 ttl=64 time=0.334 ms

--- itential_academy_ldap_persist ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3027ms
rtt min/avg/max/mdev = 0.282/0.405/0.627/0.133 ms
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Well, let's then track what's in the consul kv store:

root at prod-itential-wharf-0 in /home/briandant
$ consul kv get -recurse networking/docker/network/v1.0/endpoint | grep 97983


networking/docker/network/v1.0/endpoint/0246e177aa207334091a83021112962e7cc576856d5c4a1af4f8d91cd49daaa3/f6602af440d47b8cc1b24e28510b0718f02ddc8f3f30530c10bf78c40a94eba6/:
{
    "anonymous": false,
    "disableResolution": false,
    "ep_iface": {
        "addr": "172.24.0.6/16",
        "dstPrefix": "eth",
        "mac": "02:42:ac:18:00:06",
        "routes": null,
        "srcName": "veth44e8b95",
        "v4PoolID": "GlobalDefault/172.24.0.0/16",
        "v6PoolID": ""
    },
    "exposed_ports": null,
    "generic": {},
    "id": "f6602af440d47b8cc1b24e28510b0718f02ddc8f3f30530c10bf78c40a94eba6",
    "ingressPorts": null,
    "joinInfo": {
        "StaticRoutes": null,
        "disableGatewayService": false
    },
    "loadBalancer": false,
    "locator": "10.128.0.7",
    "myAliases": [
        "user-97983658",
        "a8c520d42a7f"
    ],
    "name": "user___97983658",
    "sandbox": "489a7747ef04e481ba220d5b19fde8215dffcc428cd1c5e45392a610f307d3b7",
    "svcAliases": null,
    "svcID": "",
    "svcName": "",
    "virtualIP": "<nil>"
}

Note: exposed_ports is null, but that shouldn't matter; many containers are in this state (but why?!) and still work (can be pinged).
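
To see which containers are in this state, the recursive dump can be scanned for records whose exposed_ports is null. A rough sketch, assuming the pretty-printed layout shown above (one field per line, with exposed_ports appearing before name in each record). null_port_endpoints is a made-up helper name:

```shell
# Hypothetical helper: read pretty-printed endpoint records from stdin and
# print the "name" of every record whose exposed_ports field is null.
null_port_endpoints() {
  awk '
    /"exposed_ports": null/ { null_ports = 1 }
    /^[[:space:]]*"name":/ {
      if (null_ports) {
        gsub(/[",]/, "", $2)   # strip the quotes and trailing comma
        print $2
      }
      null_ports = 0           # reset for the next record
    }
  '
}
```

Usage would be along the lines of consul kv get -recurse networking/docker/network/v1.0/endpoint | null_port_endpoints.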

Well, let me just compare it to a good container.

This one looks good:

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
e6240953b66c: prod-itential-wharfhouse-0/user___92676977 Up 3 days
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
networking/docker/network/v1.0/endpoint/0246e177aa207334091a83021112962e7cc576856d5c4a1af4f8d91cd49daaa3/af5ff6afda1bf0814e2834ae4d932502548718ee8bc108d90e126161389b536d/:
{
    "anonymous": false,
    "disableResolution": false,
    "ep_iface": {
        "addr": "172.24.0.90/16",
        "dstPrefix": "eth",
        "mac": "02:42:ac:18:00:5a",
        "routes": null,
        "srcName": "veth3799033",
        "v4PoolID": "GlobalDefault/172.24.0.0/16",
        "v6PoolID": ""
    },
    "exposed_ports": [
        {
            "Port": 6161,
            "Proto": 6
        },
        {
            "Port": 22,
            "Proto": 6
        },
        {
            "Port": 10050,
            "Proto": 6
        },
        {
            "Port": 8181,
            "Proto": 6
        },
        {
            "Port": 3000,
            "Proto": 6
        },
        {
            "Port": 3443,
            "Proto": 6
        }
    ],
    "generic": {
        "com.docker.network.endpoint.exposedports": [
            {
                "Port": 6161,
                "Proto": 6
            },
            {
                "Port": 22,
                "Proto": 6
            },
            {
                "Port": 10050,
                "Proto": 6
            },
            {
                "Port": 8181,
                "Proto": 6
            },
            {
                "Port": 3000,
                "Proto": 6
            },
            {
                "Port": 3443,
                "Proto": 6
            }
        ],
        "com.docker.network.portmap": [
            {
                "HostIP": "",
                "HostPort": 21887,
                "HostPortEnd": 21887,
                "IP": "",
                "Port": 6161,
                "Proto": 6
            },
            {
                "HostIP": "",
                "HostPort": 29220,
                "HostPortEnd": 29220,
                "IP": "",
                "Port": 22,
                "Proto": 6
            },
            {
                "HostIP": "",
                "HostPort": 19562,
                "HostPortEnd": 19562,
                "IP": "",
                "Port": 8181,
                "Proto": 6
            },
            {
                "HostIP": "",
                "HostPort": 18906,
                "HostPortEnd": 18906,
                "IP": "",
                "Port": 3000,
                "Proto": 6
            }
        ]
    },
    "id": "af5ff6afda1bf0814e2834ae4d932502548718ee8bc108d90e126161389b536d",
    "ingressPorts": null,
    "joinInfo": {
        "StaticRoutes": null,
        "disableGatewayService": false
    },
    "loadBalancer": false,
    "locator": "10.128.0.3",
    "myAliases": [
        "user-92676977",
        "e6240953b66c"
    ],
    "name": "user___92676977",
    "sandbox": "20874dcc7f775e9a6e536abe3fd546164f7b40386fa1f33616f9f50a679f5a70",
    "svcAliases": null,
    "svcID": "",
    "svcName": "",
    "virtualIP": "<nil>"
}

Nothing seems different, other than of course the ports not being in the list.
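
One more cross-check on the two records: as far as I know, Docker's default MAC addresses are derived from the container's IPv4 address (02:42 followed by the four address octets in hex), so mac and addr in ep_iface should agree. A small sketch of that check; mac_to_ipv4 is a hypothetical helper and assumes a MAC of that generated form:

```shell
# Hypothetical helper: decode the last four octets of a Docker-generated
# MAC address (02:42:aa:bb:cc:dd) into the dotted IPv4 address they encode.
# The body runs in a subshell so the IFS change does not leak out.
mac_to_ipv4() (
  IFS=:
  set -- $1                      # split the MAC on ':' into $1..$6
  printf '%d.%d.%d.%d\n' "0x$3" "0x$4" "0x$5" "0x$6"
)

mac_to_ipv4 02:42:ac:18:00:06    # MAC from the restarted container's record
mac_to_ipv4 02:42:ac:18:00:5a    # MAC from the good container's record
```

The two calls should print 172.24.0.6 and 172.24.0.90, which match the addr fields of the respective records.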

What does the data in this container's Docker directory show? From WH4 (prod-itential-wharfhouse-4):

containers/a8c520d42a7f047676e4dedd13d025de975b67fc2a6ac6c0ff858e6670f24ded/config.v2.json

{
    "AppArmorProfile": "docker-default",
    "Args": [
        "-c",
        "/etc/supervisor/supervisord.conf"
    ],
    "Config": {
        "AttachStderr": false,
        "AttachStdin": false,
        "AttachStdout": false,
        "Cmd": [
            "supervisord",
            "-c",
            "/etc/supervisor/supervisord.conf"
        ],
        "Domainname": "",
        "Entrypoint": null,
        "Env": [
            "[email protected]",
            "AVL_PRIMARY_CONTAINER_NAME=user___97983658",
            "AVL_PRIMARY_CONTAINER_DOMAIN=user-97983658",
            "AVL_PRIMARY_CONTAINER_INTERNAL_DOMAIN=user-97983658",
            "AVL_PRIMARY_CONTAINER_EXTERNAL_DOMAIN=3000-97983658.itential-academy-labs.appsembler.com",
            "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
            "COURSE=IDEV_110",
            "LOCALIP=192.168.1.157",
            "[email protected]",
            "SEMESTER=2019.1"
        ],
        "ExposedPorts": {
            "10050/tcp": {},
            "22/tcp": {},
            "3000/tcp": {},
            "3443/tcp": {},
            "6161/tcp": {},
            "8181/tcp": {}
        },
        "Hostname": "a8c520d42a7f",
        "Image": "idev_110_2019.1:v4",
        "Labels": {
            "com.appsembler.wharf-container-type": "user",
            "com.docker.swarm.constraints": "[\"status!=deprecated\"]",
            "com.docker.swarm.id": "f24fa7078ff0ff52b86406883baeab0ec0d412b41994c922cf73a7ec47229d06",
            "org.label-schema.build-date": "20190305",
            "org.label-schema.license": "GPLv2",
            "org.label-schema.name": "CentOS Base Image",
            "org.label-schema.schema-version": "1.0",
            "org.label-schema.vendor": "CentOS"
        },
        "OnBuild": null,
        "OpenStdin": true,
        "StdinOnce": false,
        "Tty": true,
        "User": "",
        "Volumes": null,
        "WorkingDir": "/"
    },
    "ConfigReferences": null,
    "Created": "2019-10-15T18:59:59.378208198Z",
    "Driver": "devicemapper",
    "HasBeenManuallyStopped": false,
    "HasBeenStartedBefore": true,
    "HostnamePath": "/var/lib/docker/689824.689824/containers/a8c520d42a7f047676e4dedd13d025de975b67fc2a6ac6c0ff858e6670f24ded/hostname",
    "HostsPath": "/var/lib/docker/689824.689824/containers/a8c520d42a7f047676e4dedd13d025de975b67fc2a6ac6c0ff858e6670f24ded/hosts",
    "ID": "a8c520d42a7f047676e4dedd13d025de975b67fc2a6ac6c0ff858e6670f24ded",
    "Image": "sha256:a91748a2101611cbbd8cb291e558ab94ad07c192de10c2ef4c1f4fcf07623cb1",
    "LogPath": "/var/lib/docker/689824.689824/containers/a8c520d42a7f047676e4dedd13d025de975b67fc2a6ac6c0ff858e6670f24ded/a8c520d42a7f047676e4dedd13d025de975b67fc2a6ac6c0ff858e6670f24ded-json.log",
    "Managed": false,
    "MountLabel": "",
    "MountPoints": {},
    "Name": "/user___97983658",
    "NetworkSettings": {
        "Bridge": "",
        "HairpinMode": false,
        "HasSwarmEndpoint": false,
        "IsAnonymousEndpoint": false,
        "LinkLocalIPv6Address": "",
        "LinkLocalIPv6PrefixLen": 0,
        "Networks": {
            "bridge": {
                "Aliases": null,
                "DriverOpts": null,
                "EndpointID": "369cccab8dd02f6f05426e62738876f966c306ab064fb6cc662e38c13ae4c6c1",
                "Gateway": "172.17.0.1",
                "GlobalIPv6Address": "",
                "GlobalIPv6PrefixLen": 0,
                "IPAMConfig": null,
                "IPAMOperational": false,
                "IPAddress": "172.17.0.6",
                "IPPrefixLen": 16,
                "IPv6Gateway": "",
                "Links": null,
                "MacAddress": "02:42:ac:11:00:06",
                "NetworkID": "11792b9c1bb13cca080eade3a4d2ed1a36846d0196141905701b1aa5fccd91be"
            },
            "wharf": {
                "Aliases": [
                    "user-97983658",
                    "a8c520d42a7f"
                ],
                "DriverOpts": null,
                "EndpointID": "f6602af440d47b8cc1b24e28510b0718f02ddc8f3f30530c10bf78c40a94eba6",
                "Gateway": "",
                "GlobalIPv6Address": "",
                "GlobalIPv6PrefixLen": 0,
                "IPAMConfig": null,
                "IPAMOperational": false,
                "IPAddress": "172.24.0.6",
                "IPPrefixLen": 16,
                "IPv6Gateway": "",
                "Links": null,
                "MacAddress": "02:42:ac:18:00:06",
                "NetworkID": "0246e177aa207334091a83021112962e7cc576856d5c4a1af4f8d91cd49daaa3"
            }
        },
        "Ports": {
            "10050/tcp": null,
            "22/tcp": [
                {
                    "HostIp": "0.0.0.0",
                    "HostPort": "23491"
                }
            ],
            "3000/tcp": [
                {
                    "HostIp": "0.0.0.0",
                    "HostPort": "18449"
                }
            ],
            "3443/tcp": null,
            "6161/tcp": [
                {
                    "HostIp": "0.0.0.0",
                    "HostPort": "22922"
                }
            ],
            "8181/tcp": [
                {
                    "HostIp": "0.0.0.0",
                    "HostPort": "21090"
                }
            ]
        },
        "SandboxID": "489a7747ef04e481ba220d5b19fde8215dffcc428cd1c5e45392a610f307d3b7",
        "SandboxKey": "/var/run/docker/netns/489a7747ef04",
        "SecondaryIPAddresses": null,
        "SecondaryIPv6Addresses": null,
        "Service": null
    },
    "NoNewPrivileges": false,
    "OS": "linux",
    "Path": "supervisord",
    "ProcessLabel": "",
    "ResolvConfPath": "/var/lib/docker/689824.689824/containers/a8c520d42a7f047676e4dedd13d025de975b67fc2a6ac6c0ff858e6670f24ded/resolv.conf",
    "RestartCount": 0,
    "SeccompProfile": "",
    "SecretReferences": null,
    "ShmPath": "/var/lib/docker/689824.689824/containers/a8c520d42a7f047676e4dedd13d025de975b67fc2a6ac6c0ff858e6670f24ded/mounts/shm",
    "State": {
        "Dead": false,
        "Error": "",
        "ExitCode": 0,
        "FinishedAt": "2019-10-16T19:00:06.694073233Z",
        "Health": null,
        "OOMKilled": false,
        "Paused": false,
        "Pid": 13569,
        "RemovalInProgress": false,
        "Restarting": false,
        "Running": true,
        "StartedAt": "2019-10-16T21:40:26.468432624Z"
    },
    "StreamConfig": {}
}

Well, the Docker config does not contain endpoints. That seems like a bug. swarm ps reports that the container does have endpoints (and they are indeed accessible), but the Docker data in the directory does not show them. This must also be why consul does not report any endpoints.
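
For lining up what swarm ps reports against the on-disk config, the published mappings can be pulled out of a ps line. A sketch, assuming the PORTS column format shown earlier (ip:hostport->containerport/proto). published_ports is a hypothetical helper, and grep -o is a GNU/BSD extension rather than POSIX:

```shell
# Hypothetical helper: read ps output lines from stdin and print each
# published mapping (host ip:port -> container port/proto) on its own line.
published_ports() {
  grep -oE '[0-9.]+:[0-9]+->[0-9]+/(tcp|udp)'
}
```

Usage would be something like swarm ps | grep user___97983658 | published_ports, and the host ports it prints can then be compared by eye with the Ports section of config.v2.json.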

But, that doesn't create a problem for connecting to containers. 79369858 can connect, even if it does not have any exposed ports (and those ports are actually accessible, right?).

Next test: does Consul show the same data on all nodes for this container? I'd first like to see if I can get it back into a broken state.

Stopped and restarted at 16 Oct 16:20. No luck: it still pings all provisioning containers.
Stopped and restarted at 16 Oct 16:21. No luck: it still pings all provisioning containers.
Stopped and restarted at 16 Oct 16:22. No luck: it still pings all provisioning containers.

K, I can't get it to break again. Going to test the various consul stores anyway.

I used the following script to pull and then diff consul data from the various clusters. They all match.

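#!/bin/sh

# Usage: pass a container's name (or any fragment to grep for) as $1.
# Gets consul data from all clusters and writes one
# consul_data__<name>__<remote>__<timestamp>.txt file per remote.
if [ -z "$1" ]; then
  echo "usage: $0 <container-name>" >&2
  exit 1
fi

# Remote names are taken from the matching lines in ~/.ssh/config
# (second space-separated field of each line).
remotes=$(grep itential-wharf ~/.ssh/config | cut -d ' ' -f 2)

for remote in $remotes; do
  echo
  echo
  echo "++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++"
  command="consul kv get -recurse networking/docker/network/v1.0/endpoint | grep $1"
  echo "remote: ${remote}"
  echo "command: ${command}"
  # Run the same command that was just echoed; date -Iseconds needs GNU date.
  ssh "$remote" "$command" > "consul_data__${1}__${remote}__$(date -Iseconds).txt"
done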
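
The script only pulls the data; the comparison is a separate step. A minimal sketch of that step, assuming the per-remote files are in the current directory. files_match is a hypothetical helper name:

```shell
# Hypothetical helper: compare every file given against the first one.
# Prints "all match" if they are byte-identical, otherwise names the
# first file that differs and returns non-zero.
files_match() {
  first=$1
  shift
  for f in "$@"; do
    if ! cmp -s "$first" "$f"; then
      echo "differs: $f"
      return 1
    fi
  done
  echo "all match"
}
```

Usage would be something like files_match consul_data__user___97983658__*.txt, followed by diff on whichever file is reported as different.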

TODO

  • parse kern.log for the veth of the bad and good containers
  • What does the "id" in consul kv point to? "id": "f6602af440d47b8cc1b24e28510b0718f02ddc8f3f30530c10bf78c40a94eba6",
    • it's the endpoint on the network
  • what's this? "com.docker.swarm.id": "f24fa7078ff0ff52b86406883baeab0ec0d412b41994c922cf73a7ec47229d06",
  • figure out why there are several consecutive containers with exposed_ports: null
  • can we actually connect to