The container is indeed stopped:
$ swarm ps -a | grep user___97983658
a8c520d42a7f idev_110_2019.1:v4 "supervisord -c /etc…" 27 hours ago Exited (0) 2 hours ago prod-itential-wharfhouse-4/user___97983658
The consul kv store indeed has no entry for this container:
root at prod-itential-wharf-0 in /home/briandant
$ consul kv get -recurse networking/docker/network/v1.0/endpoint | grep 97983
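To be sure the store itself is healthy and it's only this container's entry that's missing, a quick sanity check (flag usage per the consul kv docs; a nonzero count means other endpoints are present):

$ consul kv get -keys -separator="" networking/docker/network/v1.0/endpoint | wc -l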
What IP is assigned to this container?
root at prod-itential-wharf-0 in /home/briandant
$ swarm inspect user___97983658 | grep -i ip
"IP": "10.128.0.7",
"HostIp": "",
"HostIp": "",
"HostIp": "",
"HostIp": "",
"IpcMode": "shareable",
"LOCALIP=192.168.1.157",
"LinkLocalIPv6Address": "",
"LinkLocalIPv6PrefixLen": 0,
"SecondaryIPAddresses": null,
"SecondaryIPv6Addresses": null,
"GlobalIPv6Address": "",
"GlobalIPv6PrefixLen": 0,
"IPAddress": "",
"IPPrefixLen": 0,
"IPv6Gateway": "",
"IPAMConfig": null,
"IPAddress": "",
"IPPrefixLen": 0,
"IPv6Gateway": "",
"GlobalIPv6Address": "",
"GlobalIPv6PrefixLen": 0,
"IPAMConfig": null,
"IPAddress": "",
"IPPrefixLen": 0,
"IPv6Gateway": "",
"GlobalIPv6Address": "",
"GlobalIPv6PrefixLen": 0,
Okay, so no container IP is assigned (the 10.128.0.7 on the first line is the node's IP, courtesy of Swarm). Let's restart the container (through the Django admin).
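For what it's worth, the Django admin restart should be equivalent to doing it by hand, assuming the swarm CLI proxies standard Docker commands the way it does ps and inspect:

$ swarm restart user___97983658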
The container is indeed up:
$ swarm ps | grep user___97983658
a8c520d42a7f idev_110_2019.1:v4 "supervisord -c /etc…" 27 hours ago Up 20 seconds 3443/tcp, 10050/tcp, 10.128.0.7:23491->22/tcp, 10.128.0.7:18449->3000/tcp, 10.128.0.7:22922->6161/tcp, 10.128.0.7:21090->8181/tcp prod-itential-wharfhouse-4/user___97983658
What's the IP?
$ swarm inspect user___97983658 | grep -i ip
"IP": "10.128.0.7",
"HostIp": "",
"HostIp": "",
"HostIp": "",
"HostIp": "",
"IpcMode": "shareable",
"LOCALIP=192.168.1.157",
"LinkLocalIPv6Address": "",
"LinkLocalIPv6PrefixLen": 0,
"HostIp": "10.128.0.7",
"HostIp": "10.128.0.7",
"HostIp": "10.128.0.7",
"HostIp": "10.128.0.7",
"SecondaryIPAddresses": null,
"SecondaryIPv6Addresses": null,
"GlobalIPv6Address": "",
"GlobalIPv6PrefixLen": 0,
"IPAddress": "172.17.0.6",
"IPPrefixLen": 16,
"IPv6Gateway": "",
"IPAMConfig": null,
"IPAddress": "172.17.0.6",
"IPPrefixLen": 16,
"IPv6Gateway": "",
"GlobalIPv6Address": "",
"GlobalIPv6PrefixLen": 0,
"IPAMConfig": null,
"IPAddress": "172.24.0.6",
"IPPrefixLen": 16,
"IPv6Gateway": "",
"GlobalIPv6Address": "",
"GlobalIPv6PrefixLen": 0,
Nothing jumps out as odd, although HostIp is now populated with the internal IP of the node.
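Grepping for "ip" is noisy; a more targeted look at the network endpoints, assuming jq is installed on the manager (docker inspect output is an array, hence the .[0]):

$ swarm inspect user___97983658 | jq '.[0].NetworkSettings.Networks'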
Let's confirm that ping still fails. Welp, the container is fixed! It can ping all of the provisioning containers:
root at prod-itential-wharf-0 in /home/briandant
$ ./pingone.sh user___97983658
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
a8c520d42a7f: prod-itential-wharfhouse-4/user___97983658
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
PING ansible-provisioner (172.24.0.56) 56(84) bytes of data.
64 bytes from ansible-provisioner.wharf (172.24.0.56): icmp_seq=1 ttl=64 time=0.384 ms
64 bytes from ansible-provisioner.wharf (172.24.0.56): icmp_seq=2 ttl=64 time=0.381 ms
64 bytes from ansible-provisioner.wharf (172.24.0.56): icmp_seq=3 ttl=64 time=0.292 ms
64 bytes from ansible-provisioner.wharf (172.24.0.56): icmp_seq=4 ttl=64 time=0.385 ms
--- ansible-provisioner ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3077ms
rtt min/avg/max/mdev = 0.292/0.360/0.385/0.043 ms
PING ansible_container (172.24.0.77) 56(84) bytes of data.
64 bytes from ansible_container.wharf (172.24.0.77): icmp_seq=1 ttl=64 time=0.581 ms
64 bytes from ansible_container.wharf (172.24.0.77): icmp_seq=2 ttl=64 time=0.372 ms
64 bytes from ansible_container.wharf (172.24.0.77): icmp_seq=3 ttl=64 time=0.335 ms
64 bytes from ansible_container.wharf (172.24.0.77): icmp_seq=4 ttl=64 time=0.319 ms
--- ansible_container ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3042ms
rtt min/avg/max/mdev = 0.319/0.401/0.581/0.108 ms
PING mongo_auth (172.24.0.78) 56(84) bytes of data.
64 bytes from mongo_auth.wharf (172.24.0.78): icmp_seq=1 ttl=64 time=0.469 ms
64 bytes from mongo_auth.wharf (172.24.0.78): icmp_seq=2 ttl=64 time=0.332 ms
64 bytes from mongo_auth.wharf (172.24.0.78): icmp_seq=3 ttl=64 time=0.286 ms
64 bytes from mongo_auth.wharf (172.24.0.78): icmp_seq=4 ttl=64 time=0.369 ms
--- mongo_auth ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3074ms
rtt min/avg/max/mdev = 0.286/0.364/0.469/0.067 ms
PING itential_academy_ldap_persist (172.24.0.79) 56(84) bytes of data.
64 bytes from itential_academy_ldap_persist.wharf (172.24.0.79): icmp_seq=1 ttl=64 time=0.627 ms
64 bytes from itential_academy_ldap_persist.wharf (172.24.0.79): icmp_seq=2 ttl=64 time=0.378 ms
64 bytes from itential_academy_ldap_persist.wharf (172.24.0.79): icmp_seq=3 ttl=64 time=0.282 ms
64 bytes from itential_academy_ldap_persist.wharf (172.24.0.79): icmp_seq=4 ttl=64 time=0.334 ms
--- itential_academy_ldap_persist ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3027ms
rtt min/avg/max/mdev = 0.282/0.405/0.627/0.133 ms
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
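pingone.sh isn't reproduced here; a minimal sketch of what it presumably does, assuming swarm exec proxies docker exec and using the target names from the output above:

#!/bin/sh
# Hypothetical sketch of pingone.sh: ping each provisioning
# container from inside the given user container.
container=$1
for target in ansible-provisioner ansible_container mongo_auth itential_academy_ldap_persist; do
    swarm exec "$container" ping -c 4 "$target"
done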
Well, let's then check what's now in the consul kv store:
root at prod-itential-wharf-0 in /home/briandant
$ consul kv get -recurse networking/docker/network/v1.0/endpoint | grep 97983
networking/docker/network/v1.0/endpoint/0246e177aa207334091a83021112962e7cc576856d5c4a1af4f8d91cd49daaa3/f6602af440d47b8cc1b24e28510b0718f02ddc8f3f30530c10bf78c40a94eba6/:
{
"anonymous": false,
"disableResolution": false,
"ep_iface": {
"addr": "172.24.0.6/16",
"dstPrefix": "eth",
"mac": "02:42:ac:18:00:06",
"routes": null,
"srcName": "veth44e8b95",
"v4PoolID": "GlobalDefault/172.24.0.0/16",
"v6PoolID": ""
},
"exposed_ports": null,
"generic": {},
"id": "f6602af440d47b8cc1b24e28510b0718f02ddc8f3f30530c10bf78c40a94eba6",
"ingressPorts": null,
"joinInfo": {
"StaticRoutes": null,
"disableGatewayService": false
},
"loadBalancer": false,
"locator": "10.128.0.7",
"myAliases": [
"user-97983658",
"a8c520d42a7f"
],
"name": "user___97983658",
"sandbox": "489a7747ef04e481ba220d5b19fde8215dffcc428cd1c5e45392a610f307d3b7",
"svcAliases": null,
"svcID": "",
"svcName": "",
"virtualIP": "<nil>"
}
Note: exposed_ports is null, but that shouldn't matter; many containers are in this state (but why?!) and still work (they can be pinged).
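A rough way to see how widespread the null exposed_ports state is, assuming jq is installed and the stored values parse as JSON:

#!/bin/sh
# List every endpoint in the KV store whose exposed_ports is null.
prefix=networking/docker/network/v1.0/endpoint
for key in $(consul kv get -keys -separator="" "$prefix"); do
    consul kv get "$key" | jq -r 'select(.exposed_ports == null) | .name'
done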
Well, let me just compare it to a good container.
This one looks good:
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
e6240953b66c: prod-itential-wharfhouse-0/user___92676977 Up 3 days
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
networking/docker/network/v1.0/endpoint/0246e177aa207334091a83021112962e7cc576856d5c4a1af4f8d91cd49daaa3/af5ff6afda1bf0814e2834ae4d932502548718ee8bc108d90e126161389b536d/:
{
"anonymous": false,
"disableResolution": false,
"ep_iface": {
"addr": "172.24.0.90/16",
"dstPrefix": "eth",
"mac": "02:42:ac:18:00:5a",
"routes": null,
"srcName": "veth3799033",
"v4PoolID": "GlobalDefault/172.24.0.0/16",
"v6PoolID": ""
},
"exposed_ports": [
{
"Port": 6161,
"Proto": 6
},
{
"Port": 22,
"Proto": 6
},
{
"Port": 10050,
"Proto": 6
},
{
"Port": 8181,
"Proto": 6
},
{
"Port": 3000,
"Proto": 6
},
{
"Port": 3443,
"Proto": 6
}
],
"generic": {
"com.docker.network.endpoint.exposedports": [
{
"Port": 6161,
"Proto": 6
},
{
"Port": 22,
"Proto": 6
},
{
"Port": 10050,
"Proto": 6
},
{
"Port": 8181,
"Proto": 6
},
{
"Port": 3000,
"Proto": 6
},
{
"Port": 3443,
"Proto": 6
}
],
"com.docker.network.portmap": [
{
"HostIP": "",
"HostPort": 21887,
"HostPortEnd": 21887,
"IP": "",
"Port": 6161,
"Proto": 6
},
{
"HostIP": "",
"HostPort": 29220,
"HostPortEnd": 29220,
"IP": "",
"Port": 22,
"Proto": 6
},
{
"HostIP": "",
"HostPort": 19562,
"HostPortEnd": 19562,
"IP": "",
"Port": 8181,
"Proto": 6
},
{
"HostIP": "",
"HostPort": 18906,
"HostPortEnd": 18906,
"IP": "",
"Port": 3000,
"Proto": 6
}
]
},
"id": "af5ff6afda1bf0814e2834ae4d932502548718ee8bc108d90e126161389b536d",
"ingressPorts": null,
"joinInfo": {
"StaticRoutes": null,
"disableGatewayService": false
},
"loadBalancer": false,
"locator": "10.128.0.3",
"myAliases": [
"user-92676977",
"e6240953b66c"
],
"name": "user___92676977",
"sandbox": "20874dcc7f775e9a6e536abe3fd546164f7b40386fa1f33616f9f50a679f5a70",
"svcAliases": null,
"svcID": "",
"svcName": "",
"virtualIP": "<nil>"
}
Nothing seems different, other than, of course, the exposed ports missing from the bad container's entry.
What does the data in this container's Docker directory show? From WH4:
containers/a8c520d42a7f047676e4dedd13d025de975b67fc2a6ac6c0ff858e6670f24ded/config.v2.json
{
"AppArmorProfile": "docker-default",
"Args": [
"-c",
"/etc/supervisor/supervisord.conf"
],
"Config": {
"AttachStderr": false,
"AttachStdin": false,
"AttachStdout": false,
"Cmd": [
"supervisord",
"-c",
"/etc/supervisor/supervisord.conf"
],
"Domainname": "",
"Entrypoint": null,
"Env": [
"[email protected]",
"AVL_PRIMARY_CONTAINER_NAME=user___97983658",
"AVL_PRIMARY_CONTAINER_DOMAIN=user-97983658",
"AVL_PRIMARY_CONTAINER_INTERNAL_DOMAIN=user-97983658",
"AVL_PRIMARY_CONTAINER_EXTERNAL_DOMAIN=3000-97983658.itential-academy-labs.appsembler.com",
"PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
"COURSE=IDEV_110",
"LOCALIP=192.168.1.157",
"[email protected]",
"SEMESTER=2019.1"
],
"ExposedPorts": {
"10050/tcp": {},
"22/tcp": {},
"3000/tcp": {},
"3443/tcp": {},
"6161/tcp": {},
"8181/tcp": {}
},
"Hostname": "a8c520d42a7f",
"Image": "idev_110_2019.1:v4",
"Labels": {
"com.appsembler.wharf-container-type": "user",
"com.docker.swarm.constraints": "[\"status!=deprecated\"]",
"com.docker.swarm.id": "f24fa7078ff0ff52b86406883baeab0ec0d412b41994c922cf73a7ec47229d06",
"org.label-schema.build-date": "20190305",
"org.label-schema.license": "GPLv2",
"org.label-schema.name": "CentOS Base Image",
"org.label-schema.schema-version": "1.0",
"org.label-schema.vendor": "CentOS"
},
"OnBuild": null,
"OpenStdin": true,
"StdinOnce": false,
"Tty": true,
"User": "",
"Volumes": null,
"WorkingDir": "/"
},
"ConfigReferences": null,
"Created": "2019-10-15T18:59:59.378208198Z",
"Driver": "devicemapper",
"HasBeenManuallyStopped": false,
"HasBeenStartedBefore": true,
"HostnamePath": "/var/lib/docker/689824.689824/containers/a8c520d42a7f047676e4dedd13d025de975b67fc2a6ac6c0ff858e6670f24ded/hostname",
"HostsPath": "/var/lib/docker/689824.689824/containers/a8c520d42a7f047676e4dedd13d025de975b67fc2a6ac6c0ff858e6670f24ded/hosts",
"ID": "a8c520d42a7f047676e4dedd13d025de975b67fc2a6ac6c0ff858e6670f24ded",
"Image": "sha256:a91748a2101611cbbd8cb291e558ab94ad07c192de10c2ef4c1f4fcf07623cb1",
"LogPath": "/var/lib/docker/689824.689824/containers/a8c520d42a7f047676e4dedd13d025de975b67fc2a6ac6c0ff858e6670f24ded/a8c520d42a7f047676e4dedd13d025de975b67fc2a6ac6c0ff858e6670f24ded-json.log",
"Managed": false,
"MountLabel": "",
"MountPoints": {},
"Name": "/user___97983658",
"NetworkSettings": {
"Bridge": "",
"HairpinMode": false,
"HasSwarmEndpoint": false,
"IsAnonymousEndpoint": false,
"LinkLocalIPv6Address": "",
"LinkLocalIPv6PrefixLen": 0,
"Networks": {
"bridge": {
"Aliases": null,
"DriverOpts": null,
"EndpointID": "369cccab8dd02f6f05426e62738876f966c306ab064fb6cc662e38c13ae4c6c1",
"Gateway": "172.17.0.1",
"GlobalIPv6Address": "",
"GlobalIPv6PrefixLen": 0,
"IPAMConfig": null,
"IPAMOperational": false,
"IPAddress": "172.17.0.6",
"IPPrefixLen": 16,
"IPv6Gateway": "",
"Links": null,
"MacAddress": "02:42:ac:11:00:06",
"NetworkID": "11792b9c1bb13cca080eade3a4d2ed1a36846d0196141905701b1aa5fccd91be"
},
"wharf": {
"Aliases": [
"user-97983658",
"a8c520d42a7f"
],
"DriverOpts": null,
"EndpointID": "f6602af440d47b8cc1b24e28510b0718f02ddc8f3f30530c10bf78c40a94eba6",
"Gateway": "",
"GlobalIPv6Address": "",
"GlobalIPv6PrefixLen": 0,
"IPAMConfig": null,
"IPAMOperational": false,
"IPAddress": "172.24.0.6",
"IPPrefixLen": 16,
"IPv6Gateway": "",
"Links": null,
"MacAddress": "02:42:ac:18:00:06",
"NetworkID": "0246e177aa207334091a83021112962e7cc576856d5c4a1af4f8d91cd49daaa3"
}
},
"Ports": {
"10050/tcp": null,
"22/tcp": [
{
"HostIp": "0.0.0.0",
"HostPort": "23491"
}
],
"3000/tcp": [
{
"HostIp": "0.0.0.0",
"HostPort": "18449"
}
],
"3443/tcp": null,
"6161/tcp": [
{
"HostIp": "0.0.0.0",
"HostPort": "22922"
}
],
"8181/tcp": [
{
"HostIp": "0.0.0.0",
"HostPort": "21090"
}
]
},
"SandboxID": "489a7747ef04e481ba220d5b19fde8215dffcc428cd1c5e45392a610f307d3b7",
"SandboxKey": "/var/run/docker/netns/489a7747ef04",
"SecondaryIPAddresses": null,
"SecondaryIPv6Addresses": null,
"Service": null
},
"NoNewPrivileges": false,
"OS": "linux",
"Path": "supervisord",
"ProcessLabel": "",
"ResolvConfPath": "/var/lib/docker/689824.689824/containers/a8c520d42a7f047676e4dedd13d025de975b67fc2a6ac6c0ff858e6670f24ded/resolv.conf",
"RestartCount": 0,
"SeccompProfile": "",
"SecretReferences": null,
"ShmPath": "/var/lib/docker/689824.689824/containers/a8c520d42a7f047676e4dedd13d025de975b67fc2a6ac6c0ff858e6670f24ded/mounts/shm",
"State": {
"Dead": false,
"Error": "",
"ExitCode": 0,
"FinishedAt": "2019-10-16T19:00:06.694073233Z",
"Health": null,
"OOMKilled": false,
"Paused": false,
"Pid": 13569,
"RemovalInProgress": false,
"Restarting": false,
"Running": true,
"StartedAt": "2019-10-16T21:40:26.468432624Z"
},
"StreamConfig": {}
}
Well, the Docker config does not contain endpoints. That seems like a bug: swarm ps reports that the container does have endpoints (and they are indeed accessible), but the Docker data in the directory does not show them. This must also be why consul does not report any endpoints.
But that doesn't create a problem for connecting to containers. 79369858 can connect even though it does not have any exposed ports (and those ports are actually accessible, right?).
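That last question can be answered directly from the node; a quick reachability check against one of the mapped ports from the swarm ps output above (nc assumed to be available):

$ nc -zv 10.128.0.7 23491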
Next test: does Consul show the same data on all nodes for this container? First, though, I'd like to see if I can get it back into a broken state.
Stopped and restarted at 16 Oct 16:20. No luck: it still pings all provisioning containers.
Stopped and restarted at 16 Oct 16:21. No luck: it still pings all provisioning containers.
Stopped and restarted at 16 Oct 16:22. No luck: it still pings all provisioning containers.
K, I can't get it to break again. Going to test the various consul stores anyway.
I used the following script to pull and then diff consul data from the various clusters. They all match.
#!/bin/sh
# Usage: pass a container's name to pull its consul endpoint data
# from every cluster, one output file per remote.
remotes=$(grep itential-wharf ~/.ssh/config | cut -d ' ' -f 2)
for remote in $remotes; do
    echo
    echo
    echo "++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++"
    command="consul kv get -recurse networking/docker/network/v1.0/endpoint | grep $1"
    echo "remote: ${remote}"
    echo "command: ${command}"
    # Run the command remotely rather than repeating it inline.
    ssh "$remote" "$command" > "consul_data__$1__${remote}__$(date -Iseconds).txt"
done
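For example, assuming the script is saved as get_consul_data.sh:

$ ./get_consul_data.sh user___97983658
$ diff consul_data__user___97983658__<remote-a>*.txt consul_data__user___97983658__<remote-b>*.txt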
- parse kern.log for the veth of the bad and good containers
- What does the "id" in consul kv point to? "id": "f6602af440d47b8cc1b24e28510b0718f02ddc8f3f30530c10bf78c40a94eba6",
  - it's the endpoint on the network (it matches the wharf network's EndpointID in the config.v2.json dump above)
- what's this? "com.docker.swarm.id": "f24fa7078ff0ff52b86406883baeab0ec0d412b41994c922cf73a7ec47229d06",
- figure out why there are several consecutive containers with exposed_ports: null
- can we actually connect to