Skip to content

Instantly share code, notes, and snippets.

@jcvenegas
Last active May 12, 2020 18:25
Show Gist options
  • Save jcvenegas/7da7e7134eab13ffd4aaf5c9f37064b5 to your computer and use it in GitHub Desktop.
Save jcvenegas/7da7e7134eab13ffd4aaf5c9f37064b5 to your computer and use it in GitHub Desktop.
Errors found when hytpervisor logs are added

http://jenkins.katacontainers.io/job/kata-containers-runtime-clh-minimal-ubuntu-hypervisor-log-stability-testing/23/consoleText

Cause:

Probably long time took to boot and wrong validation if workload is running and workload is listening: I have seen that if the workload is not listening curl will hang and not work.

Possible fixes: I see curl failed, if that happend that means that kubectl wait deployment and kubectl get pods ... | grep Running where done, but I dont see why if are done the runtime logs still are at start stage. Probably was valid but because I dont get at what time curl was launched confuces me .

I think adding curl smaller timeout but more retry could be better

# Create deployment
	kubectl create -f "${pod_config_dir}/deployment-expose-ip.yaml"

	# Check deployment creation
	cmd="kubectl wait --for=condition=Available deployment/${deployment}"
	waitForProcess "$wait_time" "$sleep_time" "$cmd"

	# Check pods are running
	cmd="kubectl get pods -o jsonpath='{.items[*].status.phase}' | grep Running"
	waitForProcess "$wait_time" "$sleep_time" "$cmd"

	# Expose deployment
	kubectl expose deployment/${deployment} --type=LoadBalancer --name=${service}

	# There appears to be no easy way to formally wait for a loadbalancer service
	# to become 'ready' - there is no service.status.condition field to wait on.

	# Now obtain the local IP:port pair of the loadbalancer service and ensure
	# we can curl from it, and get the expected result
	svcip=$(kubectl get service ${service} -o=json | jq '.spec.clusterIP' | sed 's/"//g')
	svcport=$(kubectl get service ${service} -o=json | jq '.spec.ports[].port')
	# And check we can curl the expected response from that IP address
	echo_msg="hello,world"
	curl http://$svcip:$svcport/echo?msg=${echo_msg} | grep "$echo_msg"

Error:

May 12 06:09:04 ubuntu1804-azure795540 kata-runtime[101698]: time="2020-05-12T06:09:04.115552384Z" level=info msg="release sandbox" arch=amd64 command=start container=0849f7cb7c630324ypervisor[101391]: time="2020-05-12T06:09:03.535231576Z" level=debug msg="updating cpuset cgroup" container=0849f7cb7c63032411802c94177d994d21a78d44055805faf69fc32e8d3632d1 debug_console=false name=kata-agent pid=41 sandbox=b7a29ccea4881a7402cf0c9e867b94bb0a2f8c09ee8299986c10b546fa932754 source=agent
May 12 06:09:04 ubuntu1804-azure795540 kata-hypervisor[101391]: time="2020-05-12T06:09:03.543613357Z" level=debug msg="cpuset cgroup already updated" debug_console=false name=kata-agent path=/sys/fs/cgroup/cpuset/kubepods pid=41 sandbox=b7a29ccea4881a7402cf0c9e867b94bb0a2f8c09ee8299986c10b546fa932754 source=agent
May 12 06:09:04 ubuntu1804-azure795540 kata-hypervisor[101391]: time="2020-05-12T06:09:03.551170836Z" level=debug msg="cpuset cgroup already updated" debug_console=false name=kata-agent path=/sys/fs/cgroup/cpuset/kubepods/besteffort pid=41 sandbox=b7a29ccea4881a7402cf0c9e867b94bb0a2f8c09ee8299986c10b546fa932754 source=agent
May 12 06:09:04 ubuntu1804-azure795540 kata-hypervisor[101391]: time="2020-05-12T06:09:03.543227815Z" level=debug msg="new request" debug_console=false name=kata-agent pid=41 req="container_id:\"0849f7cb7c63032411802c94177d994d21a78d44055805faf69fc32e8d3632d1\" exec_id:\"0849f7cb7c63032411802c94177d994d21a78d44055805faf69fc32e8d3632d1\" len:32768 " request=/grpc.AgentService/ReadStderr sandbox=b7a29ccea4881a7402cf0c9e867b94bb0a2f8c09ee8299986c10b546fa932754 source=agent
May 12 06:09:04 ubuntu1804-azure795540 kata-hypervisor[101391]: time="2020-05-12T06:09:03.5719809Z" level=debug msg="cpuset cgroup already updated" debug_console=false name=kata-agent path=/sys/fs/cgroup/cpuset/kubepods/besteffort/pod5db5005a-870a-446d-abfd-650f1492285b pid=41 sandbox=b7a29ccea4881a7402cf0c9e867b94bb0a2f8c09ee8299986c10b546fa932754 source=agent
May 12 06:09:04 ubuntu1804-azure795540 kata-hypervisor[101391]: time="2020-05-12T06:09:03.581865672Z" level=debug msg="updating cpuset cgroup" debug_console=false name=kata-agent path=/sys/fs/cgroup/cpuset/kubepods/besteffort/pod5db5005a-870a-446d-abfd-650f1492285b/0849f7cb7c63032411802c94177d994d21a78d44055805faf69fc32e8d3632d1/cpuset.cpus pid=41 sandbox=b7a29ccea4881a7402cf0c9e867b94bb0a2f8c09ee8299986c10b546fa932754 source=agent
11802c94177d994d21a78d44055805faf69fc32e8d3632d1 name=kata-runtime pid=101698 sandbox=b7a29ccea4881a7402cf0c9e867b94bb0a2f8c09ee8299986c10b546fa932754 source=virtcontainers subsystem=sandbox
May 12 06:09:04 ubuntu1804-azure795540 kata-runtime[101698]: time="2020-05-12T06:09:04.115723485Z" level=info msg="Disconnecting Sandbox Console" arch=amd64 command=start container=0849f7cb7c63032411802c94177d994d21a78d44055805faf69fc32e8d3632d1 function=disconnect name=kata-runtime pid=101698 sandbox=b7a29ccea4881a7402cf0c9e867b94bb0a2f8c09ee8299986c10b546fa932754 source=virtcontainers subsystem=cloudHypervisor
not ok 1 Expose IP Address
# (in test file k8s-expose-ip.bats, line 51)
#   `curl http://$svcip:$svcport/echo?msg=${echo_msg} | grep "$echo_msg"' failed
# INFO: k8s configured to use runtimeclass
# deployment.apps/hello-world created
# deployment.apps/hello-world condition met
# Running
# service/my-service exposed
#   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
#                                  Dload  Upload   Total   Spent    Left  Speed
# 
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
  0     0    0     0    0     0      0      0 --:--:--  0:00:01 --:--:--     0
  0     0    0     0    0     0      0      0 --:--:--  0:00:02 --:--:--     0
  0     0    0     0    0     0      0      0 --:--:--  0:00:03 --:--:--     0
  0     0    0     0    0     0      0      0 --:--:--  0:00:04 --:--:--     0
  0     0    0     0    0     0      0      0 --:--:--  0:00:05 --:--:--     0
  0     0    0     0    0     0      0      0 --:--:--  0:00:06 --:--:--     0
  0     0    0     0    0     0      0      0 --:--:--  0:00:07 --:--:--     0
  0     0    0     0    0     0      0      0 --:--:--  0:00:08 --:--:--     0
  0     0    0     0    0     0      0      0 --:--:--  0:00:09 --:--:--     0
  0     0    0     0    0     0      0      0 --:--:--  0:00:10 --:--:--     0
  0     0    0     0    0     0      0      0 --:--:--  0:00:11 --:--:--     0
  0     0    0     0    0     0      0      0 --:--:--  0:00:12 --:--:--     0
  0     0    0     0    0     0      0      0 --:--:--  0:00:13 --:--:--     0
  0     0    0     0    0     0      0      0 --:--:--  0:00:14 --:--:--     0
  0     0    0     0    0     0      0      0 --:--:--  0:00:15 --:--:--     0
  0     0    0     0    0     0      0      0 --:--:--  0:00:16 --:--:--     0
  0     0    0     0    0     0      0      0 --:--:--  0:00:17 --:--:--     0
  0     0    0     0    0     0      0      0 --:--:--  0:00:18 --:--:--     0
  0     0    0     0    0     0      0      0 --:--:--  0:00:19 --:--:--     0
  0     0    0     0    0     0      0      0 --:--:--  0:00:20 --:--:--     0
  0     0    0     0    0     0      0      0 --:--:--  0:00:21 --:--:--     0
  0     0    0     0    0     0      0      0 --:--:--  0:00:22 --:--:--     0
  0     0    0     0    0     0      0      0 --:--:--  0:00:23 --:--:--     0
  0     0    0     0    0     0      0      0 --:--:--  0:00:24 --:--:--     0
  0     0    0     0    0     0      0      0 --:--:--  0:00:25 --:--:--     0
  0     0    0     0    0     0      0      0 --:--:--  0:00:26 --:--:--     0
  0     0    0     0    0     0      0      0 --:--:--  0:00:27 --:--:--     0
  0     0    0     0    0     0      0      0 --:--:--  0:00:28 --:--:--     0
  0     0    0     0    0     0      0      0 --:--:--  0:00:29 --:--:--     0
  0     0    0     0    0     0      0      0 --:--:--  0:00:30 --:--:--     0
  0     0    0     0    0     0      0      0 --:--:--  0:00:31 --:--:--     0curl: (7) Failed to connect to 10.100.126.181 port 8080: Connection timed out
# service "my-service" deleted
# deployment.apps "hello-world" deleted

http://jenkins.katacontainers.io/job/kata-containers-runtime-clh-minimal-ubuntu-hypervisor-log-stability-testing/20/consoleText

Error

May 12 05:59:15 ubuntu1804-azure07d4f0 kata-runtime[10291]: time="2020-05-12T05:59:15.17582714Z" level=warning msg="failed to cleanup netns" arch=amd64 command=create container=f421ba79c11426000b2daba225c505909f0c89e42d6bcdd412a4ae758a77baa0 error="failed to get netns /var/run/netns/cnitest-23caa7b1-c02c-9a40-acab-d8d35be33e10: failed to Statfs \"/var/run/netns/cnitest-23caa7b1-c02c-9a40-acab-d8d35be33e10\": no such file or directory" name=kata-runtime path=/var/run/netns/cnitest-23caa7b1-c02c-9a40-acab-d8d35be33e10 pid=10291 source=katautils
May 12 05:59:15 ubuntu1804-azure07d4f0 kata-runtime[10291]: time="2020-05-12T05:59:15.17611894Z" level=error msg="Failed to check if grpc server is working: context deadline exceeded" arch=amd64 command=create container=f421ba79c11426000b2daba225c505909f0c89e42d6bcdd412a4ae758a77baa0 name=kata-runtime pid=10291 source=runtime
docker: Error response from daemon: OCI runtime create failed: Failed to check if grpc server is working: context deadline exceeded: unknown.

Diagnostic:

Slow boot time, May 12 05:58:56 to May 12 05:59:15 posible cause: "HybridVsock trivial handshake failed" Error=EOF name=agent-client pid=10291 source=agent-client

Test: Check oci calls for run with true

http://jenkins.katacontainers.io/job/kata-containers-runtime-clh-minimal-ubuntu-hypervisor-log-stability-testing/18/consoleText

Cause

Contex timeout, HybridVsock trivial handshake failed" Error=EOF

May 12 05:39:10 ubuntu1804-azuree78b40 kata-runtime[6203]: time="2020-05-12T05:39:10.338490241Z" level=info msg="Stop Sandbox" arch=amd64 command=create container=a3aff1ba73bb29e3b9992a2fc7615af530ff817e796a5cb040bcaaf60fbddbad function=stopSandbox name=kata-runtime pid=6203 source=virtcontainers subsystem=cloudHypervisor
May 12 05:39:10 ubuntu1804-azuree78b40 kata-runtime[6203]: time="2020-05-12T05:39:10.338633543Z" level=info msg="Stopping Cloud Hypervisor" PID=6237 arch=amd64 command=create container=a3aff1ba73bb29e3b9992a2fc7615af530ff817e796a5cb040bcaaf60fbddbad name=kata-runtime pid=6203 source=virtcontainers subsystem=cloudHypervisor
May 12 05:39:10 ubuntu1804-azuree78b40 kata-runtime[6203]: time="2020-05-12T05:39:10.346970698Z" level=debug msg="HybridVsock trivial handshake failed" Error=EOF name=agent-client pid=6203 source=agent-client
May 12 05:39:10 ubuntu1804-azuree78b40 kata-runtime[6203]: time="2020-05-12T05:39:10.349263041Z" level=debug msg="HybridVsock trivial handshake failed" Error="read unix @->/run/vc/vm/a3aff1ba73bb29e3b9992a2fc7615af530ff817e796a5cb040bcaaf60fbddbad/clh.sock: read: connection reset by peer" name=agent-client pid=6203 source=agent-client
May 12 05:39:13 ubuntu1804-azuree78b40 kata-runtime[6203]: time="2020-05-12T05:39:13.402280735Z" level=warning msg="VM still running after waiting 3s" arch=amd64 command=create container=a3aff1ba73bb29e3b9992a2fc7615af530ff817e796a5cb040bcaaf60fbddbad name=kata-runtime pid=6203 source=virtcontainers subsystem=cloudHypervisor
May 12 05:39:13 ubuntu1804-azuree78b40 kata-runtime[6203]: time="2020-05-12T05:39:13.402403837Z" level=debug msg="removing vm sockets" arch=amd64 command=create container=a3aff1ba73bb29e3b9992a2fc7615af530ff817e796a5cb040bcaaf60fbddbad name=kata-runtime pid=6203 source=virtcontainers subsystem=cloudHypervisor
May 12 05:39:13 ubuntu1804-azuree78b40 kata-runtime[6203]: time="2020-05-12T05:39:13.402490139Z" level=info msg="cleanup vm path" arch=amd64 command=create container=a3aff1ba73bb29e3b9992a2fc7615af530ff817e796a5cb040bcaaf60fbddbad dir=/run/vc/vm/a3aff1ba73bb29e3b9992a2fc7615af530ff817e796a5cb040bcaaf60fbddbad link=/run/vc/vm/a3aff1ba73bb29e3b9992a2fc7615af530ff817e796a5cb040bcaaf60fbddbad name=kata-runtime pid=6203 source=virtcontainers subsystem=cloudHypervisor
May 12 05:39:13 ubuntu1804-azuree78b40 kata-runtime[6203]: time="2020-05-12T05:39:13.402638542Z" level=info msg="Stopping netmon" arch=amd64 command=create container=a3aff1ba73bb29e3b9992a2fc7615af530ff817e796a5cb040bcaaf60fbddbad name=kata-runtime netmon-pid=6233 netmon-signal=killed pid=6203 source=virtcontainers subsystem=netmon
May 12 05:39:13 ubuntu1804-azuree78b40 kata-runtime[6203]: time="2020-05-12T05:39:13.402748544Z" level=info msg="Detaching endpoint" arch=amd64 command=create container=a3aff1ba73bb29e3b9992a2fc7615af530ff817e796a5cb040bcaaf60fbddbad endpoint-type=virtual name=kata-runtime pid=6203 source=virtcontainers subsystem=network
May 12 05:39:13 ubuntu1804-azuree78b40 kata-runtime[6203]: time="2020-05-12T05:39:13.500239457Z" level=debug msg="Network removed" arch=amd64 command=create container=a3aff1ba73bb29e3b9992a2fc7615af530ff817e796a5cb040bcaaf60fbddbad name=kata-runtime pid=6203 source=virtcontainers subsystem=network
May 12 05:39:13 ubuntu1804-azuree78b40 kata-runtime[6203]: time="2020-05-12T05:39:13.50035106Z" level=info msg="Network namespace \"/var/run/netns/cnitest-bf38afe9-38b4-31dc-5441-8018847ca4af\" deleted" arch=amd64 command=create container=a3aff1ba73bb29e3b9992a2fc7615af530ff817e796a5cb040bcaaf60fbddbad name=kata-runtime pid=6203 source=virtcontainers subsystem=network
May 12 05:39:13 ubuntu1804-azuree78b40 kata-runtime[6203]: time="2020-05-12T05:39:13.503992327Z" level=debug msg="Deleting sandbox cgroup" arch=amd64 command=create container=a3aff1ba73bb29e3b9992a2fc7615af530ff817e796a5cb040bcaaf60fbddbad name=kata-runtime pid=6203 sandbox=a3aff1ba73bb29e3b9992a2fc7615af530ff817e796a5cb040bcaaf60fbddbad source=virtcontainers subsystem=sandbox
May 12 05:39:13 ubuntu1804-azuree78b40 kata-runtime[6203]: time="2020-05-12T05:39:13.504272732Z" level=warning msg="sandbox cgroups path is empty" arch=amd64 command=create container=a3aff1ba73bb29e3b9992a2fc7615af530ff817e796a5cb040bcaaf60fbddbad name=kata-runtime pid=6203 sandbox=a3aff1ba73bb29e3b9992a2fc7615af530ff817e796a5cb040bcaaf60fbddbad source=virtcontainers subsystem=sandbox
May 12 05:39:13 ubuntu1804-azuree78b40 kata-runtime[6203]: time="2020-05-12T05:39:13.504472036Z" level=info msg=cleanup arch=amd64 command=create container=a3aff1ba73bb29e3b9992a2fc7615af530ff817e796a5cb040bcaaf60fbddbad function=cleanup name=kata-runtime pid=6203 source=virtcontainers subsystem=cloudHypervisor
May 12 05:39:13 ubuntu1804-azuree78b40 kata-runtime[6203]: time="2020-05-12T05:39:13.50467004Z" level=info msg="cleanup agent" arch=amd64 command=create container=a3aff1ba73bb29e3b9992a2fc7615af530ff817e796a5cb040bcaaf60fbddbad name=kata-runtime path=/run/kata-containers/shared/sandboxes/a3aff1ba73bb29e3b9992a2fc7615af530ff817e796a5cb040bcaaf60fbddbad pid=6203 source=virtcontainers subsystem=kata_agent
May 12 05:39:13 ubuntu1804-azuree78b40 kata-runtime[6203]: time="2020-05-12T05:39:13.504904244Z" level=warning msg="failed to cleanup netns" arch=amd64 command=create container=a3aff1ba73bb29e3b9992a2fc7615af530ff817e796a5cb040bcaaf60fbddbad error="failed to get netns /var/run/netns/cnitest-bf38afe9-38b4-31dc-5441-8018847ca4af: failed to Statfs \"/var/run/netns/cnitest-bf38afe9-38b4-31dc-5441-8018847ca4af\": no such file or directory" name=kata-runtime path=/var/run/netns/cnitest-bf38afe9-38b4-31dc-5441-8018847ca4af pid=6203 source=katautils
May 12 05:39:13 ubuntu1804-azuree78b40 kata-runtime[6203]: time="2020-05-12T05:39:13.505111548Z" level=error msg="Failed to check if grpc server is working: context deadline exceeded" arch=amd64 command=create container=a3aff1ba73bb29e3b9992a2fc7615af530ff817e796a5cb040bcaaf60fbddbad name=kata-runtime pid=6203 source=runtime
docker: Error response from daemon: OCI runtime create failed: Failed to check if grpc server is working: context deadline exceeded: unknown.

http://jenkins.katacontainers.io/job/kata-containers-runtime-clh-minimal-ubuntu-hypervisor-log-stability-testing/12/consoleFull

00:46:23.896 May 12 05:30:29 ubuntu1804-azuref15ec0 kata-runtime[8101]: time="2020-05-12T05:30:29.133001154Z" level=debug msg="HybridVsock trivial handshake failed" Error=EOF name=agent-client pid=8101 source=agent-client
00:46:23.896 May 12 05:30:29 ubuntu1804-azuref15ec0 kata-runtime[8101]: time="2020-05-12T05:30:29.138484179Z" level=debug msg="HybridVsock trivial handshake failed" Error=EOF name=agent-client pid=8101 source=agent-client
00:46:23.896 May 12 05:30:29 ubuntu1804-azuref15ec0 kata-runtime[8101]: time="2020-05-12T05:30:29.138944923Z" level=debug msg="HybridVsock trivial handshake failed" Error=EOF name=agent-client pid=8101 source=agent-client
00:46:33.644 May 12 05:30:38 ubuntu1804-azuref15ec0 kata-runtime[8101]: time="2020-05-12T05:30:38.76375577Z" level=info msg="Stop Sandbox" arch=amd64 command=create container=9582f3c920af4aef708a92830ca548c6b945640d129b62d0ec9eac4cfe5c040d function=stopSandbox name=kata-runtime pid=8101 source=virtcontainers subsystem=cloudHypervisor
00:46:33.644 May 12 05:30:38 ubuntu1804-azuref15ec0 kata-runtime[8101]: time="2020-05-12T05:30:38.763906184Z" level=info msg="Stopping Cloud Hypervisor" PID=8134 arch=amd64 command=create container=9582f3c920af4aef708a92830ca548c6b945640d129b62d0ec9eac4cfe5c040d name=kata-runtime pid=8101 source=virtcontainers subsystem=cloudHypervisor
00:46:33.644 May 12 05:30:38 ubuntu1804-azuref15ec0 kata-runtime[8101]: time="2020-05-12T05:30:38.777354872Z" level=debug msg="HybridVsock trivial handshake failed" Error=EOF name=agent-client pid=8101 source=agent-client
00:46:33.644 May 12 05:30:38 ubuntu1804-azuref15ec0 kata-runtime[8101]: time="2020-05-12T05:30:38.803511378Z" level=debug msg="HybridVsock trivial handshake failed" Error="read unix @->/run/vc/vm/9582f3c920af4aef708a92830ca548c6b945640d129b62d0ec9eac4cfe5c040d/clh.sock: read: connection reset by peer" name=agent-client pid=8101 source=agent-client
00:46:36.554 May 12 05:30:41 ubuntu1804-azuref15ec0 kata-runtime[8101]: time="2020-05-12T05:30:41.828020653Z" level=warning msg="VM still running after waiting 3s" arch=amd64 command=create container=9582f3c920af4aef708a92830ca548c6b945640d129b62d0ec9eac4cfe5c040d name=kata-runtime pid=8101 source=virtcontainers subsystem=cloudHypervisor
00:46:36.554 May 12 05:30:41 ubuntu1804-azuref15ec0 kata-runtime[8101]: time="2020-05-12T05:30:41.828142965Z" level=debug msg="removing vm sockets" arch=amd64 command=create container=9582f3c920af4aef708a92830ca548c6b945640d129b62d0ec9eac4cfe5c040d name=kata-runtime pid=8101 source=virtcontainers subsystem=cloudHypervisor
00:46:36.554 May 12 05:30:41 ubuntu1804-azuref15ec0 kata-runtime[8101]: time="2020-05-12T05:30:41.828222672Z" level=info msg="cleanup vm path" arch=amd64 command=create container=9582f3c920af4aef708a92830ca548c6b945640d129b62d0ec9eac4cfe5c040d dir=/run/vc/vm/9582f3c920af4aef708a92830ca548c6b945640d129b62d0ec9eac4cfe5c040d link=/run/vc/vm/9582f3c920af4aef708a92830ca548c6b945640d129b62d0ec9eac4cfe5c040d name=kata-runtime pid=8101 source=virtcontainers subsystem=cloudHypervisor
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment