@damitkwr
Created August 29, 2019 17:51
Name:               nvidia-sp-nvidia-sp-8888366-7984b654f4-nnr8x
Namespace:          seldon
Priority:           0
PriorityClassName:  <none>
Node:               gke-temp-cluster-gpu-pool-bf4c11e7-k6mr/10.142.0.77
Start Time:         Thu, 29 Aug 2019 17:46:46 +0000
Labels:             app=nvidia-sp-nvidia-sp-8888366
                    fluentd=true
                    pod-template-hash=7984b654f4
                    seldon-app=nvidia-sp-nvidia-sp-nvidia-sp
                    seldon-app-sp-predictor-istio=seldon-5962968fa362e710c7798d69cd566fa6
                    seldon-deployment-id=nvidia-sp-nvidia-sp
                    version=v1
Annotations:        prometheus.io/path: prometheus
                    prometheus.io/port: 8000
                    prometheus.io/scrape: true
                    seldon.io/headless-svc: false
                    sidecar.istio.io/status:
                      {"version":"761ebc5a63976754715f22fcf548f05270fb4b8db07324894aebdb31fa81d960","initContainers":["istio-init"],"containers":["istio-proxy"]...
Status:             Running
IP:                 10.24.1.3
Controlled By:      ReplicaSet/nvidia-sp-nvidia-sp-8888366-7984b654f4
Init Containers:
  istio-init:
    Container ID:  docker://91c10a8a46c51978e8e478e3dcfaea28c4cf64e4a0a4cc7f16503a37ce6c21b5
    Image:         docker.io/istio/proxy_init:1.2.5
    Image ID:      docker-pullable://istio/proxy_init@sha256:c9964a8c1c28b85cc631bbc90390eac238c90f82c8f929495d1e9f9a9135b724
    Port:          <none>
    Host Port:     <none>
    Args:
      -p
      15001
      -u
      1337
      -m
      REDIRECT
      -i
      *
      -x
      -b
      9000,2000,2001,2002,8000,5001,8082,9090
      -d
      15020
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Thu, 29 Aug 2019 17:46:50 +0000
      Finished:     Thu, 29 Aug 2019 17:46:51 +0000
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     100m
      memory:  50Mi
    Requests:
      cpu:     10m
      memory:  10Mi
    Environment:   <none>
    Mounts:        <none>
Containers:
  sp-predictor-istio:
    Container ID:   docker://49d4fec1a8d30ba939abd63ed8df90221f40f0446f75d0c28af6f3da372a8c71
    Image:          gcr.io/gn-data-science-project02/dev-models/sp-predictor:0.3
    Image ID:       docker-pullable://gcr.io/gn-data-science-project02/dev-models/sp-predictor@sha256:4f70504824cc3d0463d27055f3e065469ec3129709601e6e5490ec67c07d8452
    Port:           9000/TCP
    Host Port:      0/TCP
    State:          Running
      Started:      Thu, 29 Aug 2019 17:49:44 +0000
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Thu, 29 Aug 2019 17:49:28 +0000
      Finished:     Thu, 29 Aug 2019 17:49:29 +0000
    Ready:          True
    Restart Count:  2
    Limits:
      cpu:  2
    Requests:
      cpu:      2
    Liveness:   tcp-socket :http delay=60s timeout=1s period=5s #success=1 #failure=3
    Readiness:  tcp-socket :http delay=20s timeout=1s period=5s #success=1 #failure=3
    Environment:
      PREDICTIVE_UNIT_SERVICE_PORT:  9000
      PREDICTIVE_UNIT_ID:            sp-predictor-istio
      PREDICTOR_ID:                  nvidia-sp
      SELDON_DEPLOYMENT_ID:          nvidia-sp
      PREDICTIVE_UNIT_PARAMETERS:    [{"name":"url","value":"localhost:2001","type":"STRING"},{"name":"model_name","value":"spelling_model","type":"STRING"},{"name":"protocol","value":"grpc","type":"STRING"}]
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-wwhcv (ro)
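The probe lines for sp-predictor-istio can be read as a rough timeline. The sketch below assumes the standard Kubernetes probe semantics (first probe after `delay`, then one probe every `period`, a verdict after `#failure` consecutive failures) and ignores the per-probe timeout and kubelet jitter. `Probe` and `earliest_failure_verdict` are illustrative names, not part of any Kubernetes client library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Probe:
    """Probe settings as printed by `kubectl describe` (all values in seconds)."""
    delay: int    # initialDelaySeconds ("delay=")
    period: int   # periodSeconds ("period=")
    failure: int  # failureThreshold ("#failure=")


def earliest_failure_verdict(probe: Probe) -> int:
    """Seconds after container start at which `failure` consecutive failed
    probes could first have been observed, assuming every probe fails.

    Simplification: ignores probe timeouts and kubelet scheduling jitter.
    """
    # Probes fire at delay, delay + period, delay + 2 * period, ...
    return probe.delay + (probe.failure - 1) * probe.period


# Values transcribed from the sp-predictor-istio container above.
liveness = Probe(delay=60, period=5, failure=3)
readiness = Probe(delay=20, period=5, failure=3)

print(earliest_failure_verdict(liveness))   # 70
print(earliest_failure_verdict(readiness))  # 30
```

Under those assumptions the liveness probe could not have restarted this container before roughly 70 s. The Last State above shows the earlier attempt ending about one second after it started (17:49:28 to 17:49:29) with Exit Code 1, which points to the process exiting on its own rather than being killed by the liveness probe.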
  inference-server:
    Container ID:  docker://ae85b5ab33d8a08661f6a7c9087f04e841235cffbd00a6d66ee01826d7900af9
    Image:         nvcr.io/nvidia/tensorrtserver:19.07-py3
    Image ID:      docker-pullable://nvcr.io/nvidia/tensorrtserver@sha256:014cebe2a440d4f6f761e3c6ddb3d8d72f75275301fc13424a7613583ac8509f
    Ports:         2000/TCP, 2001/TCP, 2002/TCP
    Host Ports:    0/TCP, 0/TCP, 0/TCP
    Command:
      trtserver
    Args:
      --model-store=gs://dev-models-dsg/search/spelling-models-256
      --http-port=2000
      --grpc-port=2001
    State:          Running
      Started:      Thu, 29 Aug 2019 17:49:10 +0000
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:             1
      nvidia.com/gpu:  1
    Requests:
      cpu:             1
      nvidia.com/gpu:  1
    Environment:       <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-wwhcv (ro)
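PREDICTIVE_UNIT_PARAMETERS on sp-predictor-istio is a JSON list of `{name, value, type}` objects, and its `url` entry (`localhost:2001`) should match the gRPC port that trtserver listens on in the inference-server container. A minimal sketch that parses the list into a dict and cross-checks the port; the helper names and the type-to-converter table are assumptions for illustration (only STRING occurs in this pod).

```python
import json

# PREDICTIVE_UNIT_PARAMETERS, transcribed from the sp-predictor-istio container.
PARAMS_JSON = (
    '[{"name":"url","value":"localhost:2001","type":"STRING"},'
    '{"name":"model_name","value":"spelling_model","type":"STRING"},'
    '{"name":"protocol","value":"grpc","type":"STRING"}]'
)

# Args transcribed from the inference-server container.
TRTSERVER_ARGS = [
    "--model-store=gs://dev-models-dsg/search/spelling-models-256",
    "--http-port=2000",
    "--grpc-port=2001",
]

# Assumed mapping from parameter type names to Python converters.
CONVERTERS = {
    "STRING": str,
    "INT": int,
    "FLOAT": float,
    "DOUBLE": float,
    "BOOL": lambda v: v.lower() == "true",
}


def parse_unit_parameters(raw: str) -> dict:
    """Turn a [{name, value, type}, ...] JSON list into a plain dict."""
    return {p["name"]: CONVERTERS[p["type"]](p["value"]) for p in json.loads(raw)}


def flag_value(args: list, flag: str) -> str:
    """Return the value of a `--flag=value` style argument."""
    prefix = flag + "="
    for arg in args:
        if arg.startswith(prefix):
            return arg[len(prefix):]
    raise KeyError(flag)


params = parse_unit_parameters(PARAMS_JSON)
host, port = params["url"].rsplit(":", 1)
print(host, port)
print(port == flag_value(TRTSERVER_ARGS, "--grpc-port"))  # True
```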
  seldon-container-engine:
    Container ID:   docker://a0b14c68bdeffccb6a443e617e47a5416107178b58c6048bf20f070d4e93434d
    Image:          docker.io/seldonio/engine:0.4.0
    Image ID:       docker-pullable://seldonio/engine@sha256:620a6631d4ec7279507aeaebbab1695496f5c87c3eb6b23ab1d10dbac54b2c30
    Ports:          8000/TCP, 5001/TCP, 8082/TCP, 9090/TCP
    Host Ports:     0/TCP, 0/TCP, 0/TCP, 0/TCP
    State:          Running
      Started:      Thu, 29 Aug 2019 17:49:15 +0000
    Ready:          True
    Restart Count:  0
    Requests:
      cpu:      1
    Liveness:   http-get http://:admin/live delay=20s timeout=2s period=5s #success=1 #failure=7
    Readiness:  http-get http://:admin/ready delay=20s timeout=2s period=1s #success=1 #failure=1
    Environment:
      ENGINE_PREDICTOR:                eyJuYW1lIjoibnZpZGlhLXNwIiwiZ3JhcGgiOnsibmFtZSI6InNwLXByZWRpY3Rvci1pc3RpbyIsInR5cGUiOiJNT0RFTCIsImltcGxlbWVudGF0aW9uIjoiVU5LTk9XTl9JTVBMRU1FTlRBVElPTiIsImVuZHBvaW50Ijp7InNlcnZpY2VfaG9zdCI6ImxvY2FsaG9zdCIsInNlcnZpY2VfcG9ydCI6OTAwMCwidHlwZSI6IlJFU1QifSwicGFyYW1ldGVycyI6W3sibmFtZSI6InVybCIsInZhbHVlIjoibG9jYWxob3N0OjIwMDEiLCJ0eXBlIjoiU1RSSU5HIn0seyJuYW1lIjoibW9kZWxfbmFtZSIsInZhbHVlIjoic3BlbGxpbmdfbW9kZWwiLCJ0eXBlIjoiU1RSSU5HIn0seyJuYW1lIjoicHJvdG9jb2wiLCJ2YWx1ZSI6ImdycGMiLCJ0eXBlIjoiU1RSSU5HIn1dfSwiY29tcG9uZW50U3BlY3MiOlt7Im1ldGFkYXRhIjp7ImNyZWF0aW9uVGltZXN0YW1wIjpudWxsfSwic3BlYyI6eyJjb250YWluZXJzIjpbeyJuYW1lIjoic3AtcHJlZGljdG9yLWlzdGlvIiwiaW1hZ2UiOiJnY3IuaW8vZ24tZGF0YS1zY2llbmNlLXByb2plY3QwMi9kZXYtbW9kZWxzL3NwLXByZWRpY3RvcjowLjMiLCJwb3J0cyI6W3sibmFtZSI6Imh0dHAiLCJjb250YWluZXJQb3J0Ijo5MDAwLCJwcm90b2NvbCI6IlRDUCJ9XSwiZW52IjpbeyJuYW1lIjoiUFJFRElDVElWRV9VTklUX1NFUlZJQ0VfUE9SVCIsInZhbHVlIjoiOTAwMCJ9LHsibmFtZSI6IlBSRURJQ1RJVkVfVU5JVF9JRCIsInZhbHVlIjoic3AtcHJlZGljdG9yLWlzdGlvIn0seyJuYW1lIjoiUFJFRElDVE9SX0lEIiwidmFsdWUiOiJudmlkaWEtc3AifSx7Im5hbWUiOiJTRUxET05fREVQTE9ZTUVOVF9JRCIsInZhbHVlIjoibnZpZGlhLXNwIn0seyJuYW1lIjoiUFJFRElDVElWRV9VTklUX1BBUkFNRVRFUlMiLCJ2YWx1ZSI6Ilt7XCJuYW1lXCI6XCJ1cmxcIixcInZhbHVlXCI6XCJsb2NhbGhvc3Q6MjAwMVwiLFwidHlwZVwiOlwiU1RSSU5HXCJ9LHtcIm5hbWVcIjpcIm1vZGVsX25hbWVcIixcInZhbHVlXCI6XCJzcGVsbGluZ19tb2RlbFwiLFwidHlwZVwiOlwiU1RSSU5HXCJ9LHtcIm5hbWVcIjpcInByb3RvY29sXCIsXCJ2YWx1ZVwiOlwiZ3JwY1wiLFwidHlwZVwiOlwiU1RSSU5HXCJ9XSJ9XSwicmVzb3VyY2VzIjp7ImxpbWl0cyI6eyJjcHUiOiIyIn0sInJlcXVlc3RzIjp7ImNwdSI6IjIifX0sImxpdmVuZXNzUHJvYmUiOnsidGNwU29ja2V0Ijp7InBvcnQiOiJodHRwIn0sImluaXRpYWxEZWxheVNlY29uZHMiOjYwLCJ0aW1lb3V0U2Vjb25kcyI6MSwicGVyaW9kU2Vjb25kcyI6NSwic3VjY2Vzc1RocmVzaG9sZCI6MSwiZmFpbHVyZVRocmVzaG9sZCI6M30sInJlYWRpbmVzc1Byb2JlIjp7InRjcFNvY2tldCI6eyJwb3J0IjoiaHR0cCJ9LCJpbml0aWFsRGVsYXlTZWNvbmRzIjoyMCwidGltZW91dFNlY29uZHMiOjEsInBlcmlvZFNlY29uZHMiOjUsInN1Y2Nlc3NUaHJlc2hvbGQiOjEsImZhaWx1cmVUaHJlc2hvbGQiOjN9LCJsaWZlY3ljbGUiOnsicHJlU3RvcCI6eyJleGVjIjp7ImNvbW1hbmQiOlsiL2Jpbi9zaCIsIi1jIiwiL2Jpbi9zbGVlcCAxMCJdfX19LCJ0ZXJtaW5hdGlvbk1lc3NhZ2VQYXRoIjoiL2Rldi90ZXJtaW5hdGlvbi1sb2ciLCJ0ZXJtaW5hdGlvbk1lc3NhZ2VQb2xpY3kiOiJGaWxlIiwiaW1hZ2VQdWxsUG9saWN5IjoiSWZOb3RQcmVzZW50In0seyJuYW1lIjoiaW5mZXJlbmNlLXNlcnZlciIsImltYWdlIjoibnZjci5pby9udmlkaWEvdGVuc29ycnRzZXJ2ZXI6MTkuMDctcHkzIiwiY29tbWFuZCI6WyJ0cnRzZXJ2ZXIiXSwiYXJncyI6WyItLW1vZGVsLXN0b3JlPWdzOi8vZGV2LW1vZGVscy1kc2cvc2VhcmNoL3NwZWxsaW5nLW1vZGVscy0yNTYiLCItLWh0dHAtcG9ydD0yMDAwIiwiLS1ncnBjLXBvcnQ9MjAwMSJdLCJwb3J0cyI6W3siY29udGFpbmVyUG9ydCI6MjAwMCwicHJvdG9jb2wiOiJUQ1AifSx7ImNvbnRhaW5lclBvcnQiOjIwMDEsInByb3RvY29sIjoiVENQIn0seyJjb250YWluZXJQb3J0IjoyMDAyLCJwcm90b2NvbCI6IlRDUCJ9XSwicmVzb3VyY2VzIjp7ImxpbWl0cyI6eyJjcHUiOiIxIiwibnZpZGlhLmNvbS9ncHUiOiIxIn0sInJlcXVlc3RzIjp7ImNwdSI6IjEiLCJudmlkaWEuY29tL2dwdSI6IjEifX0sInRlcm1pbmF0aW9uTWVzc2FnZVBhdGgiOiIvZGV2L3Rlcm1pbmF0aW9uLWxvZyIsInRlcm1pbmF0aW9uTWVzc2FnZVBvbGljeSI6IkZpbGUiLCJpbWFnZVB1bGxQb2xpY3kiOiJJZk5vdFByZXNlbnQiLCJzZWN1cml0eUNvbnRleHQiOnsicnVuQXNVc2VyIjoxMDAwfX1dLCJ0ZXJtaW5hdGlvbkdyYWNlUGVyaW9kU2Vjb25kcyI6MSwiaW1hZ2VQdWxsU2VjcmV0cyI6W3sibmFtZSI6Im5nYyJ9XX19XSwicmVwbGljYXMiOjEsImVuZ2luZVJlc291cmNlcyI6e30sImxhYmVscyI6eyJmbHVlbnRkIjoidHJ1ZSIsInZlcnNpb24iOiJ2MSJ9LCJzdmNPcmNoU3BlYyI6eyJyZXNvdXJjZXMiOnsicmVxdWVzdHMiOnsiY3B1IjoiMSJ9fX0sImV4cGxhaW5lciI6eyJjb250YWluZXJTcGVjIjp7Im5hbWUiOiIiLCJyZXNvdXJjZXMiOnt9fX19
      DEPLOYMENT_NAME:                 nvidia-sp
      DEPLOYMENT_NAMESPACE:            seldon
      ENGINE_SERVER_PORT:              8000
      ENGINE_SERVER_GRPC_PORT:         5001
      JAVA_OPTS:                       -server -Dcom.sun.management.jmxremote.rmi.port=9090 -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=9090 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.local.only=false -Djava.rmi.server.hostname=127.0.0.1
      SELDON_LOG_MESSAGES_EXTERNALLY:  false
    Mounts:
      /etc/podinfo from podinfo (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-wwhcv (ro)
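ENGINE_PREDICTOR holds the predictor spec as base64-encoded JSON. A small standard-library sketch for decoding it; `decode_engine_predictor` is a hypothetical helper, and the demo decodes only the first 28 characters of the value above, which works because base64 decodes in independent 4-character groups.

```python
import base64
import json


def decode_engine_predictor(value: str) -> dict:
    """Decode a base64-encoded JSON spec such as ENGINE_PREDICTOR."""
    # Drop any whitespace, e.g. line breaks introduced when the value was copied.
    compact = "".join(value.split())
    return json.loads(base64.b64decode(compact))


# First 28 characters of the ENGINE_PREDICTOR value above.
prefix = "eyJuYW1lIjoibnZpZGlhLXNwIiwi"
print(base64.b64decode(prefix))  # b'{"name":"nvidia-sp","'

# Round trip on a tiny spec, with a line break inserted into the encoded text.
sample = base64.b64encode(json.dumps({"name": "nvidia-sp"}).encode()).decode()
print(decode_engine_predictor(sample[:8] + "\n" + sample[8:]))  # {'name': 'nvidia-sp'}
```

To inspect the real spec, pass the complete ENGINE_PREDICTOR string, for example copied from this output or from `kubectl get pod <name> -o yaml`.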
  istio-proxy:
    Container ID:  docker://9f54364fad2b72fc8f88413df61819bff00f12ad5914f97a34ad872e96a82260
    Image:         docker.io/istio/proxyv2:1.2.5
    Image ID:      docker-pullable://istio/proxyv2@sha256:8f210c3d09beb6b8658a4255d9ac30e25549295834a44083ed67d652ad7453e4
    Port:          15090/TCP
    Host Port:     0/TCP
    Args:
      proxy
      sidecar
      --domain
      $(POD_NAMESPACE).svc.cluster.local
      --configPath
      /etc/istio/proxy
      --binaryPath
      /usr/local/bin/envoy
      --serviceCluster
      nvidia-sp-nvidia-sp-8888366.$(POD_NAMESPACE)
      --drainDuration
      45s
      --parentShutdownDuration
      1m0s
      --discoveryAddress
      istio-pilot.istio-system:15010
      --zipkinAddress
      zipkin.istio-system:9411
      --dnsRefreshRate
      300s
      --connectTimeout
      10s
      --proxyAdminPort
      15000
      --concurrency
      2
      --controlPlaneAuthPolicy
      NONE
      --statusPort
      15020
      --applicationPorts
      9000,2000,2001,2002,8000,5001,8082,9090
    State:          Running
      Started:      Thu, 29 Aug 2019 17:49:27 +0000
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     2
      memory:  1Gi
    Requests:
      cpu:      100m
      memory:   128Mi
    Readiness:  http-get http://:15020/healthz/ready delay=1s timeout=1s period=2s #success=1 #failure=30
    Environment:
      POD_NAME:                          nvidia-sp-nvidia-sp-8888366-7984b654f4-nnr8x (v1:metadata.name)
      POD_NAMESPACE:                     seldon (v1:metadata.namespace)
      INSTANCE_IP:                        (v1:status.podIP)
      ISTIO_META_POD_NAME:               nvidia-sp-nvidia-sp-8888366-7984b654f4-nnr8x (v1:metadata.name)
      ISTIO_META_CONFIG_NAMESPACE:       seldon (v1:metadata.namespace)
      ISTIO_META_INTERCEPTION_MODE:      REDIRECT
      ISTIO_META_INCLUDE_INBOUND_PORTS:  9000,2000,2001,2002,8000,5001,8082,9090
      ISTIO_METAJSON_ANNOTATIONS:        {"prometheus.io/path":"prometheus","prometheus.io/port":"8000","prometheus.io/scrape":"true","seldon.io/headless-svc":"false"}
      ISTIO_METAJSON_LABELS:             {"app":"nvidia-sp-nvidia-sp-8888366","fluentd":"true","pod-template-hash":"7984b654f4","seldon-app":"nvidia-sp-nvidia-sp-nvidia-sp","seldon-app-sp-predictor-istio":"seldon-5962968fa362e710c7798d69cd566fa6","seldon-deployment-id":"nvidia-sp-nvidia-sp","version":"v1"}
    Mounts:
      /etc/certs/ from istio-certs (ro)
      /etc/istio/proxy from istio-envoy (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-wwhcv (ro)
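The same inbound-port list appears three times above: as istio-init's `-b` argument, as istio-proxy's `--applicationPorts`, and as ISTIO_META_INCLUDE_INBOUND_PORTS. A quick consistency check that every port declared by the application containers is in that list; the dictionary is transcribed by hand from this output and `uncovered_ports` is an illustrative helper, not an Istio tool.

```python
# Ports declared by the application containers (transcribed from above).
CONTAINER_PORTS = {
    "sp-predictor-istio": [9000],
    "inference-server": [2000, 2001, 2002],
    "seldon-container-engine": [8000, 5001, 8082, 9090],
}

# Comma-separated inbound-port list shared by istio-init and istio-proxy.
INBOUND_PORTS = "9000,2000,2001,2002,8000,5001,8082,9090"


def uncovered_ports(container_ports: dict, inbound: str) -> set:
    """Return declared container ports missing from the inbound-port list."""
    declared = {p for ports in container_ports.values() for p in ports}
    listed = {int(p) for p in inbound.split(",")}
    return declared - listed


print(uncovered_ports(CONTAINER_PORTS, INBOUND_PORTS))  # set()
```

For this pod the result is empty, so none of the declared container ports is left out of the sidecar's inbound-port list.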
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  podinfo:
    Type:  DownwardAPI (a volume populated by information about the pod)
    Items:
      metadata.annotations -> annotations
  default-token-wwhcv:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-wwhcv
    Optional:    false
  istio-envoy:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     Memory
    SizeLimit:  <unset>
  istio-certs:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  istio.default
    Optional:    true
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
                 nvidia.com/gpu:NoSchedule
Events:
  Type     Reason            Age                    From                                              Message
  ----     ------            ----                   ----                                              -------
  Normal   TriggeredScaleUp  7m11s                  cluster-autoscaler                                pod triggered scale-up: [{https://content.googleapis.com/compute/v1/projects/gn-data-science-project02/zones/us-east1-c/instanceGroups/gke-temp-cluster-gpu-pool-bf4c11e7-grp 0->1 (max: 20)}]
  Warning  FailedScheduling  6m30s (x4 over 7m46s)  default-scheduler                                 0/1 nodes are available: 1 Insufficient nvidia.com/gpu.
  Warning  FailedScheduling  5m (x8 over 6m5s)      default-scheduler                                 0/2 nodes are available: 2 Insufficient nvidia.com/gpu.
  Normal   Scheduled         4m14s                  default-scheduler                                 Successfully assigned seldon/nvidia-sp-nvidia-sp-8888366-7984b654f4-nnr8x to gke-temp-cluster-gpu-pool-bf4c11e7-k6mr
  Normal   Pulling           4m13s                  kubelet, gke-temp-cluster-gpu-pool-bf4c11e7-k6mr  pulling image "docker.io/istio/proxy_init:1.2.5"
  Normal   Pulled            4m11s                  kubelet, gke-temp-cluster-gpu-pool-bf4c11e7-k6mr  Successfully pulled image "docker.io/istio/proxy_init:1.2.5"
  Normal   Created           4m10s                  kubelet, gke-temp-cluster-gpu-pool-bf4c11e7-k6mr  Created container
  Normal   Started           4m10s                  kubelet, gke-temp-cluster-gpu-pool-bf4c11e7-k6mr  Started container
  Normal   Pulling           4m8s                   kubelet, gke-temp-cluster-gpu-pool-bf4c11e7-k6mr  pulling image "gcr.io/gn-data-science-project02/dev-models/sp-predictor:0.3"
  Normal   Pulled            3m49s                  kubelet, gke-temp-cluster-gpu-pool-bf4c11e7-k6mr  Successfully pulled image "gcr.io/gn-data-science-project02/dev-models/sp-predictor:0.3"
  Normal   Pulling           3m32s                  kubelet, gke-temp-cluster-gpu-pool-bf4c11e7-k6mr  pulling image "nvcr.io/nvidia/tensorrtserver:19.07-py3"
  Normal   Pulled            2m4s                   kubelet, gke-temp-cluster-gpu-pool-bf4c11e7-k6mr  Successfully pulled image "nvcr.io/nvidia/tensorrtserver:19.07-py3"
  Normal   Created           111s                   kubelet, gke-temp-cluster-gpu-pool-bf4c11e7-k6mr  Created container
  Normal   Started           110s                   kubelet, gke-temp-cluster-gpu-pool-bf4c11e7-k6mr  Started container
  Normal   Pulling           110s                   kubelet, gke-temp-cluster-gpu-pool-bf4c11e7-k6mr  pulling image "docker.io/seldonio/engine:0.4.0"
  Normal   Pulled            107s                   kubelet, gke-temp-cluster-gpu-pool-bf4c11e7-k6mr  Successfully pulled image "docker.io/seldonio/engine:0.4.0"
  Normal   Created           105s                   kubelet, gke-temp-cluster-gpu-pool-bf4c11e7-k6mr  Created container
  Normal   Started           105s                   kubelet, gke-temp-cluster-gpu-pool-bf4c11e7-k6mr  Started container
  Normal   Pulling           105s                   kubelet, gke-temp-cluster-gpu-pool-bf4c11e7-k6mr  pulling image "docker.io/istio/proxyv2:1.2.5"
  Normal   Pulled            96s                    kubelet, gke-temp-cluster-gpu-pool-bf4c11e7-k6mr  Successfully pulled image "docker.io/istio/proxyv2:1.2.5"
  Normal   Created           93s                    kubelet, gke-temp-cluster-gpu-pool-bf4c11e7-k6mr  Created container
  Normal   Started           93s                    kubelet, gke-temp-cluster-gpu-pool-bf4c11e7-k6mr  Started container
  Normal   Started           92s (x2 over 3m32s)    kubelet, gke-temp-cluster-gpu-pool-bf4c11e7-k6mr  Started container
  Normal   Created           92s (x2 over 3m33s)    kubelet, gke-temp-cluster-gpu-pool-bf4c11e7-k6mr  Created container
  Normal   Pulled            92s                    kubelet, gke-temp-cluster-gpu-pool-bf4c11e7-k6mr  Container image "gcr.io/gn-data-science-project02/dev-models/sp-predictor:0.3" already present on machine
  Warning  BackOff           89s (x2 over 90s)      kubelet, gke-temp-cluster-gpu-pool-bf4c11e7-k6mr  Back-off restarting failed container
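The BackOff warning means the kubelet is delaying restarts of the failed container. Kubernetes documents this delay as exponential (10 s, 20 s, 40 s and so on) and capped at five minutes. The sketch below reproduces that schedule under those documented defaults, which may differ between Kubernetes versions; `crashloop_delays` is a hypothetical helper.

```python
def crashloop_delays(restarts: int, base: int = 10, cap: int = 300) -> list:
    """Back-off delay in seconds before each successive restart:
    doubles from `base` and never exceeds `cap` (assumed defaults)."""
    return [min(base * 2 ** i, cap) for i in range(restarts)]


print(crashloop_delays(7))  # [10, 20, 40, 80, 160, 300, 300]
```

With Restart Count: 2 on sp-predictor-istio, any back-off so far would have been in the first, shortest steps of that schedule.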