@joel-bluedata
Last active January 23, 2020 20:28
K8s GPU test
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gpu-pod
spec:
  replicas: 1
  selector:
    matchLabels:
      run: digits-gpu-service
  template:
    metadata:
      labels:
        run: digits-gpu-service
    spec:
      containers:
      - name: digits-container
        image: nvidia/digits:6.0
        resources:
          limits:
            nvidia.com/gpu: 1
---
apiVersion: v1
kind: Service
metadata:
  name: gpu-service
  labels:
    hpecp.hpe.com/hpecp-internal-gateway: "true"
spec:
  selector:
    run: digits-gpu-service
  ports:
  - name: http-digits
    protocol: TCP
    port: 5000
    targetPort: 5000
  type: NodePort
The kubectl commands below assume that you are executing in a context that uses your desired namespace.
1. Create the Deployment and its Service (assuming the manifest above is saved as digits.yaml):
kubectl create -f digits.yaml
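The Deployment requests one nvidia.com/gpu, so its pod can only be scheduled if some node advertises that resource. The sketch below sums the allocatable GPU count across nodes. The function name total_allocatable_gpus is made up for this example, and it assumes your account is allowed to list nodes.

```shell
# Hypothetical helper: sum the allocatable nvidia.com/gpu count across all nodes.
# Nodes that do not advertise the resource show "<none>" and are skipped.
total_allocatable_gpus() {
  kubectl get nodes --no-headers \
    -o 'custom-columns=NODE:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu' |
    awk '$2 ~ /^[0-9]+$/ { sum += $2 } END { print sum + 0 }'
}
```

Run it as `total_allocatable_gpus`. A result of 0 would suggest the pod will stay Pending until a GPU node is available.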
2. Examine the created objects:
kubectl get deployment gpu-pod -o yaml
kubectl get service gpu-service -o yaml
3. Check pod status repeatedly until the pod with the "gpu-pod" name prefix shows as Running (this can take a while):
kubectl get pods
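Instead of re-running the command by hand, the check can be put in a small polling loop. The following is only a sketch: the function name wait_for_running and its default attempt count and delay are assumptions chosen for this example, not part of the original instructions.

```shell
# Hypothetical helper: poll until the first pod matching a label selector
# reports phase Running, or give up after a number of attempts.
# $1 = label selector, $2 = max attempts (default 60), $3 = seconds between attempts (default 10)
wait_for_running() {
  wfr_label="$1"
  wfr_attempts="${2:-60}"
  wfr_delay="${3:-10}"
  wfr_i=0
  while [ "$wfr_i" -lt "$wfr_attempts" ]; do
    # Errors (for example, no pod created yet) are suppressed and leave the phase empty.
    wfr_phase=$(kubectl get pod -l "$wfr_label" \
      -o jsonpath='{.items[0].status.phase}' 2>/dev/null)
    if [ "$wfr_phase" = "Running" ]; then
      echo "pod is Running"
      return 0
    fi
    wfr_i=$((wfr_i + 1))
    sleep "$wfr_delay"
  done
  echo "timed out waiting for pod" >&2
  return 1
}
```

For this Deployment, the call would be `wait_for_running run=digits-gpu-service`. `kubectl rollout status deployment/gpu-pod` is a built-in alternative that blocks until the rollout completes.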
4. Test the service:
Find the http-digits service in the list of service endpoints in the UI, and click on it.
You should see the Digits UI; more information on running ML jobs with Digits is at https://github.com/NVIDIA/DIGITS/blob/master/docs/GettingStarted.md
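5. Download data for the app (the http_proxy line applies only behind that corporate proxy; change or skip it on other networks):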
PODNAME=$(kubectl get -o jsonpath='{.items[0].metadata.name}' pod -l run=digits-gpu-service)
kubectl exec -it "$PODNAME" -- bash
export http_proxy=http://web-proxy.corp.hpecorp.net:8080
python -m digits.download_data mnist ~/mnist
exit
Note the directory where the data is downloaded; it is printed near the end of the download output, e.g. `Dataset directory is created successfully at '/root/mnist'`.
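After leaving the pod shell, you can confirm from outside that the training folder exists before moving on to the UI. This is a sketch under assumptions: the function name dataset_present is invented here, and it assumes the container image provides the standard `test` utility.

```shell
# Hypothetical helper: succeed if <dataset dir>/train exists inside the pod.
# $1 = pod name, $2 = dataset directory reported by the download step
dataset_present() {
  kubectl exec "$1" -- test -d "$2/train"
}
```

With the example path above, the call would be `dataset_present "$PODNAME" /root/mnist && echo "training data found"`.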
6. Exercise the app:
Continue through the Digits UI, starting with the "Using the webapp" section (https://github.com/NVIDIA/DIGITS/blob/master/docs/GettingStarted.md#using-the-webapp). When creating a "New Image Classification Dataset", use the folder containing your downloaded training data for the training images. For example, if the data was downloaded to "/root/mnist" in the previous step, you would use the "/root/mnist/train" folder here.
Note that if you only have one GPU assigned to the pod, the model-training step will NOT offer you a choice of GPUs (contrary to the example screenshot). You should still see your GPU displayed in the Hardware panel while the model is training.
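The GPU assignment can also be checked from the command line, independently of the Digits UI. The sketch below counts the GPUs visible inside the pod. The function name pod_gpu_count is made up for this example, and it assumes that nvidia-smi is available on the container's PATH, which is typically the case when the NVIDIA container runtime is in use.

```shell
# Hypothetical helper: count the GPUs visible inside a pod.
# `nvidia-smi -L` prints one line per GPU, each starting with "GPU <index>:".
# $1 = pod name
pod_gpu_count() {
  kubectl exec "$1" -- nvidia-smi -L | grep -c '^GPU '
}
```

For the manifest above, which sets a limit of one nvidia.com/gpu, `pod_gpu_count "$PODNAME"` would be expected to print 1.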
7. Cleaning up:
Once you have finished your checks and want to delete the objects, do:
kubectl delete service gpu-service
kubectl delete deployment gpu-pod
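If you run the cleanup more than once, or are not sure both objects still exist, the two deletions can be wrapped so that missing objects do not produce an error. This is a minimal sketch; the function name cleanup_gpu_test is invented for illustration.

```shell
# Hypothetical helper: delete the Service and the Deployment created above.
# --ignore-not-found makes the call safe to repeat when an object is already gone.
cleanup_gpu_test() {
  kubectl delete service gpu-service --ignore-not-found &&
    kubectl delete deployment gpu-pod --ignore-not-found
}
```

If the manifest file is still at hand, `kubectl delete -f digits.yaml` removes both objects in one step.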