Last active
January 23, 2020 20:28
K8s GPU test
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gpu-pod
spec:
  replicas: 1
  selector:
    matchLabels:
      run: digits-gpu-service
  template:
    metadata:
      labels:
        run: digits-gpu-service
    spec:
      containers:
      - name: digits-container
        image: nvidia/digits:6.0
        resources:
          limits:
            nvidia.com/gpu: 1
---
apiVersion: v1
kind: Service
metadata:
  name: gpu-service
  labels:
    hpecp.hpe.com/hpecp-internal-gateway: "true"
spec:
  selector:
    run: digits-gpu-service
  ports:
  - name: http-digits
    protocol: TCP
    port: 5000
    targetPort: 5000
  type: NodePort
The kubectl commands below assume that you are executing in a context that uses your desired namespace.
1. Create the Deployment and its Service:
    kubectl create -f digits.yaml
2. Examine the created objects:
    kubectl get deployment gpu-pod -o yaml
    kubectl get service gpu-service -o yaml
3. Check pod status repeatedly until the pod with the "gpu-pod" name prefix shows as Running (this can take a while):
    kubectl get pods
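The manual polling in step 3 can also be handed to kubectl itself. The following is a minimal sketch, assuming the Deployment name and pod label from digits.yaml; the function name wait_for_digits and the 600-second default timeout are illustrative choices, not part of the original instructions.

```shell
# Hypothetical helper: block until the gpu-pod Deployment has rolled out
# and its pod reports Ready, or fail once the timeout expires.
wait_for_digits() {
    local timeout="${1:-600s}"   # assumed default; adjust to your cluster
    kubectl rollout status deployment/gpu-pod --timeout="$timeout" &&
    kubectl wait --for=condition=Ready pod \
        -l run=digits-gpu-service --timeout="$timeout"
}
```

Calling `wait_for_digits` (or `wait_for_digits 120s` for a shorter limit) returns a non-zero status if the pod does not become Ready in time, which makes it usable in scripts.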
4. Test the service:
    Find the http-digits service in the list of service endpoints in the UI, and click on it.
    You should see the DIGITS UI; more information on running ML jobs with DIGITS is at https://github.com/NVIDIA/DIGITS/blob/master/docs/GettingStarted.md
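If you would rather check the service from a terminal, the port assigned to the NodePort service can be read back with kubectl. This is only a sketch: the helper name digits_node_port is hypothetical, and whether a cluster node is directly reachable on that port depends on your environment, so the UI endpoint described above remains the documented path.

```shell
# Hypothetical helper: print the node port assigned to the http-digits port
# of gpu-service (names taken from digits.yaml).
digits_node_port() {
    kubectl get service gpu-service \
        -o jsonpath='{.spec.ports[?(@.name=="http-digits")].nodePort}'
}

# Example use (NODE_IP is a placeholder for a reachable node address):
#   curl -sI "http://$NODE_IP:$(digits_node_port)/"
```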
5. Download data for the app:
    PODNAME=$(kubectl get -o jsonpath='{.items[0].metadata.name}' pod -l run=digits-gpu-service)
    kubectl exec -it "$PODNAME" -- bash
    Then, inside the pod's shell (set http_proxy only if your network requires this proxy):
    export http_proxy=http://web-proxy.corp.hpecorp.net:8080
    python -m digits.download_data mnist ~/mnist
    exit
    Note the directory where the data is downloaded; it is printed near the end of the download output, e.g. `Dataset directory is created successfully at '/root/mnist'`.
6. Exercise the app:
    Continue through the DIGITS UI starting with the “Using the webapp” section (https://github.com/NVIDIA/DIGITS/blob/master/docs/GettingStarted.md#using-the-webapp). When creating a "New Image Classification Dataset", use the folder containing your downloaded training data for the training images. For example, if the data was downloaded to "/root/mnist" in the previous step, you would use the "/root/mnist/train" folder here.
    Note that if you only have one GPU assigned to the pod, the model-training step will NOT offer you a choice of GPUs (contrary to the example screenshot). You should still see your GPU displayed in the Hardware panel while the model is training.
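Independently of the Hardware panel, the GPU assignment can be checked from the command line. The sketch below assumes that nvidia-smi is available inside the container (it is normally provided by the NVIDIA container runtime, but that is an assumption about your cluster); the helper name digits_gpu_check is hypothetical.

```shell
# Hypothetical helper: list the GPUs visible inside the first DIGITS pod.
digits_gpu_check() {
    local pod
    # Same label selector as in step 5.
    pod=$(kubectl get pod -l run=digits-gpu-service \
        -o jsonpath='{.items[0].metadata.name}') || return 1
    # "nvidia-smi -L" prints one line per GPU visible to the container.
    kubectl exec "$pod" -- nvidia-smi -L
}
```

With the `nvidia.com/gpu: 1` limit from digits.yaml, one GPU line would be expected in the output.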
7. Clean up:
    Once you have finished your checks and want to delete the objects, run:
    kubectl delete service gpu-service
    kubectl delete deployment gpu-pod
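As an alternative to deleting each object by name, the same manifest used in step 1 can drive the cleanup. A minimal sketch, assuming digits.yaml is still in the current directory; the helper name cleanup_digits is hypothetical.

```shell
# Hypothetical helper: delete everything defined in the manifest.
# --ignore-not-found keeps the command from failing if an object is already gone.
cleanup_digits() {
    kubectl delete -f "${1:-digits.yaml}" --ignore-not-found
}
```

This keeps the cleanup in step with the manifest: if objects are later added to digits.yaml, they are removed as well.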