The RHPDS catalog item you provisioned is a single-node OpenShift environment backed by an Amazon P-type EC2 instance with one NVIDIA GPU. It is a 100% vanilla/standard OpenShift Container Platform 3.10 installation. Post-install, a few additional steps were performed using Ansible content from the https://github.com/redhat-performance/openshift-psap repository:
- The NVIDIA hardware driver for the GPU was installed on the host. The driver provides kernel support for using the GPU.
- The NVIDIA runtime hook was installed. The hook is a prestart hook, which means it is executed right before the app/container (ENTRYPOINT, COMMAND) starts. It bind-mounts binaries, libraries, devices, and configuration files into the running container, making the files needed to run a CUDA workload available.
- The NVIDIA device plugin daemonset was installed. The device plugin is a daemonset that runs on all nodes in the environment and discovers the GPUs. It then makes the GPUs an allocatable resource that pods can consume.
After these initial steps were performed, a test workload (CUDA vector) was deployed. This pod ran once and terminated, but it remains in the nvidia-device-plugin OpenShift project so that you can look at its logs.
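From the SSH session you can list the pods in that project; you should at least see the completed cuda-vector-add pod, and (assuming the components above were installed into the same project, as its name suggests) the device plugin pod as well:
oc get pods -n nvidia-device-plugin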
The email you received from RHPDS provided SSH access information. Once you SSH in to the host, you can sudo -i to root. Then, you can simply demonstrate that Kubernetes/OpenShift recognizes that your node has GPU capacity with the following command:
oc describe $(oc get node -o name) | grep Capacity -A12
You should see that there is a GPU capacity of 1 and a single GPU allocatable as well (nvidia.com/gpu). At this time there is no support for fractional consumption/utilization of GPU resources. They are represented as single, whole GPUs and must be requested by the workload in integer increments.
Note: There is upstream work being done on slicing / "virtualizing" GPUs, but it is still in its early phases. The following Kubernetes issue tracks that design work and our involvement: kubernetes/kubernetes#52757
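For illustration, this is how any workload would ask for the GPU: the container requests the nvidia.com/gpu resource in whole numbers. The following is only a minimal sketch; the pod name, image, and command are hypothetical, and the actual demo pods are already provided for you.
# Hypothetical example pod that requests one whole GPU (not part of the demo).
cat <<'EOF' | oc create -f - -n nvidia-device-plugin
apiVersion: v1
kind: Pod
metadata:
  name: gpu-request-example
spec:
  restartPolicy: Never
  containers:
  - name: cuda
    image: nvidia/cuda            # illustrative image
    command: ["nvidia-smi"]       # the prestart hook mounts the NVIDIA tools and libraries
    resources:
      limits:
        nvidia.com/gpu: 1         # whole GPUs only; fractional requests are not supported
EOF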
The cluster web console is available at the same hostname you used for SSH, on port 8443:
https://bastion.GUID.openshiftworkshop.com:8443
Note: The environment does not currently use Let's Encrypt, so you will need to accept the self-signed certificate.
You can log in as gpu-user with any password. The cluster is set up to use AnyPassword, which means you can log in as any user; however, only gpu-user has been given cluster-admin rights. Make sure you use gpu-user.
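If you also want CLI access as that user, oc login against the same URL works (a sketch; replace GUID with your environment's GUID, any password value is accepted, and --insecure-skip-tls-verify accepts the self-signed certificate):
oc login https://bastion.GUID.openshiftworkshop.com:8443 -u gpu-user -p anypassword --insecure-skip-tls-verify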
The aforementioned CUDA vector pod has been deployed in the nvidia-device-plugin project.
- Navigate to that project in the web console.
- Hover over Applications in the left navigation.
- Click Pods.
You will find the terminated CUDA pod (cuda-vector-add). Click into the pod, then go to the Logs tab and show that the pod was successful (Test PASSED). This means that the program inside of the pod was able to access the NVIDIA GPU. This is a very basic demonstration that GPUs can, in fact, be consumed by workloads in OpenShift Pods.
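If you prefer the CLI, the same log output is available from the SSH session (using the pod name and project mentioned above); look for Test PASSED near the end:
oc logs cuda-vector-add -n nvidia-device-plugin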
The rest of the demo uses a [Caffe2](https://caffe2.ai) workload that runs in a [Jupyter](https://jupyter.org) notebook.
The YAML defines a Pod and Service for the application. You can use the Import YAML function in the UI:
https://raw.githubusercontent.com/thoraxe/openshift-psap/ocp-311-tweaks/playbooks/roles/gpu-pod/caffe2-1gpu.yaml
Once deployed, you will need to expose the app with a Route. You can do this using the CLI or using the web console. The Service is simply called caffe2.
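For example, using the CLI (assuming the Service ended up in the nvidia-device-plugin project):
oc expose service caffe2 -n nvidia-device-plugin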
Note: The Caffe2 image is not currently pre-pulled, so this would be a good time to visit the Caffe2 site to talk about what it actually is.
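Because the image has to be pulled first, the caffe2 pod can take a few minutes to become ready. You can watch its status from the SSH session (the pod name matches the one used in the token scriptlet below):
oc get pod caffe2 -n nvidia-device-plugin -w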
Jupyter notebooks have a security feature that requires the use of a token. Once the app is deployed and exposed, you can execute the following scriptlet on the console where you SSH'd in earlier to look at the GPU resources. It will give you the URL and the token to directly access the application:
ROUTE=$(oc get routes -n nvidia-device-plugin | grep caffe2 | awk '{print $2}')
TOKEN=$(oc logs -n nvidia-device-plugin pod/caffe2 | head -4 | grep token= | awk -Ftoken= '{print $2}')
echo http://$ROUTE/notebooks/caffe2/caffe2/python/tutorials/MNIST.ipynb?token=$TOKEN
Visit the URL that is output from the scriptlet.
Jupyter notebooks make it easy to share and describe code. All of the cells in a notebook are essentially either executable code or descriptions. In the Jupyter notebook URL you visited:
- Click Kernel and then click Restart & Clear Output.
- Then, click Kernel and then click Restart & Run All.
This will cause Jupyter to start executing each cell of the notebook sequentially.
The general workflow for most ML frameworks (Caffe, Caffe2, MXNet, Torch, etc.) is the following, and it is what most examples (MNIST, CIFAR-10, …) for any framework will do; only the API and the dataset change.
- Get your dataset: The example uses the MNIST dataset, which is a collection of 70,000 handwritten digits. The dataset consists of 60,000 training samples and 10,000 test samples.
- Create the data format the framework understands: Caffe2 uses the LMDB format. The notebook downloads databases that were converted in another notebook (not demonstrated) to the framework's native format. Note: see MNIST_Dataset_and_Databases.ipynb for more information.
- Create the model: The example uses the LeNet model, which is a Convolutional Neural Network (CNN). This is a fairly standard model for visual recognition tasks.
- Train the model: The configured model (input, layers, output) is fed the training data up to a specified accuracy (for benchmarks: time to accuracy, TTA) or number of epochs/iterations. The example uses 200 iterations with a batch size of 64, which results in 12,800 samples being trained against.
- Test the model: After training, the model is tested against the 10,000 test images and the test accuracy is reported.
- Inference: The saved model can now be used to do inference on other handwritten digits.
The compute-intensive part of the whole pipeline is, as you might have guessed, the training of the underlying model. The main operations performed are essentially matrix multiplications (the forward and backward passes over tensors). Those multiplications can be done in a massively parallel fashion on a GPU, since for every neuron in every layer there are many computations to be done.
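To make that concrete, here is the standard forward/backward computation for a single fully connected layer (a simplification; the LeNet model in the notebook also contains convolutional layers, which are likewise implemented as large matrix operations):
\[
y = Wx + b, \qquad
\frac{\partial L}{\partial W} = \frac{\partial L}{\partial y}\,x^{\top}, \qquad
\frac{\partial L}{\partial x} = W^{\top}\,\frac{\partial L}{\partial y}
\]
Each pass is dominated by matrix products, and with a batch size of 64 these become large, regular computations that map naturally onto the GPU's many parallel cores.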