The RHPDS catalog item you provisioned is a single-node OpenShift environment backed by an Amazon P-type EC2 instance with one NVIDIA GPU. It is a 100% vanilla/standard OpenShift Container Platform 3.10 installation. Post-install, a few additional steps were performed using Ansible content from the https://github.com/redhat-performance/openshift-psap repository:
- The NVIDIA hardware driver for the GPU was installed on the host. The driver provides kernel support for using the GPU.
- The NVIDIA runtime hook was installed. The hook is a prestart hook, which means it is executed right before the app/container (ENTRYPOINT, COMMAND) starts. It bind-mounts binaries, libraries, devices, and configuration files into the running container, making the files needed to run a CUDA workload available.
- The NVIDIA device plugin daemonset was installed. The device plugin is a daemonset that runs on all nodes in the environment and discovers the GPUs. It then makes the GPUs an allocatable resource that pods can consume.
After these initial steps were performed, a test workload (CUDA vector) was deployed. This pod ran once and terminated, but it remains in the nvidia-device-plugin OpenShift project so that you can look at its logs.
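From the SSH session you can list the pods in that project; you should at least see the completed cuda-vector-add pod, and (assuming the components above were installed into the same project, as its name suggests) the device plugin pod as well:
oc get pods -n nvidia-device-plugin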
The email you received from RHPDS provided SSH access information. Once you SSH in to the host, you can sudo -i to root. Then, you can simply demonstrate that Kubernetes/OpenShift recognizes that your node has GPU capacity with the following command:
oc describe $(oc get node -o name) | grep Capacity -A12
You should see that there is a GPU capacity of 1 and a single GPU allocatable as well (nvidia.com/gpu). At this time there is no support for fractional consumption/utilization of GPU resources. They are represented as single, whole GPUs and must be requested by the workload in integer increments.
Note: There is upstream work being done on slicing / "virtualizing" GPUs, but it is still in its early phases. The following Kubernetes issue tracks that design work and our involvement: kubernetes/kubernetes#52757
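For illustration, this is how any workload would ask for the GPU: the container requests the nvidia.com/gpu resource in whole numbers. The following is only a minimal sketch; the pod name, image, and command are hypothetical, and the actual demo pods are already provided for you.
# Hypothetical example pod that requests one whole GPU (not part of the demo).
cat <<'EOF' | oc create -f - -n nvidia-device-plugin
apiVersion: v1
kind: Pod
metadata:
  name: gpu-request-example
spec:
  restartPolicy: Never
  containers:
  - name: cuda
    image: nvidia/cuda            # illustrative image
    command: ["nvidia-smi"]       # the prestart hook mounts the NVIDIA tools and libraries
    resources:
      limits:
        nvidia.com/gpu: 1         # whole GPUs only; fractional requests are not supported
EOF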
The cluster web console is available at the same hostname you used for SSH, on port 8443:
https://bastion.GUID.openshiftworkshop.com:8443
Note: The environment does not currently use Let's Encrypt, so you will need to accept the self-signed certificate.
You can log in as gpu-user with any password. The cluster is set up to use AnyPassword, which means you can log in as any user; however, only gpu-user has been given cluster-admin rights. Make sure you use gpu-user.
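If you also want CLI access as that user, oc login against the same URL works (a sketch; replace GUID with your environment's GUID, any password value is accepted, and --insecure-skip-tls-verify accepts the self-signed certificate):
oc login https://bastion.GUID.openshiftworkshop.com:8443 -u gpu-user -p anypassword --insecure-skip-tls-verify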
The aforementioned CUDA vector pod has been deployed in the nvidia-device-plugin project.
- Navigate to that project in the web console.
- Hover over Applications in the left navigation.
- Click Pods.
You will find the terminated CUDA pod (cuda-vector-add). Click into the pod, then go to the Logs tab and show that the pod was successful (Test PASSED). This means that the program inside of the pod was able to access the NVIDIA GPU. This is a very basic demonstration that GPUs can, in fact, be consumed by workloads in OpenShift Pods.
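If you prefer the CLI, the same log output is available from the SSH session (using the pod name and project mentioned above); look for Test PASSED near the end:
oc logs cuda-vector-add -n nvidia-device-plugin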
The rest of the demo uses a [Caffe2](https://caffe2.ai) workload that runs in a [Jupyter](https://jupyter.org) notebook.
The YAML defines a Pod and Service for the application. You can use the Import YAML function in the UI:
https://raw.githubusercontent.com/thoraxe/openshift-psap/ocp-311-tweaks/playbooks/roles/gpu-pod/caffe2-1gpu.yaml
Once deployed, you will need to expose the app with a Route. You can do this using the CLI or using the web console. The Service is simply called caffe2.
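For example, using the CLI (assuming the Service ended up in the nvidia-device-plugin project):
oc expose service caffe2 -n nvidia-device-plugin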
Note: The Caffe2 image is not currently pre-pulled, so this would be a good time to visit the Caffe2 site to talk about what it actually is.
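Because the image has to be pulled first, the caffe2 pod can take a few minutes to become ready. You can watch its status from the SSH session (the pod name matches the one used in the token scriptlet below):
oc get pod caffe2 -n nvidia-device-plugin -w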
Jupyter notebooks have a security feature that requires the use of a token. Once the app is deployed and exposed, you can execute the following scriptlet on the console where you SSH'd in earlier to look at the GPU resources. It will give you the URL and the token to directly access the application:
ROUTE=$(oc get routes -n nvidia-device-plugin | grep caffe2 | awk '{print $2}')
TOKEN=$(oc logs -n nvidia-device-plugin pod/caffe2 | head -4 | grep token= | awk -Ftoken= '{print $2}')
echo http://$ROUTE/notebooks/caffe2/caffe2/python/tutorials/MNIST.ipynb?token=$TOKEN
Visit the URL that is output from the scriptlet.
Jupyter notebooks make it easy to share and describe code. All of the cells in a notebook are essentially either executable code or descriptions. In the Jupyter notebook URL you visited:
- Click Kernel and then click Restart & Clear Output.
- Then, click Kernel and then click Restart & Run All.
This will cause Jupyter to start executing each cell of the notebook sequentially.
The general workflow for most ML frameworks (Caffe, Caffe2, MXNet, Torch, etc.) is the following, and it is what most examples (MNIST, CIFAR-10, …) for any framework will do; only the API and the dataset change.
- Get your dataset: The example uses the MNIST dataset, which is a collection of 70,000 handwritten digits. The dataset consists of 60,000 training samples and 10,000 test samples.
- Create the data format the framework understands: Caffe2 uses the LMDB format. The notebook downloads databases that were converted in another notebook (not demonstrated) to the framework's native format. Note: see MNIST_Dataset_and_Databases.ipynb for more information.
- Create the model: The example uses the LeNet model, which is a Convolutional Neural Network (CNN). This is a fairly standard model for visual recognition tasks.
- Train the model: The configured model (input, layers, output) is fed the training data up to a specified accuracy (for benchmarks: time to accuracy, TTA) or number of epochs/iterations. The example uses 200 iterations with a batch size of 64, which results in 12,800 samples being trained against.
- Test the model: After training, the model is tested against the 10,000 test images and the test accuracy is reported.
- Inference: The saved model can now be used to do inference on other handwritten digits.
The compute-intensive part of the whole pipeline is, as you might have guessed, the training of the underlying model. The main operations performed are essentially matrix multiplications (the forward and backward passes over tensors). Those multiplications can be done in a massively parallel fashion on a GPU, since for every neuron in every layer there are many computations to be done.
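To make that concrete, here is the standard forward/backward computation for a single fully connected layer (a simplification; the LeNet model in the notebook also contains convolutional layers, which are likewise implemented as large matrix operations):
\[
y = Wx + b, \qquad
\frac{\partial L}{\partial W} = \frac{\partial L}{\partial y}\,x^{\top}, \qquad
\frac{\partial L}{\partial x} = W^{\top}\,\frac{\partial L}{\partial y}
\]
Each pass is dominated by matrix products, and with a batch size of 64 these become large, regular computations that map naturally onto the GPU's many parallel cores.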