@carlthome
Created June 25, 2018 11:22
Comparing data formats in Keras ("channels_first" vs "channels_last") for VGG style ConvNets.
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# NHWC vs. NCHW\n",
">\"On GPU, NCHW is faster. But on CPU, NHWC is sometimes faster.\" [(source)](https://www.tensorflow.org/performance/performance_models#build_the_model_with_both_nhwc_and_nchw)\n",
"\n",
"When computing convolutions, we can consider each tensor element as a struct with multiple features (e.g. one per color channel), but we could also view each feature as part of an individual feature map. Depending on how we order the input tensor, we get slightly different execution paths. Essentially, it comes down to the order of computation and cache locality (in rough pseudo code):\n",
"\n",
"```python\n",
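"rows, columns, channels = 4, 4, 3  # toy sizes, assumed for illustration\n",
"\n",
"# NHWC: the channel index varies fastest in memory\n",
"for i in range(rows):\n",
"    for j in range(columns):\n",
"        for k in range(channels):\n",
"            ...  # visit element (i, j, k)\n",
"\n",
"# NCHW: the column index varies fastest in memory\n",
"for k in range(channels):\n",
"    for i in range(rows):\n",
"        for j in range(columns):\n",
"            ...  # visit element (k, i, j)\n",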
"```\n",
"However, because the toolchain is so convoluted and deep (puns intended), it is really hard to predict the end result in terms of actual wall time. Benchmarking is the best way to get a feel for how much we need to care about designing models that work with both data formats. The type of network that should benefit the most from the right ordering is something like [VGG16](https://arxiv.org/abs/1409.1556), which has a huge number of convolution kernels compared to more modern architectures. Let's benchmark that one, since it should give us a decent upper bound on the performance difference to expect."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"ExecuteTime": {
"end_time": "2018-06-25T09:47:17.120450Z",
"start_time": "2018-06-25T09:47:15.433154Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"'1.9.0-dev20180607'"
]
},
"execution_count": 1,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import tensorflow as tf\n",
"\n",
"tf.VERSION"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"ExecuteTime": {
"end_time": "2018-06-25T09:47:17.132610Z",
"start_time": "2018-06-25T09:47:17.124143Z"
}
},
"outputs": [],
"source": [
"from tensorflow.python.keras import backend, datasets\n",
"from tensorflow.python.keras.applications import VGG16\n",
"from tensorflow.python.keras.layers import Dense, GlobalAveragePooling2D\n",
"from tensorflow.python.keras.models import Model\n",
"\n",
"\n",
"def create_model():\n",
"    # Reset the Keras session so each benchmark starts from a fresh graph.\n",
"    backend.clear_session()\n",
"\n",
"    # VGG16 convolutional base without its fully connected classifier.\n",
"    base_model = VGG16(include_top=False)\n",
"    inputs = base_model.inputs\n",
"\n",
"    # Small classification head for the 100 CIFAR-100 classes.\n",
"    x = base_model.output\n",
"    x = GlobalAveragePooling2D()(x)\n",
"    x = Dense(1024, activation='relu')(x)\n",
"    x = Dense(100, activation='softmax')(x)\n",
"\n",
"    outputs = [x]\n",
"    model = Model(inputs, outputs)\n",
"    model.compile('adam', 'sparse_categorical_crossentropy')\n",
"    return model"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"ExecuteTime": {
"end_time": "2018-06-25T09:55:43.073936Z",
"start_time": "2018-06-25T09:47:17.135120Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Epoch 1/1\n",
"50000/50000 [==============================] - 50s 1ms/step - loss: 15.9570\n",
"Epoch 1/1\n",
"50000/50000 [==============================] - 44s 878us/step - loss: 15.9569\n",
"Epoch 1/1\n",
"50000/50000 [==============================] - 44s 878us/step - loss: 15.9569\n",
"Epoch 1/1\n",
"50000/50000 [==============================] - 44s 877us/step - loss: 15.9569\n",
"Epoch 1/1\n",
"50000/50000 [==============================] - 44s 879us/step - loss: 15.9569\n",
"Epoch 1/1\n",
"50000/50000 [==============================] - 44s 886us/step - loss: 15.9569\n",
"Epoch 1/1\n",
"50000/50000 [==============================] - 44s 883us/step - loss: 15.9569\n",
"Epoch 1/1\n",
"50000/50000 [==============================] - 44s 883us/step - loss: 15.9569\n",
"Epoch 1/1\n",
"50000/50000 [==============================] - 44s 879us/step - loss: 15.9569\n",
"Epoch 1/1\n",
"50000/50000 [==============================] - 44s 881us/step - loss: 15.9569\n",
"Epoch 1/1\n",
"50000/50000 [==============================] - 44s 878us/step - loss: 15.9569\n",
"44 s ± 141 ms per loop (mean ± std. dev. of 10 runs, 1 loop each)\n"
]
}
],
"source": [
"backend.set_image_data_format('channels_first')\n",
"x_train, y_train = datasets.cifar100.load_data()[0]\n",
"model = create_model()\n",
"nchw = %timeit -o -r10 model.fit(x_train, y_train)"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"ExecuteTime": {
"end_time": "2018-06-25T10:04:12.371955Z",
"start_time": "2018-06-25T09:55:43.078257Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Epoch 1/1\n",
"50000/50000 [==============================] - 49s 976us/step - loss: 15.9564\n",
"Epoch 1/1\n",
"50000/50000 [==============================] - 45s 902us/step - loss: 15.9569\n",
"Epoch 1/1\n",
"50000/50000 [==============================] - 45s 902us/step - loss: 15.9569\n",
"Epoch 1/1\n",
"50000/50000 [==============================] - 45s 905us/step - loss: 15.9569\n",
"Epoch 1/1\n",
"50000/50000 [==============================] - 45s 904us/step - loss: 15.9569\n",
"Epoch 1/1\n",
"50000/50000 [==============================] - 45s 901us/step - loss: 15.9569\n",
"Epoch 1/1\n",
"50000/50000 [==============================] - 45s 900us/step - loss: 15.9569\n",
"Epoch 1/1\n",
"50000/50000 [==============================] - 45s 901us/step - loss: 15.9569\n",
"Epoch 1/1\n",
"50000/50000 [==============================] - 45s 900us/step - loss: 15.9569\n",
"Epoch 1/1\n",
"50000/50000 [==============================] - 45s 900us/step - loss: 15.9569\n",
"Epoch 1/1\n",
"50000/50000 [==============================] - 45s 896us/step - loss: 15.9569\n",
"45.1 s ± 118 ms per loop (mean ± std. dev. of 10 runs, 1 loop each)\n"
]
}
],
"source": [
"backend.set_image_data_format('channels_last')\n",
"x_train, y_train = datasets.cifar100.load_data()[0]\n",
"model = create_model()\n",
"nhwc = %timeit -o -r10 model.fit(x_train, y_train)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Results"
]
},
{
"cell_type": "code",
"execution_count": 63,
"metadata": {
"ExecuteTime": {
"end_time": "2018-06-25T10:55:12.438612Z",
"start_time": "2018-06-25T10:55:12.208193Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"'Channels first is 2.39% faster.'"
]
},
"execution_count": 63,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"%matplotlib inline\n",
"import pandas as pd\n",
"\n",
"df = pd.DataFrame({'NCHW': nchw.all_runs, 'NHWC': nhwc.all_runs})\n",
"ax = df.boxplot()\n",
"ax.set_ylabel('Time [s]')\n",
"\n",
"means = df.mean()\n",
"ratio = means.NHWC / means.NCHW\n",
"f'Channels first is {100*(ratio - 1):.2f}% faster.'"
]
},
{
"cell_type": "markdown",
"metadata": {
"ExecuteTime": {
"end_time": "2018-06-25T11:10:22.002302Z",
"start_time": "2018-06-25T11:10:21.997857Z"
}
},
"source": [
"## Conclusion\n",
"As we can see above, TensorFlow's documentation isn't wrong when training a VGG-esque network (_lots_ of convolution filters) on a GPU. However, the difference in performance is minor (about 2.4%, or 44 s versus 45.1 s per epoch), so I personally wouldn't bother with it. Hopefully XLA will eventually be turned on by default so that its HLO could take care of this performance gap for us. Also, keep in mind that for control-flow-heavy models such as RNNs, the results would probably be even blurrier.\n",
"\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.4"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
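
As a rough illustration of the loop orders sketched in the notebook's introduction, the snippet below computes where an element lands in flat memory under each layout. It is only a sketch under stated assumptions: the helper names `nhwc_offset` and `nchw_offset` are hypothetical, the sizes are toy values, and a plain dense row-major buffer is assumed, which says nothing about how TensorFlow or cuDNN actually lay out or traverse their tensors.

```python
def nhwc_offset(n, i, j, k, height, width, channels):
    """Flat index of element (n, i, j, k) when the channel index varies fastest."""
    return ((n * height + i) * width + j) * channels + k


def nchw_offset(n, k, i, j, channels, height, width):
    """Flat index of the same element when the column index varies fastest."""
    return ((n * channels + k) * height + i) * width + j


# Toy sizes (assumed for illustration): 4x4 images with 3 channels.
H, W, C = 4, 4, 3

# Distance in memory between neighbouring channels of one pixel.
nhwc_channel_step = nhwc_offset(0, 0, 0, 1, H, W, C) - nhwc_offset(0, 0, 0, 0, H, W, C)
nchw_channel_step = nchw_offset(0, 1, 0, 0, C, H, W) - nchw_offset(0, 0, 0, 0, C, H, W)

# Distance in memory between neighbouring columns of one channel.
nhwc_column_step = nhwc_offset(0, 0, 1, 0, H, W, C) - nhwc_offset(0, 0, 0, 0, H, W, C)
nchw_column_step = nchw_offset(0, 0, 0, 1, C, H, W) - nchw_offset(0, 0, 0, 0, C, H, W)

print('NHWC: channel step', nhwc_channel_step, 'column step', nhwc_column_step)
print('NCHW: channel step', nchw_channel_step, 'column step', nchw_column_step)
```

With these sizes, NHWC keeps all channels of a pixel adjacent (channel step 1, column step 3), while NCHW keeps each feature map contiguous (column step 1, channel step 16). Which of the two access patterns a given convolution kernel prefers is exactly what the benchmark tries to measure.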