@jbencina
Created June 1, 2017 03:43
Using CNN TensorFlow to classify Simpsons images http://www.jbencina.com/blog/2017/05/31/getting-started-tensorflow/
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import tensorflow as tf\n",
"import numpy as np\n",
"import math, os, random\n",
"\n",
"from skimage import io, exposure\n",
"from skimage.transform import rotate\n",
"from skimage.filters import scharr\n",
"\n",
"from sklearn.model_selection import train_test_split\n",
"from sklearn.utils import shuffle"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Collect image data\n",
"The directory of Simpsons images contains 2,488 color images of Homer, Lisa, Bart, and Marge. Files are named like bart_213.jpg. The following code:\n",
"1. Loads each image\n",
"2. Converts it to grayscale\n",
"3. Highlights edges and adjusts contrast\n",
"4. Makes a second copy with the image rotated 180 degrees (upside down)\n",
"5. Outputs feature and label arrays and holds out 20% of the data as a test set"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"y_labels = {\n",
"    'homer': 0,\n",
"    'bart': 1,\n",
"    'lisa': 2,\n",
"    'marge': 3\n",
"}\n",
"\n",
"def get_filepaths(path):\n",
"    data = []\n",
"\n",
"    # Walk the directory and collect the image paths. The y labels are encoded in the file name,\n",
"    # like homer_203.jpg\n",
"    for r, d, f in os.walk(path):\n",
"        for file in f:\n",
"            if '.jpg' in file:\n",
"                full_name = os.path.join(r, file)\n",
"                # The character name is the part before the underscore\n",
"                label = file.split('_')[0]\n",
"                data.append([full_name, y_labels.get(label)])\n",
"    return data\n",
"\n",
"def process_images(files, x_size, y_size):\n",
"    # Each image is also stored upside down, so the array is doubled to accommodate the copies\n",
"    n = len(files)\n",
"    arr = np.zeros(shape=(n * 2, x_size, y_size))\n",
"    for i in range(n):\n",
"        # Load the image as grayscale and use scharr to get edges\n",
"        # (newer scikit-image releases spell this argument as_gray)\n",
"        img = io.imread(files[i][0], as_grey=True)\n",
"        img = scharr(img)\n",
"\n",
"        # Scale the intensity to whiten the edges\n",
"        p95 = np.percentile(img, 95)\n",
"        img = exposure.rescale_intensity(img, in_range=(0, p95))\n",
"\n",
"        # Add the image to the collection, then add a copy rotated by 180 degrees\n",
"        arr[i] = img\n",
"        arr[i + n] = rotate(img, angle=180)\n",
"\n",
"    return arr\n",
"\n",
"def get_image_data():\n",
"    # Load all images in the directory and generate feature & label data\n",
"    base_path = 'C:/Users/C13119/Documents/tensorflow/faces-clean-color/'\n",
"    files = get_filepaths(base_path)\n",
"\n",
"    x_data = process_images(files, 120, 160)\n",
"    y_data = np.array([label for (_, label) in files])\n",
"    y_data = np.concatenate([y_data, y_data])  # Double it since each image is stored twice\n",
"\n",
"    return shuffle(x_data, y_data)\n",
"\n",
"X, y = get_image_data()\n",
"X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Visualize random image\n",
"Load a random image from the array and display it with its label. Some images are intentionally upside down."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%matplotlib inline\n",
"import matplotlib.pyplot as plt\n",
"r = random.randint(0, len(X) - 1)  # randint includes both endpoints\n",
"plt.imshow(X[r], cmap='gray')\n",
"plt.title('Random Image -- Label: {Y}'.format(Y=y[r]))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Build & run CNN\n",
"The following is adapted from the code provided by Google's Martin Gorner, found under mnist_TF_layers.py at https://github.com/martin-gorner/tensorflow-mnist-tutorial. The code accompanies a very nice lecture in the following YouTube video https://www.youtube.com/watch?v=vq2nnJ4g6N0. Some elements were also taken from the Google TensorFlow Layers tutorial https://www.tensorflow.org/tutorials/layers.\n",
"\n",
"This network implements three convolutional layers that use a stride of 2 to downsample the data. These are followed by two densely connected layers with dropout and a final 4-unit output layer. The convolutional and dense hidden layers all use ReLU and batch normalization. The learning rate also decays over time.\n",
"\n",
"While Martin Gorner's code uses a custom visualization module, I updated it to take advantage of TensorBoard. All you need to do is start TensorBoard with the --logdir flag set to the same directory as the checkpoints (the model_dir passed to the Estimator).\n",
"\n",
"*Note:* this was more of a learning exercise for myself. The dataset is small, and I ran this on a struggling laptop. The settings below get you around 95% training accuracy after 500 iterations (30 minutes for me!) and ~80% on the test data."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"tf.reset_default_graph() # Because this is in Jupyter, otherwise a new graph is created every time you rerun\n",
"\n",
"tf.logging.set_verbosity(tf.logging.INFO)\n",
"\n",
"# ********************************************************************\n",
"# Define some functions first for use in the model (keeps things neat)\n",
"# ********************************************************************\n",
"\n",
"# Feeds shuffled batches of training data into our model\n",
"def train_data_input_fn():\n",
"    return tf.train.shuffle_batch(\n",
"        [tf.constant(X_train, dtype=tf.float32), tf.constant(y_train, dtype=tf.int32)],\n",
"        batch_size=100, capacity=1100, min_after_dequeue=1000, enqueue_many=True)\n",
"\n",
"\n",
"# Use cross entropy to define our loss function which we are trying to minimize\n",
"def conv_model_loss(Ylogits, Y_, mode):\n",
"    if mode == tf.estimator.ModeKeys.TRAIN or mode == tf.estimator.ModeKeys.EVAL:\n",
"        return tf.reduce_mean(tf.losses.softmax_cross_entropy(tf.one_hot(Y_, 4), Ylogits))\n",
"    else:\n",
"        return None\n",
"\n",
"\n",
"# Define which optimizer we are using to minimize our loss. This one incorporates learning rate decay:\n",
"# lr(step) = 0.0001 + 0.003 * e^(-step / 3000)\n",
"def conv_model_train_op(loss, mode):\n",
"    if mode == tf.estimator.ModeKeys.TRAIN:\n",
"        return tf.contrib.layers.optimize_loss(\n",
"            loss,\n",
"            tf.train.get_global_step(),\n",
"            learning_rate=0.003,\n",
"            optimizer=\"Adam\",\n",
"            learning_rate_decay_fn=lambda lr, step: 0.0001 + tf.train.exponential_decay(lr, step, -3000, math.e)\n",
"        )\n",
"    else:\n",
"        return None\n",
"\n",
"\n",
"# Track accuracy, precision, recall metrics during our training and eval steps\n",
"def conv_model_eval_metrics(classes, Y_, mode):\n",
"    if mode == tf.estimator.ModeKeys.TRAIN or mode == tf.estimator.ModeKeys.EVAL:\n",
"        # tf.metrics functions take (labels, predictions); keywords keep the order unambiguous.\n",
"        # Note that tf.metrics.precision and tf.metrics.recall cast their inputs to bool,\n",
"        # so any nonzero class counts as positive.\n",
"        return {\n",
"            'accuracy': tf.metrics.accuracy(labels=Y_, predictions=classes),\n",
"            'precision': tf.metrics.precision(labels=Y_, predictions=classes),\n",
"            'recall': tf.metrics.recall(labels=Y_, predictions=classes),\n",
"        }\n",
"    else:\n",
"        return None\n",
"\n",
"# Define our actual model\n",
"def cnn_model_fn(features, labels, mode):\n",
"    is_training = mode == tf.estimator.ModeKeys.TRAIN\n",
"\n",
"    input_layer = tf.reshape(features, [-1, 120, 160, 1])\n",
"    tf.summary.image('input', input_layer, 10)  # Logs 10 sample images to view in TensorBoard\n",
"\n",
"    # Convolutional Layer #1 - Out 60 x 80 image because stride 2\n",
"    with tf.name_scope('cnn_layer1_8x8x8-s2'):\n",
"        conv1 = tf.layers.conv2d(\n",
"            strides=(2, 2),\n",
"            inputs=input_layer,\n",
"            filters=8,\n",
"            kernel_size=[8, 8],\n",
"            padding=\"same\",\n",
"            use_bias=False,\n",
"            activation=None)\n",
"\n",
"        bn1 = tf.layers.batch_normalization(conv1, training=is_training)\n",
"        re1 = tf.nn.relu(bn1)\n",
"        # These add histograms to view in TensorBoard. The tags are loose labels: they record the\n",
"        # conv output, the batch-norm output, and the ReLU output rather than the layer variables.\n",
"        tf.summary.histogram('weights', conv1)\n",
"        tf.summary.histogram('bias', bn1)\n",
"        tf.summary.histogram('activations', re1)\n",
"\n",
"    # Convolutional Layer #2 - Out 30 x 40 image because stride 2\n",
"    with tf.name_scope('cnn_layer2_6x6x16-s2'):\n",
"        conv2 = tf.layers.conv2d(\n",
"            strides=(2, 2),\n",
"            inputs=re1,\n",
"            filters=16,\n",
"            kernel_size=[6, 6],\n",
"            padding=\"same\",\n",
"            use_bias=False,\n",
"            activation=None)\n",
"\n",
"        bn2 = tf.layers.batch_normalization(conv2, training=is_training)\n",
"        re2 = tf.nn.relu(bn2)\n",
"        tf.summary.histogram('weights', conv2)\n",
"        tf.summary.histogram('bias', bn2)\n",
"        tf.summary.histogram('activations', re2)\n",
"\n",
"    # Convolutional Layer #3 - Out 15 x 20 image because stride 2\n",
"    with tf.name_scope('cnn_layer3_3x3x36-s2'):\n",
"        conv3 = tf.layers.conv2d(\n",
"            strides=(2, 2),\n",
"            inputs=re2,\n",
"            filters=36,\n",
"            kernel_size=[3, 3],\n",
"            padding=\"same\",\n",
"            use_bias=False,\n",
"            activation=None)\n",
"\n",
"        bn3 = tf.layers.batch_normalization(conv3, training=is_training)\n",
"        re3 = tf.nn.relu(bn3)\n",
"        tf.summary.histogram('weights', conv3)\n",
"        tf.summary.histogram('bias', bn3)\n",
"        tf.summary.histogram('activations', re3)\n",
"\n",
"    # Flatten so we can feed the output to the dense layers (15 * 20 * 36 = 10800)\n",
"    re3_flat = tf.reshape(re3, [-1, 15 * 20 * 36])\n",
"\n",
"    # Dense layer - 2048 hidden nodes with a dropout rate of 30%\n",
"    with tf.name_scope('dense_layer4_10800x2048'):\n",
"        dense = tf.layers.dense(inputs=re3_flat, units=2048, activation=None, use_bias=False)\n",
"        bn_dense = tf.layers.batch_normalization(dense, training=is_training)\n",
"        re_dense = tf.nn.relu(bn_dense)\n",
"\n",
"        tf.summary.histogram('weights', dense)\n",
"        tf.summary.histogram('bias', bn_dense)\n",
"        tf.summary.histogram('activations', re_dense)\n",
"\n",
"        dropout4 = tf.layers.dropout(inputs=re_dense, rate=0.30, training=is_training)\n",
"\n",
"    # Dense layer - 1024 hidden nodes with a dropout rate of 30%\n",
"    with tf.name_scope('dense_layer5_2048x1024'):\n",
"        dense5 = tf.layers.dense(inputs=dropout4, units=1024, activation=None, use_bias=False)\n",
"        bn_dense5 = tf.layers.batch_normalization(dense5, training=is_training)\n",
"        re_dense5 = tf.nn.relu(bn_dense5)\n",
"\n",
"        tf.summary.histogram('weights', dense5)\n",
"        tf.summary.histogram('bias', bn_dense5)\n",
"        tf.summary.histogram('activations', re_dense5)\n",
"\n",
"        dropout5 = tf.layers.dropout(inputs=re_dense5, rate=0.30, training=is_training)\n",
"\n",
"    # Final layer of 4 nodes which we will apply softmax to for prediction\n",
"    with tf.name_scope('output_layer6'):\n",
"        logits = tf.layers.dense(inputs=dropout5, units=4)\n",
"        tf.summary.histogram('weights', logits)\n",
"\n",
"    predict = tf.nn.softmax(logits)\n",
"    classes = tf.cast(tf.argmax(predict, 1), tf.uint8)\n",
"\n",
"    # Populate these using our helper functions above\n",
"    loss = conv_model_loss(logits, labels, mode)\n",
"    train_op = conv_model_train_op(loss, mode)\n",
"    eval_metrics = conv_model_eval_metrics(classes, labels, mode)\n",
"\n",
"    # The metrics are None in PREDICT mode, so only log them when they exist\n",
"    if eval_metrics is not None:\n",
"        with tf.variable_scope(\"performance_metrics\"):\n",
"            tf.summary.scalar('accuracy', eval_metrics['accuracy'][1])\n",
"            tf.summary.scalar('precision', eval_metrics['precision'][1])\n",
"            tf.summary.scalar('recall', eval_metrics['recall'][1])\n",
"\n",
"    # This is a required return format\n",
"    return tf.estimator.EstimatorSpec(\n",
"        mode=mode,\n",
"        predictions={\"predictions\": predict, \"classes\": classes},  # name these fields as you like\n",
"        loss=loss,\n",
"        train_op=train_op,\n",
"        eval_metric_ops=eval_metrics)\n",
"\n",
"# Build our classifier\n",
"simp_classifier = tf.estimator.Estimator(\n",
"    model_fn=cnn_model_fn, model_dir=\"/tmp/simpsons_conv_model\")\n",
"\n",
"# This is the actual training step. You can interrupt and resume it as needed since it's checkpointed\n",
"simp_classifier.train(input_fn=train_data_input_fn, steps=5000)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# This will evaluate our model against a data set it has never seen\n",
"def eval_data_input_fn():\n",
"    return tf.constant(X_test, dtype=tf.float32), tf.constant(y_test, dtype=tf.int32)\n",
"\n",
"simp_classifier.evaluate(input_fn=eval_data_input_fn, steps=1)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
}
],
"metadata": {
"anaconda-cloud": {},
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.3"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
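
The layer sizes and the learning rate schedule used in the notebook can be sanity-checked with plain Python, without TensorFlow installed. The sketch below is not part of the original notebook, and the helper names (`same_padding_output`, `trace_shapes`, `decayed_learning_rate`) are made up for illustration. It assumes that a convolution with padding="same" produces ceil(size / stride) outputs per dimension, and that tf.train.exponential_decay computes lr * decay_rate ** (step / decay_steps), so a decay rate of e with -3000 decay steps gives lr * e^(-step / 3000).

```python
import math


def same_padding_output(size, stride):
    # Assumed rule for padding="same": output size is ceil(input size / stride)
    return math.ceil(size / stride)


def trace_shapes(height, width, layers):
    # layers is a list of (filters, stride) pairs, one per convolutional layer
    shapes = []
    for filters, stride in layers:
        height = same_padding_output(height, stride)
        width = same_padding_output(width, stride)
        shapes.append((height, width, filters))
    return shapes


def decayed_learning_rate(step, base_lr=0.003, floor=0.0001, time_constant=3000):
    # Mirrors 0.0001 + exponential_decay(lr, step, -3000, math.e) from the notebook
    return floor + base_lr * math.exp(-step / time_constant)


# The three conv layers in the notebook: 8, 16, and 36 filters, each with stride 2
shapes = trace_shapes(120, 160, [(8, 2), (16, 2), (36, 2)])
flat_size = shapes[-1][0] * shapes[-1][1] * shapes[-1][2]

print(shapes)     # [(60, 80, 8), (30, 40, 16), (15, 20, 36)]
print(flat_size)  # 10800

for step in (0, 1000, 3000, 5000):
    print(step, decayed_learning_rate(step))
```

The flattened size of 10800 is the value the first dense layer expects, and the schedule starts at 0.0031 and approaches the 0.0001 floor as training continues.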