Implement a NN - Part 5: Batch Norm.ipynb
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"provenance": [],
"toc_visible": true,
"authorship_tag": "ABX9TyNfHhzNpb033mKQ/GWigsoV",
"include_colab_link": true
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/gist/MaverickMeerkat/99207ab15847d510fca13e99bf3a50bf/implement-a-nn-part-5-batch-norm.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"source": [
"In this notebook we will revisit the manual implementation of a NN, but add a Batch-Norm layer.\n",
"\n",
"As always, we will start by loading the necessary libraries.\n",
"\n",
"Note that we use `torchvision` only for loading and handling the MNIST dataset. We do not use any of the `pytorch` capabilities for the actual training."
],
"metadata": {
"id": "-WbuIOHMkD9o"
}
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "xxTn68vggPaW"
},
"outputs": [],
"source": [
"import numpy as np # for doing all the math and matrix work\n",
"import matplotlib.pyplot as plt # for a bit of graphing\n",
"\n",
"from scipy.special import expit # sigmoid function\n",
"\n",
"from torchvision import datasets, transforms # for the MNIST dataset"
]
},
{
"cell_type": "markdown",
"source": [
"Set plotting DPI for bigger plots, and a random seed for reproducibility."
],
"metadata": {
"id": "uzGYsyc8yrMs"
}
},
{
"cell_type": "code",
"source": [
"plt.rcParams['figure.dpi'] = 120 # set plotting dpi\n",
"\n",
"# set seed for reproducibility\n",
"random_seed = 247\n",
"np.random.seed(random_seed)"
],
"metadata": {
"id": "HQpxM713yphQ"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"## Data: MNIST"
],
"metadata": {
"id": "wj8G9MxSal17"
}
},
{
"cell_type": "markdown",
"source": [
"As before, we will use the MNIST data, only now we have to convert it into numpy arrays instead of torch tensors."
],
"metadata": {
"id": "KnvcrC7aStYo"
}
},
{
"cell_type": "code",
"source": [
"training_data = datasets.MNIST(\n",
" root=\"data\",\n",
" train=True,\n",
" download=True,\n",
" transform=transforms.ToTensor()\n",
")\n",
"\n",
"validation_data = datasets.MNIST(\n",
" root=\"data\",\n",
" train=False,\n",
" download=True,\n",
" transform=transforms.ToTensor()\n",
")"
],
"metadata": {
"id": "98dSnsU3Ss9_"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
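"We'll convert the dataset to numpy arrays, flattening each $28 \\times 28$ image into a vector of length 784. Note that `.data` holds the raw `uint8` pixel values (0 to 255): the `ToTensor` transform is only applied when indexing the dataset, so these inputs are not scaled to $[0, 1]$."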
],
"metadata": {
"id": "iKxl05lG5zAq"
}
},
{
"cell_type": "code",
"source": [
"x = training_data.data.numpy()\n",
"x = x.reshape(-1, 784)\n",
"y = training_data.targets.numpy()\n",
"\n",
"x_val = validation_data.data.numpy()\n",
"x_val = x_val.reshape(-1, 784)\n",
"y_val = validation_data.targets.numpy()"
],
"metadata": {
"id": "he4biHXk5-jC"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"**The task is <font color=\"red\">Classification</font>** - we want to classify the image to the corresponding digit. "
],
"metadata": {
"id": "NUijTRWPthQi"
}
},
{
"cell_type": "markdown",
"source": [
"# Modifying the NN\n",
"\n",
"Let's load the code we need for the NN from the previous implementation notebook. \n",
"\n",
"We have to modify:\n",
"* In the `update` function, add updates for the parameters of a BatchNorm layer ($\\gamma$ and $\\beta$).\n",
"* Implement a `sigmoid` activation function. Some numerical difficulties arise here: calling `exp(x)` on a number that is too large overflows. We could check each number and use the $\\frac{1}{1+e^{-x}}$ implementation if it's $>0$, and the $\\frac{e^x}{1+e^x}$ implementation otherwise. The problem is that we would have to check this for every element in the observations matrix. Luckily, the `scipy` library has a numerically stable implementation of sigmoid (called `expit`) which we can use instead.\n"
],
"metadata": {
"id": "L0zUbU8hCIhM"
}
},
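{
"cell_type": "markdown",
"source": [
"For illustration only (the rest of the notebook uses `expit`), here is a minimal sketch of how the element-wise check described above could be vectorized with `np.where` instead of a Python loop. `stable_sigmoid` is a hypothetical helper, not part of the original code. It writes both branches in terms of $e^{-|x|}$, which always lies in $(0, 1]$, so neither branch can overflow even though `np.where` evaluates both."
],
"metadata": {}
},
{
"cell_type": "code",
"source": [
"import numpy as np\n",
"from scipy.special import expit\n",
"\n",
"def stable_sigmoid(x):\n",
"    # hypothetical helper: x >= 0 uses 1 / (1 + e^{-x}), x < 0 uses e^{x} / (1 + e^{x})\n",
"    # both are written in terms of e^{-|x|}, which is in (0, 1] and never overflows\n",
"    e = np.exp(-np.abs(x))\n",
"    return np.where(x >= 0, 1 / (1 + e), e / (1 + e))\n",
"\n",
"x_test = np.array([-1000.0, -5.0, 0.0, 5.0, 1000.0])\n",
"print(stable_sigmoid(x_test))\n",
"print(np.allclose(stable_sigmoid(x_test), expit(x_test)))  # should agree with scipy's expit"
],
"metadata": {},
"execution_count": null,
"outputs": []
},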
{
"cell_type": "markdown",
"source": [
"# BN on the $a$'s\n",
"\n",
"In our current implementation, each `LinearLayer` applies its activation function internally, so it's easier to run BN on the outputs ($a$'s) of the activations than on their inputs ($z$'s). We will later change the implementation so that BN can be applied to the inputs."
],
"metadata": {
"id": "uoISblP9CwPf"
}
},
{
"cell_type": "code",
"source": [
"class LinearLayer():\n",
"    def __init__(self, input_size, output_size, activation_fn):\n",
"        self.W = np.random.randn(input_size, output_size)*0.1\n",
"        self.b = np.zeros(output_size)\n",
"        self.activation_fn = activation_fn\n",
"\n",
"    def forward(self, x, training):\n",
"        self.input = x # a_{l-1}\n",
"        self.output = x @ self.W + self.b # z_l\n",
"        return self.activation_fn(self.output) # a_l\n",
"\n",
"    def backward(self, grad):\n",
"        grad = grad * self.activation_fn(self.output, derivative=True)\n",
"        self.grad_W = self.input.T @ grad\n",
"        self.grad_b = np.sum(grad, axis=0)\n",
"        return grad @ self.W.T"
],
"metadata": {
"id": "O2BeeQkUCPFX"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"def relu(x, derivative=False):\n",
"    if derivative:\n",
"        return (x > 0).astype(float)\n",
"    else:\n",
"        return np.maximum(0, x)\n",
"\n",
"def identity(x, derivative=False):\n",
"    if derivative:\n",
"        return np.ones_like(x)\n",
"    else:\n",
"        return x\n",
"\n",
"def sigmoid(x, derivative=False):\n",
"    if derivative:\n",
"        return expit(x) * (1-expit(x))\n",
"    else:\n",
"        return expit(x)"
],
"metadata": {
"id": "K982wh8OCR4P"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"class NeuralNetwork():\n",
"    def __init__(self, layers):\n",
"        self.layers = layers\n",
"\n",
"    def forward(self, x, training):\n",
"        for layer in self.layers:\n",
"            x = layer.forward(x, training)\n",
"        return x\n",
"\n",
"    def backward(self, grad):\n",
"        for layer in reversed(self.layers):\n",
"            grad = layer.backward(grad)\n",
"\n",
"    def update(self, learning_rate):\n",
"        for layer in self.layers:\n",
"            if isinstance(layer, LinearLayer):\n",
"                layer.W -= learning_rate * layer.grad_W\n",
"                layer.b -= learning_rate * layer.grad_b\n",
"            if isinstance(layer, BatchNormLayer):\n",
"                layer.gamma -= learning_rate * layer.grad_gamma\n",
"                layer.beta -= learning_rate * layer.grad_beta\n",
"\n",
"    def train(self, x, y, x_val, y_val, learning_rate, loss_fn, num_epochs, batch_size):\n",
"        train_accs = []\n",
"        val_accs = []\n",
"        for epoch in range(num_epochs):\n",
"            # training mode\n",
"            indices = np.random.permutation(len(x))\n",
"            correct = 0\n",
"            for i in range(0, len(x), batch_size):\n",
"                x_batch = x[indices[i:i+batch_size]]\n",
"                y_batch = y[indices[i:i+batch_size]]\n",
"                y_pred = self.forward(x_batch, training=True)\n",
"                loss_i, grad = loss_fn(y_batch, y_pred)\n",
"                self.backward(grad)\n",
"                self.update(learning_rate)\n",
"                pred = np.argmax(y_pred, axis=1)\n",
"                correct += (y_batch == pred).sum()\n",
"            accuracy = 100. * correct / len(x)\n",
"            train_accs.append(accuracy)\n",
"\n",
"            # validation mode\n",
"            val_indices = np.random.permutation(len(x_val))\n",
"            correct_val = 0\n",
"            for i in range(0, len(x_val), batch_size):\n",
"                x_batch_val = x_val[val_indices[i:i+batch_size]]\n",
"                y_batch_val = y_val[val_indices[i:i+batch_size]]\n",
"                y_pred_val = self.forward(x_batch_val, training=False)\n",
"                pred_val = np.argmax(y_pred_val, axis=1)\n",
"                correct_val += (y_batch_val == pred_val).sum()\n",
"            val_accuracy = 100. * correct_val / len(x_val)\n",
"            val_accs.append(val_accuracy)\n",
"\n",
"            print(f'Epoch: {epoch}, Train Acc.: {accuracy}, Val Acc.: {val_accuracy}')\n",
"        return train_accs, val_accs"
],
"metadata": {
"id": "aRkRgNoBRjLT"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"def CrossEntropy(y, logits):\n",
"    num_samples = y.shape[0]\n",
"    shifted_logits = logits - np.max(logits, axis=1, keepdims=True)\n",
"    Z = np.sum(np.exp(shifted_logits), axis=1, keepdims=True)\n",
"    log_probs = shifted_logits - np.log(Z)\n",
"    loss = -np.sum(log_probs[np.arange(num_samples), y]) / num_samples\n",
"\n",
"    y_one_hot = np.zeros((len(y), 10))\n",
"    y_one_hot[np.arange(len(y)), y] = 1\n",
"    a = np.exp(log_probs)\n",
"    delta = (a - y_one_hot) / num_samples\n",
"\n",
"    return loss, delta"
],
"metadata": {
"id": "n8zd5jNa4GFF"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"# Adding the BatchNorm\n",
"\n",
"Now we will implement a BatchNorm layer."
],
"metadata": {
"id": "51PoSQZIwBpJ"
}
},
{
"cell_type": "markdown",
"source": [
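"Remember the formulas shown in class. Here $\\delta = \\frac{\\partial \\mathcal L}{\\partial y}$ is the gradient arriving at the BN output $y = \\gamma \\cdot z_{l,n} + \\beta$, $z_{l,n} = \\frac{z - \\mu}{\\sqrt{\\sigma^2+\\epsilon}}$ is the normalized input, $n$ is the batch size, and $\\cdot$ denotes element-wise multiplication:\n",
"\n",
"$$\n",
"\\begin{aligned}\n",
"\\frac{\\partial \\mathcal L}{\\partial \\gamma} &= 1^T(\\delta \\cdot z_{l,n})=\\sum_i \\delta_i \\cdot z_{i,l,n} \\\\\n",
"\\frac{\\partial \\mathcal L}{\\partial \\beta} &= 1^T \\delta \\\\\n",
"\\frac{\\partial \\mathcal L}{\\partial z_{l,n}} &= \\delta \\cdot \\gamma \\\\\n",
"\\frac{\\partial \\mathcal L}{\\partial z} &= \\frac{1}{n \\sqrt {\\sigma^2+\\epsilon}}\\left[n\\frac{\\partial \\mathcal L}{\\partial z_{l,n}} - 1^T\\frac{\\partial \\mathcal L}{\\partial z_{l,n}} - z_{l,n} \\cdot \\left(1^T\\left(\\frac{\\partial \\mathcal L}{\\partial z_{l,n}} \\cdot z_{l,n}\\right)\\right)\\right]\n",
"\\end{aligned}\n",
"$$"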
],
"metadata": {
"id": "TdhdHCa2DhW3"
}
},
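{
"cell_type": "markdown",
"source": [
"A quick way to check these formulas is a finite-difference test. The sketch below re-implements them as two standalone helpers (`bn_forward` and `bn_backward` are hypothetical names, independent of the `BatchNormLayer` class below) on a small synthetic batch. It uses the loss $\\mathcal L = \\sum y \\cdot \\delta$ for an arbitrary fixed $\\delta$, whose gradient w.r.t. the BN output $y$ is exactly $\\delta$, and compares the formula for $\\frac{\\partial \\mathcal L}{\\partial z}$ with central differences. The printed error should be tiny, though not exactly zero, since finite differences are only approximate."
],
"metadata": {}
},
{
"cell_type": "code",
"source": [
"import numpy as np\n",
"\n",
"def bn_forward(z, gamma, beta, eps=1e-5):\n",
"    # training-mode BN: normalize with the batch mean and (biased) variance\n",
"    mu = z.mean(axis=0)\n",
"    sig2 = z.var(axis=0)\n",
"    z_norm = (z - mu) / np.sqrt(sig2 + eps)\n",
"    return gamma * z_norm + beta, z_norm, sig2\n",
"\n",
"def bn_backward(delta, z_norm, sig2, gamma, eps=1e-5):\n",
"    # the formulas above, with delta = dL/dy\n",
"    n = delta.shape[0]\n",
"    d_norm = delta * gamma                     # dL/dz_{l,n}\n",
"    d_gamma = np.sum(delta * z_norm, axis=0)   # dL/dgamma\n",
"    d_beta = np.sum(delta, axis=0)             # dL/dbeta\n",
"    dz = (n * d_norm - np.sum(d_norm, axis=0)\n",
"          - z_norm * np.sum(d_norm * z_norm, axis=0)) / (n * np.sqrt(sig2 + eps))\n",
"    return dz, d_gamma, d_beta\n",
"\n",
"rng = np.random.default_rng(0)\n",
"z_chk = rng.normal(size=(8, 3))       # small synthetic batch of values before BN\n",
"gamma_chk = rng.normal(size=3)\n",
"beta_chk = rng.normal(size=3)\n",
"delta_chk = rng.normal(size=(8, 3))   # arbitrary fixed upstream gradient dL/dy\n",
"\n",
"def bn_check_loss(z_in):\n",
"    # L = sum(y * delta), so dL/dy is exactly delta_chk\n",
"    y_out, _, _ = bn_forward(z_in, gamma_chk, beta_chk)\n",
"    return np.sum(y_out * delta_chk)\n",
"\n",
"_, z_norm_chk, sig2_chk = bn_forward(z_chk, gamma_chk, beta_chk)\n",
"dz_formula, _, _ = bn_backward(delta_chk, z_norm_chk, sig2_chk, gamma_chk)\n",
"\n",
"# central finite differences, perturbing one entry of z at a time\n",
"h = 1e-6\n",
"dz_numeric = np.zeros_like(z_chk)\n",
"for idx in np.ndindex(*z_chk.shape):\n",
"    z_plus, z_minus = z_chk.copy(), z_chk.copy()\n",
"    z_plus[idx] += h\n",
"    z_minus[idx] -= h\n",
"    dz_numeric[idx] = (bn_check_loss(z_plus) - bn_check_loss(z_minus)) / (2 * h)\n",
"\n",
"print('max abs error:', np.max(np.abs(dz_formula - dz_numeric)))"
],
"metadata": {},
"execution_count": null,
"outputs": []
},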
{
"cell_type": "code",
"source": [
"class BatchNormLayer():\n",
"    def __init__(self, input_size, momentum=0.9, epsilon=1e-5):\n",
"        self.gamma = np.ones(input_size)\n",
"        self.beta = np.zeros(input_size)\n",
"        self.momentum = momentum\n",
"        self.epsilon = epsilon\n",
"        self.running_mu = np.zeros(input_size)\n",
"        self.running_sig2 = np.ones(input_size)\n",
"\n",
"    def forward(self, x, training=True):\n",
"        if training:\n",
"            # Compute mean and variance\n",
"            self.mu = np.mean(x, axis=0)\n",
"            self.sig2 = np.var(x, axis=0)\n",
"\n",
"            # Normalize\n",
"            self.x_norm = (x - self.mu) / np.sqrt(self.sig2 + self.epsilon)\n",
"\n",
"            # Scale and shift\n",
"            self.out = self.gamma * self.x_norm + self.beta\n",
"\n",
"            # Update running mean and variance\n",
"            self.running_mu = self.momentum * self.running_mu + (1 - self.momentum) * self.mu\n",
"            self.running_sig2 = self.momentum * self.running_sig2 + (1 - self.momentum) * self.sig2\n",
"        else:\n",
"            # Normalize using the running mean and variance\n",
"            self.x_norm = (x - self.running_mu) / np.sqrt(self.running_sig2 + self.epsilon)\n",
"\n",
"            # Scale and shift\n",
"            self.out = self.gamma * self.x_norm + self.beta\n",
"        return self.out\n",
"\n",
"    def backward(self, grad):\n",
"        # Compute gradients of gamma and beta\n",
"        self.grad_gamma = np.sum(grad * self.x_norm, axis=0)\n",
"        self.grad_beta = np.sum(grad, axis=0)\n",
"\n",
"        # Compute gradient of input x\n",
"        n = grad.shape[0]\n",
"        grad_x_norm = grad * self.gamma\n",
"        dx = (1 / (n*np.sqrt(self.sig2 + self.epsilon)))*(n * grad_x_norm - np.sum(grad_x_norm, axis=0) - self.x_norm * np.sum(grad_x_norm*self.x_norm, axis=0))\n",
"        return dx"
],
"metadata": {
"id": "2NJDcAFYwKYK"
},
"execution_count": null,
"outputs": []
},
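{
"cell_type": "markdown",
"source": [
"Why does `forward` need a separate `training=False` branch? A tiny illustration with made-up numbers (the example and the running statistics below are hypothetical): normalizing a single example with its own batch statistics makes $x - \\mu$ exactly zero, so BN outputs $\\beta$ whatever the input, and the first print is all zeros. Normalizing with running statistics, as the layer does at validation time, keeps the output dependent on the example and independent of which other examples share its batch."
],
"metadata": {}
},
{
"cell_type": "code",
"source": [
"import numpy as np\n",
"\n",
"eps_demo = 1e-5\n",
"gamma_demo, beta_demo = np.ones(3), np.zeros(3)\n",
"single = np.array([[0.2, 3.0, -1.5]])   # hypothetical example: one row, i.e. a batch of size 1\n",
"\n",
"# training-mode normalization: the batch mean is the example itself\n",
"mu_b = single.mean(axis=0)\n",
"sig2_b = single.var(axis=0)\n",
"print(gamma_demo * (single - mu_b) / np.sqrt(sig2_b + eps_demo) + beta_demo)\n",
"\n",
"# eval-mode normalization with made-up running statistics\n",
"running_mu_demo = np.array([0.0, 1.0, 0.0])\n",
"running_sig2_demo = np.array([1.0, 4.0, 2.25])\n",
"print(gamma_demo * (single - running_mu_demo) / np.sqrt(running_sig2_demo + eps_demo) + beta_demo)"
],
"metadata": {},
"execution_count": null,
"outputs": []
},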
{
"cell_type": "markdown",
"source": [
"# Training\n",
"\n",
"We will train with 3 hidden layers of size 50 and sigmoid activation functions, using a different learning rate for each network. Specifically, I will set a small learning rate for the BN network, because we are applying BatchNorm on the outputs of the activations, which seems to make it more sensitive (maybe because the equations where we showed the scale invariance of BN, e.g. $BN(aW) = BN(a(cW))$, no longer apply).\n",
"\n",
"[Note that I searched for the best learning rate for the simple network and found it to be 0.1; without searching as extensively, I found a learning rate (0.01) that works much better for the BN network.]\n",
"\n",
"The simple network:"
],
"metadata": {
"id": "yd7teSi2EtsV"
}
},
{
"cell_type": "code",
"source": [
"np.random.seed(random_seed)"
],
"metadata": {
"id": "3YMBVymtB88Z"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"linear1 = LinearLayer(784, 50, sigmoid)\n",
"linear2 = LinearLayer(50, 50, sigmoid)\n",
"linear3 = LinearLayer(50, 50, sigmoid)\n",
"linear4 = LinearLayer(50, 10, identity)\n",
"simple_nn = NeuralNetwork([linear1, linear2, linear3, linear4])"
],
"metadata": {
"id": "JNLgb1qUB88a"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"train_accs_s, val_accs_s = simple_nn.train(x, y, x_val, y_val, learning_rate=0.1, loss_fn=CrossEntropy, num_epochs=10, batch_size=128)"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "c338576b-7729-44b1-e9d0-27bf2d8f8fae",
"id": "CMwWurvRB88a"
},
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Epoch: 0, Train Acc.: 14.401666666666667, Val Acc.: 24.25\n",
"Epoch: 1, Train Acc.: 32.98, Val Acc.: 51.8\n",
"Epoch: 2, Train Acc.: 58.81333333333333, Val Acc.: 68.24\n",
"Epoch: 3, Train Acc.: 71.72666666666667, Val Acc.: 75.7\n",
"Epoch: 4, Train Acc.: 76.28333333333333, Val Acc.: 80.53\n",
"Epoch: 5, Train Acc.: 79.515, Val Acc.: 80.3\n",
"Epoch: 6, Train Acc.: 80.21666666666667, Val Acc.: 81.32\n",
"Epoch: 7, Train Acc.: 82.18333333333334, Val Acc.: 82.41\n",
"Epoch: 8, Train Acc.: 81.825, Val Acc.: 81.51\n",
"Epoch: 9, Train Acc.: 81.65333333333334, Val Acc.: 82.5\n"
]
}
]
},
{
"cell_type": "markdown",
"source": [
"Now for the BN network:"
],
"metadata": {
"id": "9tL5-BhMFUbV"
}
},
{
"cell_type": "code",
"source": [
"np.random.seed(random_seed)"
],
"metadata": {
"id": "jkV0JfY_-RSj"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"linear1 = LinearLayer(784, 50, sigmoid)\n",
"bn1 = BatchNormLayer(50)\n",
"linear2 = LinearLayer(50, 50, sigmoid)\n",
"bn2 = BatchNormLayer(50)\n",
"linear3 = LinearLayer(50, 50, sigmoid)\n",
"bn3 = BatchNormLayer(50)\n",
"linear4 = LinearLayer(50, 10, identity)\n",
"bn_nn = NeuralNetwork([linear1, bn1, linear2, bn2, linear3, bn3, linear4])"
],
"metadata": {
"id": "Zq3tjB3b1Xan"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"train_accs_bn, val_accs_bn = bn_nn.train(x, y, x_val, y_val, learning_rate=0.01, loss_fn=CrossEntropy, num_epochs=10, batch_size=128)"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "0m-yVaDk1Xfr",
"outputId": "3a905bef-a07c-4a1f-d44e-edd0b372044c"
},
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Epoch: 0, Train Acc.: 66.71833333333333, Val Acc.: 80.54\n",
"Epoch: 1, Train Acc.: 81.69833333333334, Val Acc.: 84.8\n",
"Epoch: 2, Train Acc.: 85.115, Val Acc.: 86.94\n",
"Epoch: 3, Train Acc.: 86.86666666666666, Val Acc.: 88.36\n",
"Epoch: 4, Train Acc.: 87.84833333333333, Val Acc.: 88.38\n",
"Epoch: 5, Train Acc.: 88.31, Val Acc.: 89.49\n",
"Epoch: 6, Train Acc.: 89.07, Val Acc.: 90.16\n",
"Epoch: 7, Train Acc.: 89.76166666666667, Val Acc.: 90.72\n",
"Epoch: 8, Train Acc.: 90.20166666666667, Val Acc.: 90.95\n",
"Epoch: 9, Train Acc.: 90.30666666666667, Val Acc.: 91.28\n"
]
}
]
},
{
"cell_type": "code",
"source": [
"plt.plot(val_accs_s, label=\"Simple Network\")\n",
"plt.plot(val_accs_bn, label=\"Batch Norm Network (on a's)\")\n",
"plt.title(\"Simple vs. BN Validation Accuracy\")\n",
"plt.xlabel('Epochs')\n",
"plt.ylabel('Val. Acc.')\n",
"plt.legend()"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 499
},
"id": "1V9IYqu_WnX0",
"outputId": "2e878750-3075-45b6-dbf0-aafcbecdbbdf"
},
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"<matplotlib.legend.Legend at 0x7f69bedee4f0>"
]
},
"metadata": {},
"execution_count": 16
},
{
"output_type": "display_data",
"data": {
"text/plain": [
"<Figure size 720x480 with 1 Axes>"
],
"image/png": "\n"
},
"metadata": {
"needs_background": "light"
}
}
]
},
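{
"cell_type": "markdown",
"source": [
"Returning to the scale-invariance remark made before training, here is a small numerical sketch on random synthetic data (the shapes, the scale `c_scale = 10` and the helper `batch_norm` are assumptions for illustration, with $\\gamma = 1, \\beta = 0$). BN applied to $z = aW$ gives the same output when $W$ is scaled by $c > 0$ (up to the $\\epsilon$ term), so the first check should print `True`. BN applied to the activations $\\sigma(aW)$ does not, because $\\sigma(cz)$ is not an affine function of $\\sigma(z)$. This only illustrates that the invariance is lost; it does not by itself prove that this is why the BN-on-$a$'s network needed a smaller learning rate."
],
"metadata": {}
},
{
"cell_type": "code",
"source": [
"import numpy as np\n",
"from scipy.special import expit\n",
"\n",
"def batch_norm(z, eps=1e-5):\n",
"    # training-mode BN with gamma = 1 and beta = 0\n",
"    return (z - z.mean(axis=0)) / np.sqrt(z.var(axis=0) + eps)\n",
"\n",
"rng = np.random.default_rng(0)\n",
"a_demo = rng.normal(size=(64, 20))   # synthetic activations of the previous layer\n",
"W_demo = rng.normal(size=(20, 5))\n",
"c_scale = 10.0\n",
"\n",
"# BN on the z's: scaling W by c leaves the output unchanged (up to eps)\n",
"print(np.allclose(batch_norm(a_demo @ W_demo), batch_norm(a_demo @ (c_scale * W_demo))))\n",
"\n",
"# BN on the a's: the sigmoid sits between W and BN, so scaling W changes the output\n",
"diff = batch_norm(expit(a_demo @ W_demo)) - batch_norm(expit(a_demo @ (c_scale * W_demo)))\n",
"print('max abs difference:', np.abs(diff).max())"
],
"metadata": {},
"execution_count": null,
"outputs": []
},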
{
"cell_type": "markdown",
"source": [
"# BN on the $z$'s\n",
"\n",
"To run BN on the $z$'s we need to make some more changes to our implementation:\n",
"\n",
"* Since we want to run BatchNorm on the $z$'s (before the activation), we have to separate the linear layer from the activation functions.\n",
"* So the linear layer will no longer receive an activation function; it will compute a simple linear transformation in the forward pass, and return only the gradient w.r.t. the linear operation in the backward pass.\n",
"* Instead of the activation functions from before, we will create activation classes, each with its own `forward` and `backward` methods, so that they keep passing the gradient along in the backprop algorithm."
],
"metadata": {
"id": "f_HN6ib8CqVn"
}
},
{
"cell_type": "code",
"source": [
"class LinearLayer():\n",
"    def __init__(self, input_size, output_size):\n",
"        self.W = np.random.randn(input_size, output_size)*0.1\n",
"        self.b = np.zeros(output_size)\n",
"\n",
"    def forward(self, x, training):\n",
"        self.input = x # a_{l-1}\n",
"        self.output = x @ self.W + self.b # z_l\n",
"        return self.output\n",
"\n",
"    def backward(self, grad):\n",
"        self.grad_W = self.input.T @ grad\n",
"        self.grad_b = np.sum(grad, axis=0)\n",
"        return grad @ self.W.T"
],
"metadata": {
"id": "nmABG_f8DV7u"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"class ReLU():\n",
"    def __init__(self):\n",
"        pass\n",
"\n",
"    def forward(self, x, training):\n",
"        self.x = x # store the input for the backward pass\n",
"        return np.maximum(0, x)\n",
"\n",
"    def backward(self, grad):\n",
"        drelu = (self.x > 0).astype(float)\n",
"        return grad * drelu\n",
"\n",
"class Identity():\n",
"    def __init__(self):\n",
"        pass\n",
"\n",
"    def forward(self, x, training):\n",
"        self.x = x\n",
"        return x\n",
"\n",
"    def backward(self, grad):\n",
"        di = np.ones_like(self.x)\n",
"        return grad * di\n",
"\n",
"class Sigmoid():\n",
"    def __init__(self):\n",
"        pass\n",
"\n",
"    def forward(self, x, training):\n",
"        self.x = x\n",
"        return expit(x)\n",
"\n",
"    def backward(self, grad):\n",
"        dsig = expit(self.x) * (1-expit(self.x))\n",
"        return grad * dsig"
],
"metadata": {
"id": "DuOw8YWsDV7u"
},
"execution_count": null,
"outputs": []
},
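{
"cell_type": "markdown",
"source": [
"One plausible reason BN on the $z$'s pairs well with sigmoids, sketched on synthetic data: the sigmoid derivative $\\sigma(z)(1-\\sigma(z))$ multiplies every gradient passing back through the layer, and it is close to zero when $|z|$ is large. The inputs below are uniform in $[0, 255]$, an assumption standing in for unscaled pixel values (real MNIST images are mostly zeros), and the weights use the same `randn * 0.1` scale as `LinearLayer`. The sketch compares the average derivative with and without normalizing $z$ first; it is an illustration, not a measurement of the networks trained here."
],
"metadata": {}
},
{
"cell_type": "code",
"source": [
"import numpy as np\n",
"from scipy.special import expit\n",
"\n",
"rng = np.random.default_rng(0)\n",
"x_raw = rng.uniform(0, 255, size=(128, 784))   # synthetic stand-in for unscaled pixel values\n",
"W_demo = rng.normal(size=(784, 50)) * 0.1      # same init scale as LinearLayer\n",
"z = x_raw @ W_demo                              # pre-activations (the bias is zero at init)\n",
"\n",
"eps = 1e-5\n",
"z_bn = (z - z.mean(axis=0)) / np.sqrt(z.var(axis=0) + eps)   # BN on the z's with gamma = 1, beta = 0\n",
"\n",
"def dsigmoid(v):\n",
"    # derivative of the sigmoid, as computed in Sigmoid.backward\n",
"    return expit(v) * (1 - expit(v))\n",
"\n",
"print('mean |z| without BN:', np.abs(z).mean())\n",
"print('mean sigmoid derivative without BN:', dsigmoid(z).mean())\n",
"print('mean sigmoid derivative with BN:', dsigmoid(z_bn).mean())"
],
"metadata": {},
"execution_count": null,
"outputs": []
},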
{
"cell_type": "markdown",
"source": [
"Now let's run this again. This time we will use a much bigger learning rate for the BN network (2 instead of 0.01). \n",
"\n",
"The simple network:"
],
"metadata": {
"id": "9AV2qXtvDV7x"
}
},
{
"cell_type": "code",
"source": [
"np.random.seed(random_seed)"
],
"metadata": {
"id": "Bs1H_mzJDV7x"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"linear1 = LinearLayer(784, 50)\n",
"sig1 = Sigmoid()\n",
"linear2 = LinearLayer(50, 50)\n",
"sig2 = Sigmoid()\n",
"linear3 = LinearLayer(50, 50)\n",
"sig3 = Sigmoid()\n",
"linear4 = LinearLayer(50, 10)\n",
"simple_nn = NeuralNetwork([linear1, sig1, linear2, sig2, linear3, sig3, linear4])"
],
"metadata": {
"id": "Y1mhtrKmDV7x"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"train_accs_s, val_accs_s = simple_nn.train(x, y, x_val, y_val, learning_rate=0.1, loss_fn=CrossEntropy, num_epochs=10, batch_size=128)"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "7a987d56-6bd7-4052-dca5-9758b365b297",
"id": "7m5h0HfNDV7x"
},
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Epoch: 0, Train Acc.: 14.401666666666667, Val Acc.: 24.25\n",
"Epoch: 1, Train Acc.: 32.98, Val Acc.: 51.8\n",
"Epoch: 2, Train Acc.: 58.81333333333333, Val Acc.: 68.24\n",
"Epoch: 3, Train Acc.: 71.72666666666667, Val Acc.: 75.7\n",
"Epoch: 4, Train Acc.: 76.28333333333333, Val Acc.: 80.53\n",
"Epoch: 5, Train Acc.: 79.515, Val Acc.: 80.3\n",
"Epoch: 6, Train Acc.: 80.21666666666667, Val Acc.: 81.32\n",
"Epoch: 7, Train Acc.: 82.18333333333334, Val Acc.: 82.41\n",
"Epoch: 8, Train Acc.: 81.825, Val Acc.: 81.51\n",
"Epoch: 9, Train Acc.: 81.65333333333334, Val Acc.: 82.5\n"
]
}
]
},
{
"cell_type": "markdown",
"source": [
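"We see that for the simple network we got the exact same results as before, as expected: splitting the activation out of the linear layer does not change the computation, and we use the same random seed. Now let's train the batch norm network:"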
],
"metadata": {
"id": "ERRijUHCJBde"
}
},
{
"cell_type": "code",
"source": [
"np.random.seed(random_seed)"
],
"metadata": {
"id": "J4B2JM_mDV7x"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"linear1 = LinearLayer(784, 50)\n",
"bn1 = BatchNormLayer(50)\n",
"sig1 = Sigmoid()\n",
"linear2 = LinearLayer(50, 50)\n",
"bn2 = BatchNormLayer(50)\n",
"sig2 = Sigmoid()\n",
"linear3 = LinearLayer(50, 50)\n",
"bn3 = BatchNormLayer(50)\n",
"sig3 = Sigmoid()\n",
"linear4 = LinearLayer(50, 10)\n",
"bn_nn = NeuralNetwork([linear1, bn1, sig1, linear2, bn2, sig2, linear3, bn3, sig3, linear4])"
],
"metadata": {
"id": "srujNAPkDV7y"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"train_accs_bn, val_accs_bn = bn_nn.train(x, y, x_val, y_val, learning_rate=2, loss_fn=CrossEntropy, num_epochs=10, batch_size=128)"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "b5705593-80f4-4d0f-b7eb-a239e6d9a19e",
"id": "FlKUfJK8DV7z"
},
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Epoch: 0, Train Acc.: 87.54666666666667, Val Acc.: 91.03\n",
"Epoch: 1, Train Acc.: 94.13666666666667, Val Acc.: 94.29\n",
"Epoch: 2, Train Acc.: 95.645, Val Acc.: 96.09\n",
"Epoch: 3, Train Acc.: 96.68, Val Acc.: 96.49\n",
"Epoch: 4, Train Acc.: 97.17166666666667, Val Acc.: 96.54\n",
"Epoch: 5, Train Acc.: 97.42166666666667, Val Acc.: 97.13\n",
"Epoch: 6, Train Acc.: 97.82166666666667, Val Acc.: 97.01\n",
"Epoch: 7, Train Acc.: 98.09666666666666, Val Acc.: 97.13\n",
"Epoch: 8, Train Acc.: 98.33333333333333, Val Acc.: 97.53\n",
"Epoch: 9, Train Acc.: 98.42833333333333, Val Acc.: 97.66\n"
]
}
]
},
{
"cell_type": "code",
"source": [
"plt.plot(val_accs_s, label=\"Simple Network\")\n",
"plt.plot(val_accs_bn, label=\"Batch Norm Network (on z's)\")\n",
"plt.title(\"Simple vs. BN Validation Accuracy\")\n",
"plt.xlabel('Epochs')\n",
"plt.ylabel('Val. Acc.')\n",
"plt.legend()"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 499
},
"id": "rlvxFVtWWTjv",
"outputId": "17df31a6-2acd-4acf-a843-05f353750479"
},
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"<matplotlib.legend.Legend at 0x7f69bec76970>"
]
},
"metadata": {},
"execution_count": 25
},
{
"output_type": "display_data",
"data": {
"text/plain": [
"<Figure size 720x480 with 1 Axes>"
],
"image/png": "\n"
},
"metadata": {
"needs_background": "light"
}
}
]
},
{
"cell_type": "markdown",
"source": [
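"The results pretty much speak for themselves 😉: after 10 epochs, the network with BN on the $z$'s reaches 97.66% validation accuracy, compared to 82.5% for the simple network (and 91.28% for BN on the $a$'s earlier)."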
],
"metadata": {
"id": "Kjq23KGFJNcR"
}
},
{
"cell_type": "markdown",
"source": [
"© David Refaeli 2023."
],
"metadata": {
"id": "7qdQQjjy1XpH"
}
}
]
}