Here we show how generators make it possible to train on large data sets that don't fit into computer memory. We focus on models with multiple input and output arrays, where the generator has to provide the data to the tf.data.Dataset in the form of dictionaries.
This data pipeline is used in the Donkey Car training. The notebook simply extracts the pipeline mechanics to give a simplified view of the functionality.
A demo notebook for tensorflow datasets from a generator
{ | |
"cells": [ | |
{ | |
"cell_type": "markdown", | |
"id": "90e7deb1", | |
"metadata": {}, | |
"source": [ | |
"# Building Tensorflow data pipelines from generators\n", | |
"An notebook tutorial on how to work with `tf.data.Dataset.from_generator()`.\n", | |
"\n", | |
"## Motivation\n", | |
"This a short introduction to the creation of data pipelines in tensorflow when the training data is dynamically created through a generator in contrast to loading the data into memory before the training. This approach is required when the training data does not fit into the computer memory, and therefore has to be loaded successively during each epoch. Tensorflow supports certain operations that improve the performance when the usually file-based data loading is combined with the training loop, and we show how to use these in a toy example. \n", | |
"\n", | |
"In this notebook, we focus in particular on the case where the model has more than one input and one output. For this to work, the generator cannot simply yield an `(X, y)` tuple of training data, but needs to return a tuple of dictionaries instead. This setup requires a bit more data to be passed in, like the tensor shape of the data and their corresponding data types. We provide a toy example of a model with two input arrays and two output arrays for demonstration.\n", | |
"\n", | |
"We have built such a data pipeline in the Donkey Car project, because it gives the best flexibility in disentangeling the data format and storage from the model training. \n", | |
"\n", | |
"## Donkey Car\n", | |
"Donkey Car is a library for building self driving RC cars, for more information see the links below. The library uses the data pipeline described above. The use of that pipeline allows to:\n", | |
"* support training on smaller datasets where all data can be kept in memory and larger datasets where the data has to be loaded from disk while going through the epochs\n", | |
"* support different models that have different input and output tensors, all using the same pipeline\n", | |
"* efficient separation of training and validation data where different pre-processiong steps like image augmentations can be applied to the training set only\n", | |
"\n", | |
"This notebook is effectively abstracting the data pipeline from the Donkey Car training and the tensorflow / keras mechanics can be understood without all the Donkey specific features. \n", | |
"\n", | |
"## Getting started\n", | |
"We start with all the imports, we need tensorflow, numpy, tempfile, os and PIL." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 1, | |
"id": "5f34e72b", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"import os\n", | |
"import tempfile\n", | |
"import numpy as np\n", | |
"from PIL import Image\n", | |
"import tensorflow as tf\n" | |
] | |
}, | |
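{ | |
"cell_type": "markdown", | |
"id": "1a2b3c4d", | |
"metadata": {}, | |
"source": [ | |
"Before diving in, here is a minimal illustrative sketch (using the imports above; the shapes are the ones introduced later in this notebook) of the core idea: a generator for a multi-input / multi-output model yields a tuple of two dictionaries whose keys match the model's input and output layer names." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"id": "2b3c4d5e", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"# minimal illustrative sketch: a generator yielding (input dict, output dict)\n", | |
"def toy_generator(num_samples=3):\n", | |
"    for _ in range(num_samples):\n", | |
"        x_dict = {'img_in': np.zeros((240, 240, 3)), 'vec_in': np.zeros(64)}\n", | |
"        y_dict = {'out_1': 0.0, 'out_2': 0.0}\n", | |
"        yield x_dict, y_dict" | |
] | |
}, | |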
{ | |
"cell_type": "markdown", | |
"id": "2ea7d0d0", | |
"metadata": {}, | |
"source": [ | |
"Let's check tensorflow's version. The notebook has been tested with tf 2.2 and tf 2.7. It is expected to work with all inbetween versions, too." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 2, | |
"id": "39b60a53", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"TF version: 2.7.0\n" | |
] | |
} | |
], | |
"source": [ | |
"print(f'TF version: {tf.__version__}')" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "e519178c", | |
"metadata": {}, | |
"source": [ | |
" In this example we assume our input data consists of rgb images of the size 240x240 and another floating vector of length 64: " | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 3, | |
"id": "ab84e5a9", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"input_1_shape = (240, 240, 3)\n", | |
"input_2_shape = (64, )" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "6c65c406", | |
"metadata": {}, | |
"source": [ | |
"## Creating the model in keras / tensorflow\n", | |
"Assume the model has a vision component, given by a CNN that is coupled to two hidden dense layers and produces two scalar outputs. In addition to the image, the model also consumes a float vector as an input. This vector is simply concatenated with the last Flatten layer of the CNN.\n", | |
"\n", | |
"_**Note**: the naming of the input and output layers of the model is important._" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 4, | |
"id": "c3ec65f2", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"from tensorflow.keras.layers import Input, Dense, Convolution2D, Flatten\n", | |
"from tensorflow.keras.backend import concatenate\n", | |
"from tensorflow.keras.models import Model\n", | |
"\n", | |
"def create_model():\n", | |
" img_in = Input(shape=input_1_shape, name='img_in')\n", | |
" x = img_in\n", | |
" for i, filters in enumerate((24, 32, 64, 64, 64)):\n", | |
" x = Convolution2D(filters=filters, kernel_size=3, strides=2,\n", | |
" activation='relu', padding='same', name=f'conv{i}')(x)\n", | |
" x = Flatten(name='flatten')(x)\n", | |
" vec_in = Input(shape=input_2_shape, name='vec_in')\n", | |
" z = concatenate([x, vec_in])\n", | |
" z = Dense(100, activation='relu', name='dense_1')(z)\n", | |
" z = Dense(50, activation='relu', name='dense_2')(z)\n", | |
" out_1 = Dense(1, activation='sigmoid', name='out_1')(z)\n", | |
" out_2 = Dense(1, activation='sigmoid', name='out_2')(z)\n", | |
" model = Model(inputs=[img_in, vec_in], outputs=[out_1, out_2])\n", | |
" return model\n" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "43a2d45c", | |
"metadata": {}, | |
"source": [ | |
"We create the model and print the summary:" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 5, | |
"id": "8a1526f6", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"Model: \"model\"\n", | |
"__________________________________________________________________________________________________\n", | |
" Layer (type) Output Shape Param # Connected to \n", | |
"==================================================================================================\n", | |
" img_in (InputLayer) [(None, 240, 240, 3 0 [] \n", | |
" )] \n", | |
" \n", | |
" conv0 (Conv2D) (None, 120, 120, 24 672 ['img_in[0][0]'] \n", | |
" ) \n", | |
" \n", | |
" conv1 (Conv2D) (None, 60, 60, 32) 6944 ['conv0[0][0]'] \n", | |
" \n", | |
" conv2 (Conv2D) (None, 30, 30, 64) 18496 ['conv1[0][0]'] \n", | |
" \n", | |
" conv3 (Conv2D) (None, 15, 15, 64) 36928 ['conv2[0][0]'] \n", | |
" \n", | |
" conv4 (Conv2D) (None, 8, 8, 64) 36928 ['conv3[0][0]'] \n", | |
" \n", | |
" flatten (Flatten) (None, 4096) 0 ['conv4[0][0]'] \n", | |
" \n", | |
" vec_in (InputLayer) [(None, 64)] 0 [] \n", | |
" \n", | |
" tf.concat (TFOpLambda) (None, 4160) 0 ['flatten[0][0]', \n", | |
" 'vec_in[0][0]'] \n", | |
" \n", | |
" dense_1 (Dense) (None, 100) 416100 ['tf.concat[0][0]'] \n", | |
" \n", | |
" dense_2 (Dense) (None, 50) 5050 ['dense_1[0][0]'] \n", | |
" \n", | |
" out_1 (Dense) (None, 1) 51 ['dense_2[0][0]'] \n", | |
" \n", | |
" out_2 (Dense) (None, 1) 51 ['dense_2[0][0]'] \n", | |
" \n", | |
"==================================================================================================\n", | |
"Total params: 521,220\n", | |
"Trainable params: 521,220\n", | |
"Non-trainable params: 0\n", | |
"__________________________________________________________________________________________________\n" | |
] | |
}, | |
{ | |
"name": "stderr", | |
"output_type": "stream", | |
"text": [ | |
"2021-12-15 22:35:10.590914: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA\n", | |
"To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.\n" | |
] | |
} | |
], | |
"source": [ | |
"model = create_model()\n", | |
"model.summary()" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "e71f4dd5", | |
"metadata": {}, | |
"source": [ | |
"## Create training data on the filesystem\n", | |
"We now create mock training data on the file system. We assume there are `n+1` data items and all data is in one directory. We want to create the following directory structure:\n", | |
"```\n", | |
"|\n", | |
"|- img_0.jpeg\n", | |
"|- vec_0.npy\n", | |
"|- out_0.npy\n", | |
"...\n", | |
"|- img_n.jpeg\n", | |
"|- vec_n.npy\n", | |
"|- out_n.npy\n", | |
"```" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "967ca7b8", | |
"metadata": {}, | |
"source": [ | |
"Here, `img_xxx.jpeg` is the image data in jpeg format, `vec_xxx.npy` is the additional input vector, and `out_xxx.npy` is the output data that we want to fit the model to. Both this data is stored in native numpy format for simplicity.\n", | |
"\n", | |
"This function will create a temporary directory with data in the format of the required input and output dimension and save it into that directory." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 6, | |
"id": "3ac94e9b", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"def create_data_on_filesystem(num_samples=10):\n", | |
" data_dir = tempfile.mkdtemp()\n", | |
" for i in range(num_samples):\n", | |
" img_arr = np.random.randint(0, 255, input_1_shape, int).astype(np.uint8)\n", | |
" img = Image.fromarray(img_arr)\n", | |
" vec = np.random.uniform(0, 1, input_2_shape)\n", | |
" out = np.random.uniform(0, 1, (2,))\n", | |
" img.save(os.path.join(data_dir, f'img_{i}.jpeg'))\n", | |
" np.save(os.path.join(data_dir, f'vec_{i}.npy'), vec)\n", | |
" np.save(os.path.join(data_dir, f'out_{i}.npy'), out)\n", | |
" return data_dir" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "8f25709a", | |
"metadata": {}, | |
"source": [ | |
"Let us quickly check this has worked and list the first files of the temp data directory and also load the first image and display it:" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 7, | |
"id": "46797cb6", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"directory: ['img_7.jpeg', 'img_6.jpeg', 'out_9.npy', 'out_8.npy', 'img_1.jpeg', 'vec_9.npy', 'vec_8.npy', 'img_0.jpeg', 'img_3.jpeg', 'vec_3.npy']\n", | |
"data vector: [0.09562027 0.13416084 0.78447169 0.6295271 0.06296664 0.0999204\n", | |
" 0.91088476 0.94463629 0.36735917 0.56839703 0.13517639 0.99000634\n", | |
" 0.92912137 0.75876227 0.12211398 0.7096108 0.59581825 0.54461084\n", | |
" 0.98454203 0.2267237 0.57034512 0.90338819 0.7080705 0.12108354\n", | |
" 0.71391308 0.27990066 0.51705003 0.23727133 0.02399753 0.8657282\n", | |
" 0.17182173 0.89519481 0.77117444 0.03838287 0.47322742 0.35644288\n", | |
" 0.28834841 0.27750275 0.44477416 0.99100139 0.07113278 0.43698617\n", | |
" 0.12735372 0.84035908 0.32383501 0.5611986 0.26441979 0.39501645\n", | |
" 0.7147646 0.17376251 0.13057631 0.30693779 0.73991787 0.10479883\n", | |
" 0.91500724 0.78472397 0.87220087 0.6519156 0.29958762 0.29657736\n", | |
" 0.06031742 0.64146562 0.86448017 0.71434681]\n" | |
] | |
} | |
], | |
"source": [ | |
"tmp_data_dir = create_data_on_filesystem()\n", | |
"print(f'directory: {os.listdir(tmp_data_dir)[:10]}')\n", | |
"vec = np.load(os.path.join(tmp_data_dir, 'vec_1.npy'))\n", | |
"print(f'data vector: {vec}')\n", | |
"img = Image.open(os.path.join(tmp_data_dir, 'img_1.jpeg'))\n", | |
"img.show()" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "7c59ca51", | |
"metadata": {}, | |
"source": [ | |
"If this shows `directory: ['img_7.jpeg', 'img_6.jpeg',...]`, `data vector: [0.99923793 0.72673214, ...`, and some random image, then we are ready to proceed to the next step. Your data vector will of course have different numbers, because of the random intialisation, but that is not important." | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "06393357", | |
"metadata": {}, | |
"source": [ | |
"## Creating the data generator\n", | |
"This is the core part of the data pipeline. The data generator connects the data to the tensorflow data pipeline. It works as a translator between the format of your data and the input that the tensorflow dataset expects. \n", | |
"\n", | |
"Our generator here is therefore adapted to the structure of the data that we have created above. A flat directory where all input and output data is labelled through the file names that stores the data. \n", | |
"\n", | |
"There tensorflow interface that we will be using is `tf.data.Dataset.from_generator`. Note, this interface does not require a generator, an iterator is sufficient for it to work. However, working with generators is usually a bit easier than with iterators, and therefore we show this method here. \n", | |
"\n", | |
"A generator is simply a function that returns using the `yield` statement intead of `return`. In order to couple our input data with the `__iter__` protocol we create a class with such a member function that returns via `yield`. This is all that is needed. \n", | |
"\n", | |
"The advanced version of the generator that we are using here is not simply returning an `(X, y)`, where `X` is the input and `y` is the output data. This generator interface does not work, because the model takes multiple input arrays (two in our case), and retuns multiple output arrays (two one-dimensional ones in our case). Because of that, we have to work with dictionaries in the generator.\n", | |
"\n", | |
"Because the tensorflow interface also needs to know the TensorShape of the model's inputs and outputs as well as their data types, we also implement the corresponding functions in our class. " | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 8, | |
"id": "37fba19e", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"class PipelineGenerator:\n", | |
" def __init__(self, data_dir, num_samples=10):\n", | |
" self.data_dir = data_dir\n", | |
" self.num_samples = num_samples\n", | |
"\n", | |
" def __iter__(self):\n", | |
" \"\"\" A method that yields defines a generator, so fulfills the iterator\n", | |
" protocol. When i exceeds the number of samples a StopIteration is\n", | |
" raised from the iterable protocol of the for loop.\n", | |
"\n", | |
" :return: a tuple of the input and output dictionary\n", | |
" \"\"\"\n", | |
" for i in range(self.num_samples):\n", | |
" img = Image.open(os.path.join(self.data_dir, f'img_{i}.jpeg'))\n", | |
" vec = np.load(os.path.join(self.data_dir, f'vec_{i}.npy'))\n", | |
" out = np.load(os.path.join(self.data_dir, f'out_{i}.npy'))\n", | |
" # image data needs to be converted to [0.1] float values\n", | |
" img_norm = np.array(img).astype(np.float64) / 255\n", | |
" # it is super important that the dictionary keys match the names\n", | |
" # of the input/output layers in the Keras model, otherwise the\n", | |
" # generator will not work\n", | |
" x_dict = {'img_in': img_norm, 'vec_in': vec}\n", | |
" y_dict = {'out_1': out[0], 'out_2': out[1]}\n", | |
" yield x_dict, y_dict\n", | |
"\n", | |
" def output_shapes(self):\n", | |
" \"\"\" This method tells keras the input/output shapes to be expected\n", | |
" from the generator.\n", | |
"\n", | |
" :return: a tuple of dictionaries with the input / output\n", | |
" tensor shapes of the generator\n", | |
" \"\"\"\n", | |
" # like above, the dictionary keys here need to match the name of the\n", | |
" # layers in the keras model\n", | |
" shapes = ({'img_in': tf.TensorShape(input_1_shape),\n", | |
" 'vec_in': tf.TensorShape(input_2_shape)},\n", | |
" {'out_1': tf.TensorShape([]),\n", | |
" 'out_2': tf.TensorShape([])})\n", | |
" return shapes\n", | |
"\n", | |
" def output_types(self):\n", | |
" \"\"\" Used in tf.data, assume all types are doubles \n", | |
" \n", | |
" :return: a tuple of dictionaries of data types for input / output\n", | |
" \"\"\"\n", | |
" shapes = self.output_shapes()\n", | |
" types = tuple({k: tf.float64 for k in d} for d in shapes)\n", | |
" return types" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "8a7a935b", | |
"metadata": {}, | |
"source": [ | |
"This really is the tricky bit. As you see in the comments above, the generator will only work, if the input and output layer names of the model match the keys in the dictioaries returned by the methods `output_shapes()` and `output_types()`. In addition, note the different TensorShapes that are assigned to the data. For the inputs, the image of shape `(240, 240, 3)` and the vector of shape of `(64, )`, we are passing these shapes into the `TensorShape`. But for the output layers which both return a scalar number we are not passing `(1,)`, because our generator yields just a float number, so we are passing `[]` to the `TensorShape` constructor. A bit opaque, but we now have all the options covered, Tensors of rank 0, 1, and > 1. \n", | |
"\n", | |
"_Note: There is no need to put the `output_shapes()` and `output_types()` as member functions into the generator class as they could be static and do not use any of the class members. Actually this dimensionality information rather belongs to the model. But we have not created a data structure around the model, therefore the functions are members of the generator. If you see how the generator is called below, this might make a bit more sense._\n", | |
"\n", | |
"We can now go to the next step and train the model." | |
] | |
}, | |
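{ | |
"cell_type": "markdown", | |
"id": "3c4d5e6f", | |
"metadata": {}, | |
"source": [ | |
"To make the rank discussion concrete, here is a small illustrative check (not part of the pipeline itself) that prints the rank of the three `TensorShape` variants used above." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"id": "4d5e6f7a", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"# illustrative only: ranks of the TensorShapes used in output_shapes()\n", | |
"for shape in (tf.TensorShape([]),              # scalar output -> rank 0\n", | |
"              tf.TensorShape(input_2_shape),   # vector input -> rank 1\n", | |
"              tf.TensorShape(input_1_shape)):  # image input -> rank 3\n", | |
"    print(f'{shape} has rank {shape.rank}')" | |
] | |
}, | |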
{ | |
"cell_type": "markdown", | |
"id": "182cb859", | |
"metadata": {}, | |
"source": [ | |
"## Training the model\n", | |
"Now we are bringing everything together. We create the training data and our data generator that processes this data. The size of our dataset is kept small to keep the notebook running fast. In the real world application, your dataset will be much larger, otherwise the whole approach of using the data generator is not needed. " | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 9, | |
"id": "78c1f5b1", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"# the size of our data set\n", | |
"num_samples = 120 \n", | |
"data_dir = create_data_on_filesystem(num_samples)\n", | |
"pipeline_generator = PipelineGenerator(data_dir, num_samples)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "b52222d2", | |
"metadata": {}, | |
"source": [ | |
"Next we create the tf Dataset from the generator. First we create the dataset than can cycle through all the data, one by one. Then in the second command we batch up the data into minibatches of our batch size by calling `batch()`. This means that each call to our dataset will produce a complete batch that is successively filled with the underlying data set.\n", | |
"\n", | |
"Then we use the performance switches of tensorflow to make the data generation fast. " | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 10, | |
"id": "61bea87e", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"batch_size = 8\n", | |
"dataset = tf.data.Dataset.from_generator(\n", | |
" generator=lambda: pipeline_generator,\n", | |
" output_types=pipeline_generator.output_types(),\n", | |
" output_shapes=pipeline_generator.output_shapes())\n", | |
"tf_data = dataset.batch(batch_size)\n", | |
"\n", | |
"# use tf magic to improve dataset prefetch\n", | |
"tune = tf.data.experimental.AUTOTUNE\n", | |
"tf_data_tuned = tf_data.prefetch(tune)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "ad05bc6a", | |
"metadata": {}, | |
"source": [ | |
"You might have recognised that the creation of all the data pipeline objects has been almost instantaneously. That is by construction. Until this point, no single bit of data has been read. The reading will only start within the training loop. \n", | |
"\n", | |
"We got to the very last step now. Let's train the model." | |
] | |
}, | |
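{ | |
"cell_type": "markdown", | |
"id": "5e6f7a8b", | |
"metadata": {}, | |
"source": [ | |
"As an illustrative check (this is where the first file reads actually happen), we can pull a single batch from the dataset and inspect the dictionary structure and batch shapes:" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"id": "6f7a8b9c", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"# illustrative only: fetch one batch and inspect its structure\n", | |
"x_batch, y_batch = next(iter(tf_data))\n", | |
"print({k: v.shape for k, v in x_batch.items()})\n", | |
"print({k: v.shape for k, v in y_batch.items()})" | |
] | |
}, | |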
{ | |
"cell_type": "code", | |
"execution_count": 11, | |
"id": "4bd3cc7a", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"Epoch 1/20\n", | |
"15/15 [==============================] - 3s 135ms/step - loss: 0.1638 - out_1_loss: 0.0814 - out_2_loss: 0.0825\n", | |
"Epoch 2/20\n", | |
"15/15 [==============================] - 2s 129ms/step - loss: 0.1590 - out_1_loss: 0.0775 - out_2_loss: 0.0815\n", | |
"Epoch 3/20\n", | |
"15/15 [==============================] - 2s 145ms/step - loss: 0.1557 - out_1_loss: 0.0756 - out_2_loss: 0.0801\n", | |
"Epoch 4/20\n", | |
"15/15 [==============================] - 2s 128ms/step - loss: 0.1502 - out_1_loss: 0.0720 - out_2_loss: 0.0783\n", | |
"Epoch 5/20\n", | |
"15/15 [==============================] - 2s 131ms/step - loss: 0.1367 - out_1_loss: 0.0613 - out_2_loss: 0.0754\n", | |
"Epoch 6/20\n", | |
"15/15 [==============================] - 2s 112ms/step - loss: 0.0919 - out_1_loss: 0.0270 - out_2_loss: 0.0648\n", | |
"Epoch 7/20\n", | |
"15/15 [==============================] - 2s 108ms/step - loss: 0.0478 - out_1_loss: 0.0099 - out_2_loss: 0.0379\n", | |
"Epoch 8/20\n", | |
"15/15 [==============================] - 1s 79ms/step - loss: 0.0287 - out_1_loss: 0.0132 - out_2_loss: 0.0155\n", | |
"Epoch 9/20\n", | |
"15/15 [==============================] - 2s 99ms/step - loss: 0.0343 - out_1_loss: 0.0190 - out_2_loss: 0.0153\n", | |
"Epoch 10/20\n", | |
"15/15 [==============================] - 1s 78ms/step - loss: 0.0304 - out_1_loss: 0.0156 - out_2_loss: 0.0148\n", | |
"Epoch 11/20\n", | |
"15/15 [==============================] - 2s 109ms/step - loss: 0.0304 - out_1_loss: 0.0161 - out_2_loss: 0.0143\n", | |
"Epoch 12/20\n", | |
"15/15 [==============================] - 1s 93ms/step - loss: 0.0241 - out_1_loss: 0.0165 - out_2_loss: 0.0076\n", | |
"Epoch 13/20\n", | |
"15/15 [==============================] - 1s 92ms/step - loss: 0.0075 - out_1_loss: 0.0032 - out_2_loss: 0.0043\n", | |
"Epoch 14/20\n", | |
"15/15 [==============================] - 1s 80ms/step - loss: 0.0064 - out_1_loss: 0.0033 - out_2_loss: 0.0031\n", | |
"Epoch 15/20\n", | |
"15/15 [==============================] - 1s 78ms/step - loss: 0.0125 - out_1_loss: 0.0088 - out_2_loss: 0.0037\n", | |
"Epoch 16/20\n", | |
"15/15 [==============================] - 1s 82ms/step - loss: 0.0198 - out_1_loss: 0.0134 - out_2_loss: 0.0064\n", | |
"Epoch 17/20\n", | |
"15/15 [==============================] - 1s 81ms/step - loss: 0.0126 - out_1_loss: 0.0069 - out_2_loss: 0.0057\n", | |
"Epoch 18/20\n", | |
"15/15 [==============================] - 1s 79ms/step - loss: 0.0114 - out_1_loss: 0.0050 - out_2_loss: 0.0064\n", | |
"Epoch 19/20\n", | |
"15/15 [==============================] - 1s 80ms/step - loss: 0.0143 - out_1_loss: 0.0093 - out_2_loss: 0.0051\n", | |
"Epoch 20/20\n", | |
"15/15 [==============================] - 1s 80ms/step - loss: 0.0107 - out_1_loss: 0.0044 - out_2_loss: 0.0062\n" | |
] | |
} | |
], | |
"source": [ | |
"epochs = 20\n", | |
"model.compile(optimizer='adam', loss='mse')\n", | |
"history = model.fit(tf_data_tuned, epochs=epochs)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "b7c2f209", | |
"metadata": {}, | |
"source": [ | |
"Done. The training has finished - let's see what has happend:\n", | |
"1. In each epoch the tf dataset has provided 15 minibatches of size 8\n", | |
"1. In general, this means only 8 data entries `(X, y)` had to be kept in memory at each point in time. If you train on a GPU, then you can optimise the batch size to utilise the GPU's memory as much as possible. \n", | |
"1. Here we also used `prefetch` so the next batches can get loaded already in the background while the current batch is being evaluated in the training. When training on the CPU, this will increase the memory of the process. But if training happens on the GPU, the prefetching will happen on the CPU using the computer ram and only the training data for a single batch will be stored into the GPU's ram. Hence the GPU should have to wait less for the next batch to get loaded. This is where the speed up happens.\n", | |
"\n", | |
"## Is this the full story about training with a generator?\n", | |
"It is not. There are a number of issues that we have silently ignored. What are these?\n", | |
"\n", | |
"1. We don't have a validation dataset. Ok, this might be easy we think, just pass `validation_split` into the `fit()` function above. The problem is, this argument does only work if you pass the data as numpy arrays, but not as dictionaries.\n", | |
"\n", | |
"1. We have not shuffled the data. Our data above has been randomly generated so shuffling is not required here. But with real world data, it is very likely the data has ordering or sorting. But each mini batch should be a representative of the whole dataset. And the validation dataset and training dataset should be from the same distribution of data, too. \n", | |
"\n", | |
"1. This is not a must, but if we want to optimise performance and the bottleneck is the loading of data from the filesystem, then keeping the input vector and output results in two different files is not optimal because we need to open and close two files.\n", | |
"\n", | |
"1. This one is not a must either: The genarator provides the performance boost if we have such a large dataset that keeping it in memory is not possible. But what, if one time we want to to train on a smaller dataset that fits into memory. Wouldn't it be faster, to keep the data in ram than using the generator that reads from disk all the time? It would. So probably we want to add a switch that uses the generator, but the generated data can be cached into memory.\n", | |
"\n", | |
"Here is the plan to address all four items from above:\n", | |
"1. The `fit()` function supports passing validation data as a seperate argument, hence we will create a second data generator for the validation data.\n", | |
"1. We shuffle the data before the generator is being called, so the output of the generator will be randomised already. \n", | |
"1. All the non-image data can be stored in a json file, and will be read only once, as this data is small.\n", | |
"1. The generator will allow an internal caching of the image data, and small datasets can be kept completely in memory.\n", | |
"\n", | |
"## The complete training pipeline, with shuffling, validation and caching\n", | |
"\n", | |
"### The improved data storage\n", | |
"We collect all non-image data into a json file, here this is our input vector and output data:" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 12, | |
"id": "06430992", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"from random import shuffle\n", | |
"import json\n", | |
"\n", | |
"def create_structured_data_on_filesystem(num_samples=10):\n", | |
" data_dir = tempfile.mkdtemp()\n", | |
" for i in range(num_samples):\n", | |
" img_arr = np.random.randint(0, 255, input_1_shape, int).astype(np.uint8)\n", | |
" vec = np.random.uniform(0, 1, input_2_shape)\n", | |
" out = np.random.uniform(0, 1, (2,))\n", | |
" img = Image.fromarray(img_arr)\n", | |
" img.save(os.path.join(data_dir, f'img_{i}.jpeg'))\n", | |
" d = dict(index=i, vec=vec.tolist(), out=out.tolist())\n", | |
" with open(os.path.join(data_dir, f'record_{i}.json'), 'w') as f:\n", | |
" json.dump(d, f)\n", | |
" return data_dir\n", | |
"\n", | |
"def get_record_from_filesystem(data_dir, i):\n", | |
" with open(os.path.join(data_dir, f'record_{i}.json'), 'r') as f:\n", | |
" return json.load(f)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "dd9dd3a2", | |
"metadata": {}, | |
"source": [ | |
"We also create a data type that keeps all data for a single `(X, y)` in one place:" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 13, | |
"id": "41152b80", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"class SmartRecord:\n", | |
" def __init__(self, record, cache=False):\n", | |
" self.record = record\n", | |
" self.cache = cache\n", | |
" self.img_arr = None\n", | |
"\n", | |
" def image(self, path):\n", | |
" if self.img_arr is not None:\n", | |
" img_arr = self.img_arr\n", | |
" else:\n", | |
" i = self.record['index']\n", | |
" img = Image.open(os.path.join(path, f'img_{i}.jpeg'))\n", | |
" img_arr = np.array(img)\n", | |
" if self.cache:\n", | |
" self.img_arr = img_arr\n", | |
" # image data needs to be converted to [0.1] float values\n", | |
" return img_arr.astype(np.float64) / 255" | |
] | |
}, | |
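{ | |
"cell_type": "markdown", | |
"id": "7a8b9c0d", | |
"metadata": {}, | |
"source": [ | |
"A quick illustrative demo of the caching behaviour (using a throw-away data directory): with `cache=True` the jpeg is decoded only on the first call to `image()`, and the raw array is re-used afterwards." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"id": "8b9c0d1e", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"# illustrative only: demonstrate the in-memory caching of SmartRecord\n", | |
"demo_dir = create_structured_data_on_filesystem(num_samples=1)\n", | |
"rec = SmartRecord(get_record_from_filesystem(demo_dir, 0), cache=True)\n", | |
"print(rec.img_arr is None)      # True - nothing cached yet\n", | |
"img = rec.image(demo_dir)\n", | |
"print(rec.img_arr is not None)  # True - the decoded image is now cached" | |
] | |
}, | |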
{ | |
"cell_type": "markdown", | |
"id": "714881d3", | |
"metadata": {}, | |
"source": [ | |
"### The improved generator\n", | |
"This is similar to the above, but with two differences:\n", | |
"1. We add a method to produce a training and validation set of indexes which are also randomised.\n", | |
"1. We store an iterable (i.e a list) of `SmartRecord`s as a member" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 14, | |
"id": "190d8f01", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"def train_test_split(num_samples, validation_size):\n", | |
" split_idx = int((1 - validation_size) * num_samples)\n", | |
" indexes = list(range(num_samples))\n", | |
" shuffle(indexes)\n", | |
" return indexes[:split_idx], indexes[split_idx:]\n", | |
"\n", | |
"\n", | |
"class PipelineGeneratorFromRecords:\n", | |
" def __init__(self, data_dir, records):\n", | |
" self.data_dir = data_dir\n", | |
" self.records = records\n", | |
"\n", | |
" def __iter__(self):\n", | |
" \"\"\" A method that yields defines a generator, so fulfills the iterator\n", | |
" protocol. When i exceeds the number of samples a StopIteration is\n", | |
" raised from the iterable protocol of the for loop.\n", | |
"\n", | |
" :return: a tuple of the input and output dictionary\n", | |
" \"\"\"\n", | |
" for smart_record in self.records:\n", | |
" img = smart_record.image(os.path.join(self.data_dir))\n", | |
" vec = smart_record.record['vec']\n", | |
" out = smart_record.record['out']\n", | |
" # it is super important that the dictionary keys match the names\n", | |
" # of the input/output layers in the Keras model, otherwise the\n", | |
" # generator will not work\n", | |
" x_dict = {'img_in': img, 'vec_in': vec}\n", | |
" y_dict = {'out_1': out[0], 'out_2': out[1]}\n", | |
" yield x_dict, y_dict\n", | |
"\n", | |
" def output_shapes(self):\n", | |
" \"\"\" This method tells keras the input/output shapes to be expected\n", | |
" from the generator.\n", | |
"\n", | |
" :return: a tuple of dictionaries with the input / output\n", | |
" tensor shapes of the generator\n", | |
" \"\"\"\n", | |
" # like above, the dictionary keys here need to match the name of the\n", | |
" # layers in the keras model\n", | |
" shapes = ({'img_in': tf.TensorShape(input_1_shape),\n", | |
" 'vec_in': tf.TensorShape(input_2_shape)},\n", | |
" {'out_1': tf.TensorShape([]),\n", | |
" 'out_2': tf.TensorShape([])})\n", | |
" return shapes\n", | |
"\n", | |
" def output_types(self):\n", | |
" \"\"\" Used in tf.data, assume all types are doubles\"\"\"\n", | |
" shapes = self.output_shapes()\n", | |
" types = tuple({k: tf.float64 for k in d} for d in shapes)\n", | |
" return types" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "081e8448", | |
"metadata": {}, | |
"source": [ | |
"Because we want to create a training and a validation dataset, we put all of the logic that creates the dataset into a function that will be called twice later on. We don't use this feature explicitly, but where we create the 'SmartRecord' we can pass a flag to use image caching in memory or not. Here, caching is disabled by default." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 15, | |
"id": "21ff5291", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"def create_tf_data(index_list, batch_size):\n", | |
" records = [SmartRecord(get_record_from_filesystem(data_dir, idx)) for\n", | |
" idx in index_list]\n", | |
" pipeline_generator = PipelineGeneratorFromRecords(data_dir, records)\n", | |
" # create the tf datasets from the generator\n", | |
" dataset = tf.data.Dataset.from_generator(\n", | |
" generator=lambda: pipeline_generator,\n", | |
" output_types=pipeline_generator.output_types(),\n", | |
" output_shapes=pipeline_generator.output_shapes())\n", | |
" tf_data = dataset.batch(batch_size)\n", | |
" # use tf magic to improve dataset prefetch\n", | |
" tune = tf.data.experimental.AUTOTUNE\n", | |
" return tf_data.prefetch(tune)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "c92efc5a", | |
"metadata": {}, | |
"source": [ | |
"### Creating the model\n", | |
"We are re-creating the model so trainging will start from scratch." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 16, | |
"id": "31173259", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"Model: \"model\"\n", | |
"__________________________________________________________________________________________________\n", | |
" Layer (type) Output Shape Param # Connected to \n", | |
"==================================================================================================\n", | |
" img_in (InputLayer) [(None, 240, 240, 3 0 [] \n", | |
" )] \n", | |
" \n", | |
" conv0 (Conv2D) (None, 120, 120, 24 672 ['img_in[0][0]'] \n", | |
" ) \n", | |
" \n", | |
" conv1 (Conv2D) (None, 60, 60, 32) 6944 ['conv0[0][0]'] \n", | |
" \n", | |
" conv2 (Conv2D) (None, 30, 30, 64) 18496 ['conv1[0][0]'] \n", | |
" \n", | |
" conv3 (Conv2D) (None, 15, 15, 64) 36928 ['conv2[0][0]'] \n", | |
" \n", | |
" conv4 (Conv2D) (None, 8, 8, 64) 36928 ['conv3[0][0]'] \n", | |
" \n", | |
" flatten (Flatten) (None, 4096) 0 ['conv4[0][0]'] \n", | |
" \n", | |
" vec_in (InputLayer) [(None, 64)] 0 [] \n", | |
" \n", | |
" tf.concat (TFOpLambda) (None, 4160) 0 ['flatten[0][0]', \n", | |
" 'vec_in[0][0]'] \n", | |
" \n", | |
" dense_1 (Dense) (None, 100) 416100 ['tf.concat[0][0]'] \n", | |
" \n", | |
" dense_2 (Dense) (None, 50) 5050 ['dense_1[0][0]'] \n", | |
" \n", | |
" out_1 (Dense) (None, 1) 51 ['dense_2[0][0]'] \n", | |
" \n", | |
" out_2 (Dense) (None, 1) 51 ['dense_2[0][0]'] \n", | |
" \n", | |
"==================================================================================================\n", | |
"Total params: 521,220\n", | |
"Trainable params: 521,220\n", | |
"Non-trainable params: 0\n", | |
"__________________________________________________________________________________________________\n", | |
"None\n" | |
] | |
} | |
], | |
"source": [ | |
"# create the model and print summary\n", | |
"model_2 = create_model()\n", | |
"print(model.summary())" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "759b15a6", | |
"metadata": {}, | |
"source": [ | |
"### Creating the improved datasets\n", | |
"Now we produce two datasets, one for training and one for validation." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 18, | |
"id": "d9c34bc9", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"# create the tmp data directory w/ data and inititalise the generator\n", | |
"num_samples = 120\n", | |
"data_dir = create_structured_data_on_filesystem(num_samples)\n", | |
"\n", | |
"# create train and validation data lists and datasets from the lists\n", | |
"train_idx, val_idx = train_test_split(num_samples, 0.2)\n", | |
"batch_size = 8\n", | |
"train_dataset = create_tf_data(train_idx, batch_size)\n", | |
"val_dataset = create_tf_data(val_idx, batch_size)" | |
] | |
}, | |
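{ | |
"cell_type": "markdown", | |
"id": "9c0d1e2f", | |
"metadata": {}, | |
"source": [ | |
"As an illustrative sanity check, `tf.data.Dataset` exposes the structure of its elements via `element_spec`; this shows the dictionary keys, shapes and dtypes of the batched dataset without reading any data:" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"id": "a1b2c3d4", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"# illustrative only: inspect the dataset structure without reading data\n", | |
"print(train_dataset.element_spec)" | |
] | |
}, | |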
{ | |
"cell_type": "markdown", | |
"id": "6541b85b", | |
"metadata": {}, | |
"source": [ | |
"### Fitting the model\n", | |
"The training is like above but with the following improvements:\n", | |
"1. We use a better stopping criterion than just running down the epochs. Instead we stop the training when the validation error does not improve any longer - this is the `patience` argument in the callback.\n", | |
"1. We are saving the model everytime the validation loss is getting smaller. Therefore the last saved model will have the smallest variance.\n", | |
"\n", | |
"These training features are implemented through Keras callbacks. The model will be saved into the temp data directory." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 19, | |
"id": "fe108334", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"Epoch 1/200\n", | |
" 12/Unknown - 2s 95ms/step - loss: 0.1769 - out_1_loss: 0.0917 - out_2_loss: 0.0851\n", | |
"Epoch 00001: val_loss improved from inf to 0.13522, saving model to /var/folders/8r/zhtp839917b63868yn6tvj8w0000gs/T/tmpsa781rrz/model.h5\n", | |
"12/12 [==============================] - 3s 148ms/step - loss: 0.1769 - out_1_loss: 0.0917 - out_2_loss: 0.0851 - val_loss: 0.1352 - val_out_1_loss: 0.0667 - val_out_2_loss: 0.0685\n", | |
"Epoch 2/200\n", | |
"12/12 [==============================] - ETA: 0s - loss: 0.1703 - out_1_loss: 0.0885 - out_2_loss: 0.0818\n", | |
"Epoch 00002: val_loss improved from 0.13522 to 0.13515, saving model to /var/folders/8r/zhtp839917b63868yn6tvj8w0000gs/T/tmpsa781rrz/model.h5\n", | |
"12/12 [==============================] - 1s 107ms/step - loss: 0.1703 - out_1_loss: 0.0885 - out_2_loss: 0.0818 - val_loss: 0.1352 - val_out_1_loss: 0.0668 - val_out_2_loss: 0.0684\n", | |
"Epoch 3/200\n", | |
"12/12 [==============================] - ETA: 0s - loss: 0.1681 - out_1_loss: 0.0868 - out_2_loss: 0.0813\n", | |
"Epoch 00003: val_loss did not improve from 0.13515\n", | |
"12/12 [==============================] - 2s 154ms/step - loss: 0.1681 - out_1_loss: 0.0868 - out_2_loss: 0.0813 - val_loss: 0.1355 - val_out_1_loss: 0.0673 - val_out_2_loss: 0.0682\n", | |
"Epoch 4/200\n", | |
"12/12 [==============================] - ETA: 0s - loss: 0.1661 - out_1_loss: 0.0846 - out_2_loss: 0.0815\n", | |
"Epoch 00004: val_loss did not improve from 0.13515\n", | |
"12/12 [==============================] - 2s 140ms/step - loss: 0.1661 - out_1_loss: 0.0846 - out_2_loss: 0.0815 - val_loss: 0.1363 - val_out_1_loss: 0.0680 - val_out_2_loss: 0.0683\n", | |
"Epoch 5/200\n", | |
"12/12 [==============================] - ETA: 0s - loss: 0.1580 - out_1_loss: 0.0769 - out_2_loss: 0.0811\n", | |
"Epoch 00005: val_loss did not improve from 0.13515\n", | |
"12/12 [==============================] - 2s 136ms/step - loss: 0.1580 - out_1_loss: 0.0769 - out_2_loss: 0.0811 - val_loss: 0.1400 - val_out_1_loss: 0.0710 - val_out_2_loss: 0.0690\n", | |
"Epoch 6/200\n", | |
"12/12 [==============================] - ETA: 0s - loss: 0.1439 - out_1_loss: 0.0649 - out_2_loss: 0.0790\n", | |
"Epoch 00006: val_loss did not improve from 0.13515\n", | |
"12/12 [==============================] - 2s 130ms/step - loss: 0.1439 - out_1_loss: 0.0649 - out_2_loss: 0.0790 - val_loss: 0.1509 - val_out_1_loss: 0.0833 - val_out_2_loss: 0.0676\n", | |
"Epoch 7/200\n", | |
"12/12 [==============================] - ETA: 0s - loss: 0.1229 - out_1_loss: 0.0405 - out_2_loss: 0.0824\n", | |
"Epoch 00007: val_loss did not improve from 0.13515\n", | |
"12/12 [==============================] - 1s 89ms/step - loss: 0.1229 - out_1_loss: 0.0405 - out_2_loss: 0.0824 - val_loss: 0.1454 - val_out_1_loss: 0.0769 - val_out_2_loss: 0.0685\n", | |
"Epoch 8/200\n", | |
"12/12 [==============================] - ETA: 0s - loss: 0.0970 - out_1_loss: 0.0198 - out_2_loss: 0.0772\n", | |
"Epoch 00008: val_loss did not improve from 0.13515\n", | |
"12/12 [==============================] - 1s 94ms/step - loss: 0.0970 - out_1_loss: 0.0198 - out_2_loss: 0.0772 - val_loss: 0.1552 - val_out_1_loss: 0.0847 - val_out_2_loss: 0.0705\n", | |
"Epoch 9/200\n", | |
"12/12 [==============================] - ETA: 0s - loss: 0.0825 - out_1_loss: 0.0130 - out_2_loss: 0.0695\n", | |
"Epoch 00009: val_loss did not improve from 0.13515\n", | |
"12/12 [==============================] - 1s 92ms/step - loss: 0.0825 - out_1_loss: 0.0130 - out_2_loss: 0.0695 - val_loss: 0.1645 - val_out_1_loss: 0.0829 - val_out_2_loss: 0.0815\n", | |
"Epoch 10/200\n", | |
"12/12 [==============================] - ETA: 0s - loss: 0.0696 - out_1_loss: 0.0061 - out_2_loss: 0.0635\n", | |
"Epoch 00010: val_loss did not improve from 0.13515\n", | |
"12/12 [==============================] - 1s 94ms/step - loss: 0.0696 - out_1_loss: 0.0061 - out_2_loss: 0.0635 - val_loss: 0.1625 - val_out_1_loss: 0.0847 - val_out_2_loss: 0.0778\n", | |
"Epoch 11/200\n", | |
"12/12 [==============================] - ETA: 0s - loss: 0.0476 - out_1_loss: 0.0038 - out_2_loss: 0.0438\n", | |
"Epoch 00011: val_loss did not improve from 0.13515\n", | |
"12/12 [==============================] - 2s 148ms/step - loss: 0.0476 - out_1_loss: 0.0038 - out_2_loss: 0.0438 - val_loss: 0.1873 - val_out_1_loss: 0.1091 - val_out_2_loss: 0.0782\n", | |
"Epoch 12/200\n", | |
"12/12 [==============================] - ETA: 0s - loss: 0.0246 - out_1_loss: 0.0051 - out_2_loss: 0.0195\n", | |
"Epoch 00012: val_loss did not improve from 0.13515\n", | |
"12/12 [==============================] - 1s 92ms/step - loss: 0.0246 - out_1_loss: 0.0051 - out_2_loss: 0.0195 - val_loss: 0.2198 - val_out_1_loss: 0.1377 - val_out_2_loss: 0.0821\n", | |
"Epoch 13/200\n", | |
"12/12 [==============================] - ETA: 0s - loss: 0.0126 - out_1_loss: 0.0072 - out_2_loss: 0.0054\n", | |
"Epoch 00013: val_loss did not improve from 0.13515\n", | |
"12/12 [==============================] - 1s 93ms/step - loss: 0.0126 - out_1_loss: 0.0072 - out_2_loss: 0.0054 - val_loss: 0.2154 - val_out_1_loss: 0.0924 - val_out_2_loss: 0.1230\n", | |
"Epoch 14/200\n", | |
"12/12 [==============================] - ETA: 0s - loss: 0.0272 - out_1_loss: 0.0131 - out_2_loss: 0.0141\n", | |
"Epoch 00014: val_loss did not improve from 0.13515\n", | |
"12/12 [==============================] - 1s 98ms/step - loss: 0.0272 - out_1_loss: 0.0131 - out_2_loss: 0.0141 - val_loss: 0.2000 - val_out_1_loss: 0.1010 - val_out_2_loss: 0.0990\n", | |
"Epoch 15/200\n", | |
"12/12 [==============================] - ETA: 0s - loss: 0.0172 - out_1_loss: 0.0055 - out_2_loss: 0.0117\n", | |
"Epoch 00015: val_loss did not improve from 0.13515\n", | |
"12/12 [==============================] - 1s 97ms/step - loss: 0.0172 - out_1_loss: 0.0055 - out_2_loss: 0.0117 - val_loss: 0.2115 - val_out_1_loss: 0.0919 - val_out_2_loss: 0.1196\n", | |
"Epoch 16/200\n", | |
"12/12 [==============================] - ETA: 0s - loss: 0.0183 - out_1_loss: 0.0082 - out_2_loss: 0.0101\n", | |
"Epoch 00016: val_loss did not improve from 0.13515\n", | |
"12/12 [==============================] - 1s 96ms/step - loss: 0.0183 - out_1_loss: 0.0082 - out_2_loss: 0.0101 - val_loss: 0.1668 - val_out_1_loss: 0.0810 - val_out_2_loss: 0.0858\n", | |
"Epoch 17/200\n", | |
"12/12 [==============================] - ETA: 0s - loss: 0.0244 - out_1_loss: 0.0069 - out_2_loss: 0.0175\n", | |
"Epoch 00017: val_loss did not improve from 0.13515\n", | |
"12/12 [==============================] - 1s 122ms/step - loss: 0.0244 - out_1_loss: 0.0069 - out_2_loss: 0.0175 - val_loss: 0.1923 - val_out_1_loss: 0.0862 - val_out_2_loss: 0.1061\n", | |
"Epoch 18/200\n", | |
"12/12 [==============================] - ETA: 0s - loss: 0.0211 - out_1_loss: 0.0093 - out_2_loss: 0.0118\n", | |
"Epoch 00018: val_loss did not improve from 0.13515\n", | |
"12/12 [==============================] - 1s 101ms/step - loss: 0.0211 - out_1_loss: 0.0093 - out_2_loss: 0.0118 - val_loss: 0.1798 - val_out_1_loss: 0.0939 - val_out_2_loss: 0.0859\n", | |
"Epoch 19/200\n", | |
"12/12 [==============================] - ETA: 0s - loss: 0.0239 - out_1_loss: 0.0095 - out_2_loss: 0.0144\n", | |
"Epoch 00019: val_loss did not improve from 0.13515\n", | |
"12/12 [==============================] - 1s 99ms/step - loss: 0.0239 - out_1_loss: 0.0095 - out_2_loss: 0.0144 - val_loss: 0.2267 - val_out_1_loss: 0.1431 - val_out_2_loss: 0.0835\n", | |
"Epoch 20/200\n", | |
"12/12 [==============================] - ETA: 0s - loss: 0.0307 - out_1_loss: 0.0212 - out_2_loss: 0.0096\n", | |
"Epoch 00020: val_loss did not improve from 0.13515\n", | |
"12/12 [==============================] - 1s 96ms/step - loss: 0.0307 - out_1_loss: 0.0212 - out_2_loss: 0.0096 - val_loss: 0.2333 - val_out_1_loss: 0.1587 - val_out_2_loss: 0.0746\n", | |
"Epoch 21/200\n", | |
"12/12 [==============================] - ETA: 0s - loss: 0.0259 - out_1_loss: 0.0138 - out_2_loss: 0.0122\n", | |
"Epoch 00021: val_loss did not improve from 0.13515\n", | |
"12/12 [==============================] - 1s 98ms/step - loss: 0.0259 - out_1_loss: 0.0138 - out_2_loss: 0.0122 - val_loss: 0.1627 - val_out_1_loss: 0.0826 - val_out_2_loss: 0.0801\n", | |
"Epoch 22/200\n", | |
"12/12 [==============================] - ETA: 0s - loss: 0.0236 - out_1_loss: 0.0160 - out_2_loss: 0.0076\n", | |
"Epoch 00022: val_loss did not improve from 0.13515\n", | |
"12/12 [==============================] - 1s 104ms/step - loss: 0.0236 - out_1_loss: 0.0160 - out_2_loss: 0.0076 - val_loss: 0.1790 - val_out_1_loss: 0.1022 - val_out_2_loss: 0.0768\n" | |
] | |
} | |
], | |
"source": [ | |
"from tensorflow.python.keras.callbacks import EarlyStopping, ModelCheckpoint\n", | |
"epochs = 200\n", | |
"model_2.compile(optimizer='adam', loss='mse')\n", | |
"callbacks = [EarlyStopping(monitor='val_loss', patience=20, min_delta=1e-5),\n", | |
" ModelCheckpoint(monitor='val_loss',\n", | |
" filepath=os.path.join(data_dir, \"model.h5\"),\n", | |
" save_best_only=True,\n", | |
" verbose=1)]\n", | |
"history = model_2.fit(x=train_dataset, validation_data=val_dataset,\n", | |
" epochs=epochs, callbacks=callbacks)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "1ce896f1", | |
"metadata": {}, | |
"source": [ | |
"Done. The training has finished, so what has changed compared to the training above:\n", | |
"1. We have set the number of epochs to 200, but training has stopped much earlier. This is caused by our early stopping criterion. After 20 consecutive epochs in which the validation error did not decrease, the training stopped.\n", | |
"1. The model with the parameters (or weights) of that smallest validation error is now saved to disk. You can see the path in `saving model to /var/folders/8r/zhtp839917b63868yn6tvj8w0000gs/T/tmp7ckhh2_s/model.h5` everytime the model gets written to disk. Obviously your path will look different.\n", | |
"1. If you used the model that is still kept in memory it will not have the optimal weights, because it was running for 20 more epochs and has continued updating the weights in each epoch. Hence, in order to use the optimal model, you have to re-load it from disk." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 20, | |
"id": "81212018", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"from tensorflow.keras.models import load_model\n", | |
"optimal_model = load_model(filepath=os.path.join(data_dir, \"model.h5\"))" | |
] | |
}, | |
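{ | |
"cell_type": "markdown", | |
"id": "b2c3d4e5", | |
"metadata": {}, | |
"source": [ | |
"As a quick illustrative check, evaluating the reloaded model on the validation dataset should reproduce the best `val_loss` reported by the checkpoint callback above:" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"id": "c3d4e5f6", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"# illustrative only: the reloaded model should match the best val_loss\n", | |
"print(optimal_model.evaluate(val_dataset))" | |
] | |
}, | |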
{ | |
"cell_type": "markdown", | |
"id": "1dc2effe", | |
"metadata": {}, | |
"source": [ | |
"Here we go - this is the model with the lowest variance trained with a generator. It would allow you to run on huge data, as much as you can fit on your disk.\n", | |
"\n", | |
"## Further reading and references\n", | |
"The available dataset optimisations available in tensorflow are documented here: \n", | |
"* https://www.tensorflow.org/guide/data_performance\n", | |
"\n", | |
"The Donkey Car project can be found here:\n", | |
"* https://github.com/autorope/donkeycar\n", | |
"* https://docs.donkeycar.com\n", | |
"* https://www.donkeycar.com" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"id": "df10be8b", | |
"metadata": {}, | |
"outputs": [], | |
"source": [] | |
} | |
], | |
"metadata": { | |
"kernelspec": { | |
"display_name": "Python 3 (ipykernel)", | |
"language": "python", | |
"name": "python3" | |
}, | |
"language_info": { | |
"codemirror_mode": { | |
"name": "ipython", | |
"version": 3 | |
}, | |
"file_extension": ".py", | |
"mimetype": "text/x-python", | |
"name": "python", | |
"nbconvert_exporter": "python", | |
"pygments_lexer": "ipython3", | |
"version": "3.9.7" | |
} | |
}, | |
"nbformat": 4, | |
"nbformat_minor": 5 | |
} |