Created October 16, 2020 00:17
Calculating Keras model memory usage
import tensorflow as tf


def keras_model_memory_usage_in_bytes(model, *, batch_size: int):
    """
    Return the estimated memory usage of a given Keras model in bytes.
    This includes the model weights and layers, but excludes the dataset.

    The model shapes are multiplied by the batch size, but the weights are not.

    Args:
        model: A Keras model.
        batch_size: The batch size you intend to run the model with. If you
            have already specified the batch size in the model itself, then
            pass `1` as the argument here.

    Returns:
        An estimate of the Keras model's memory usage in bytes.
    """
    default_dtype = tf.keras.backend.floatx()
    shapes_mem_count = 0
    internal_model_mem_count = 0
    for layer in model.layers:
        # Recurse into nested models so their layers are counted too.
        if isinstance(layer, tf.keras.Model):
            internal_model_mem_count += keras_model_memory_usage_in_bytes(
                layer, batch_size=batch_size
            )
        # Memory for one layer's output activations: dtype size in bytes
        # multiplied by every known output dimension.
        single_layer_mem = tf.as_dtype(layer.dtype or default_dtype).size
        out_shape = layer.output_shape
        if isinstance(out_shape, list):
            out_shape = out_shape[0]
        for s in out_shape:
            if s is None:
                continue
            single_layer_mem *= s
        shapes_mem_count += single_layer_mem

    trainable_count = sum(
        tf.keras.backend.count_params(p) for p in model.trainable_weights
    )
    non_trainable_count = sum(
        tf.keras.backend.count_params(p) for p in model.non_trainable_weights
    )

    total_memory = (
        batch_size * shapes_mem_count
        + internal_model_mem_count
        + trainable_count
        + non_trainable_count
    )
    return total_memory
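To sanity-check the arithmetic without TensorFlow installed, here is a pure-Python sketch that reproduces the same estimate for a single hypothetical Dense layer (100 inputs, 10 outputs, float32). The function name and the layer sizes are illustrative, not part of the gist.

```python
FLOAT32_SIZE = 4  # bytes per float32 element

def dense_memory_estimate(n_in: int, n_out: int, batch_size: int) -> int:
    # Activation memory: the layer's output shape, scaled by dtype size
    # and batch size, mirroring `batch_size * shapes_mem_count` above.
    shapes_mem = FLOAT32_SIZE * n_out * batch_size
    # Parameters: weight matrix plus bias vector. As in the function
    # above, this is a raw parameter count, not a byte count.
    trainable = n_in * n_out + n_out
    return shapes_mem + trainable

estimate = dense_memory_estimate(100, 10, batch_size=32)
# 4 * 10 * 32 + (100 * 10 + 10) = 1280 + 1010 = 2290
```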
However, I still find an underestimate to be useful. When I am automatically generating models during a hyperparameter search, I can skip over models that are 100% guaranteed to be too large for my GPUs.
Unfortunately, an underestimate doesn't meet the usage that I had in mind as I was interested in knowing whether a particular model would fit into a particular GPU (and what batch size would result in "optimal" memory usage). Since we have no real idea as to how much we are underestimating by it is impossible to answer this question.
@Bidski, I've come to the same conclusion since I last replied to you.

In general, any Keras layer can create an arbitrary number of tensors in the layer's __init__(), build(), and call() methods. These tensors will not appear in the layer's output shape, so my keras_model_memory_usage_in_bytes() will continually underestimate a model's actual memory usage.

However, I still find an underestimate to be useful. When I am automatically generating models during a hyperparameter search, I can skip over models that are 100% guaranteed to be too large for my GPUs.
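That filtering step can be sketched in plain Python. The GPU capacity and the candidate estimates below are hypothetical stand-ins, not values from the gist.

```python
# Hypothetical GPU capacity: 16 GiB.
GPU_MEMORY_BYTES = 16 * 1024**3

def definitely_too_large(estimated_bytes: int, gpu_bytes: int = GPU_MEMORY_BYTES) -> bool:
    # Because the estimate is a lower bound, any model whose *estimate*
    # already exceeds GPU memory is guaranteed not to fit. The converse
    # does not hold: a model that passes this check may still OOM.
    return estimated_bytes > gpu_bytes

# Hypothetical per-model estimates from a hyperparameter search.
candidates = [8 * 1024**3, 20 * 1024**3, 2 * 1024**3]
survivors = [b for b in candidates if not definitely_too_large(b)]
```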