@jamesmishra
Created October 16, 2020 00:17
Calculating Keras model memory usage
import tensorflow as tf


def keras_model_memory_usage_in_bytes(model, *, batch_size: int):
    """
    Return the estimated memory usage of a given Keras model in bytes.
    This includes the model weights and layers, but excludes the dataset.
    The model shapes are multiplied by the batch size, but the weights are not.
    Args:
        model: A Keras model.
        batch_size: The batch size you intend to run the model with. If you
            have already specified the batch size in the model itself, then
            pass `1` as the argument here.
    Returns:
        An estimate of the Keras model's memory usage in bytes.
    """
    default_dtype = tf.keras.backend.floatx()
    shapes_mem_count = 0
    internal_model_mem_count = 0
    for layer in model.layers:
        # Nested models are estimated recursively.
        if isinstance(layer, tf.keras.Model):
            internal_model_mem_count += keras_model_memory_usage_in_bytes(
                layer, batch_size=batch_size
            )
        # Start from the element size in bytes, then multiply by every known
        # output dimension. The unknown batch dimension (None) is skipped.
        single_layer_mem = tf.as_dtype(layer.dtype or default_dtype).size
        out_shape = layer.output_shape
        if isinstance(out_shape, list):
            # For layers with several outputs, only the first is counted.
            out_shape = out_shape[0]
        for s in out_shape:
            if s is None:
                continue
            single_layer_mem *= s
        shapes_mem_count += single_layer_mem
    # count_params() returns element counts, so convert each weight to bytes.
    trainable_bytes = sum(
        tf.keras.backend.count_params(p) * tf.as_dtype(p.dtype).size
        for p in model.trainable_weights
    )
    non_trainable_bytes = sum(
        tf.keras.backend.count_params(p) * tf.as_dtype(p.dtype).size
        for p in model.non_trainable_weights
    )
    total_memory = (
        batch_size * shapes_mem_count
        + internal_model_mem_count
        + trainable_bytes
        + non_trainable_bytes
    )
    return total_memory
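
The arithmetic described in the docstring (activations scale with the batch size, weights do not) can be checked by hand without TensorFlow. The following is a minimal sketch, not part of the gist: the helper name `estimate_bytes`, the two-layer dense network, and the assumption that every tensor is float32 (4 bytes per element) are all illustrative. Unlike a raw parameter count, it expresses the weights in bytes.

```python
from math import prod

BYTES_PER_FLOAT32 = 4  # assumption: every tensor is float32


def estimate_bytes(output_shapes, param_count, batch_size):
    """Lower-bound estimate: batch-scaled activations plus fixed weights."""
    # Elements in one sample's activations, skipping the unknown batch dim.
    activation_elems = sum(
        prod(d for d in shape if d is not None) for shape in output_shapes
    )
    activation_bytes = batch_size * activation_elems * BYTES_PER_FLOAT32
    weight_bytes = param_count * BYTES_PER_FLOAT32
    return activation_bytes + weight_bytes


# Hypothetical network: Input(784) -> Dense(128) -> Dense(10)
shapes = [(None, 784), (None, 128), (None, 10)]
params = (784 * 128 + 128) + (128 * 10 + 10)  # kernels plus biases

print(estimate_bytes(shapes, params, batch_size=32))  # 525096
```

Here the activations contribute 32 × 922 × 4 = 118,016 bytes and the weights 101,770 × 4 = 407,080 bytes.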
@Bidski

Bidski commented Oct 4, 2021

> Sorry for the delay, @Bidski. Are you building this model by directly subclassing tf.keras.Model?

Yes, direct subclassing of tf.keras.Model.

> In these cases, it is likely that input and output shapes are not accurately computed until the model is called. This is probably why all of your output shapes are "multiple".

This may be a different issue entirely, but I have a different model that is also subclassed from tf.keras.Model. After loading the trained model and printing the summary, the output shapes are still listed as "multiple". The model and all of its layers have been called, since it is a fully trained model, but it was also just loaded from disk, so it's possible that this information isn't saved in the model.

> see: tensorflow/tensorflow#29132 and tensorflow/tensorflow#25036

I may be missing something here, but it seems that both of these issues are basically saying that the "best" option is to call the network with dummy data and hence actually allocate memory for all of the layers?

> If you have a way of calculating the shapes of every layer without actually allocating memory on the GPU, then we should be able to integrate that into my function in this Gist.

I mean, all of the layers are basically combinations of TensorFlow ops (convolutions, dense layers, reshapes, etc.), so it should be possible to calculate all of the output shapes at instantiation time if you know the input shapes. However, I don't think I ever found a simple way to manually compute the output shape for all layers (it has been a long time since I looked at this, so I may be wrong on this point).
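
For standard layers, the shape arithmetic mentioned above can be written out by hand. The following is a minimal sketch, assuming channels-last 2D convolutions with square kernels and "valid" or "same" padding; `conv2d_output_shape` and `dense_output_shape` are hypothetical helpers, not Keras APIs. It uses the usual convention that "valid" padding gives floor((n - k) / s) + 1 and "same" padding gives ceil(n / s) along each spatial axis.

```python
from math import ceil


def conv2d_output_shape(input_shape, filters, kernel_size, strides=1, padding="valid"):
    """Output (height, width, channels) of a channels-last 2D convolution."""
    height, width, _ = input_shape

    def out_len(n):
        if padding == "same":
            return ceil(n / strides)
        if padding == "valid":
            return (n - kernel_size) // strides + 1
        raise ValueError(f"unsupported padding: {padding!r}")

    return (out_len(height), out_len(width), filters)


def dense_output_shape(input_shape, units):
    """A dense layer replaces the last dimension with `units`."""
    return (*input_shape[:-1], units)


# Hypothetical stack: 28x28x1 image -> conv(32, 3x3) -> conv(64, 3x3, stride 2, same)
shape = (28, 28, 1)
shape = conv2d_output_shape(shape, filters=32, kernel_size=3)
shape = conv2d_output_shape(shape, filters=64, kernel_size=3, strides=2, padding="same")
print(shape)  # (13, 13, 64)
```

Rules like these only cover the layer types that are enumerated. A subclassed model can do arbitrary work in its call() method, which is why no general shortcut exists.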

@jamesmishra
Author

> [...] I don't think I ever found a simple way to manually compute the output shape for all layers (it has been a long time since I looked at this, so I may be wrong on this point).

@Bidski, I've come to the same conclusion since I last replied to you.

In general, any Keras layer can create an arbitrary number of tensors in the layer's __init__(), build(), and call() methods. These tensors will not appear in the layer's output shape, so my keras_model_memory_usage_in_bytes() will consistently underestimate a model's actual memory usage.

However, I still find an underestimate to be useful. When I am automatically generating models during a hyperparameter search, I can skip over models that are 100% guaranteed to be too large for my GPUs.
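
A minimal sketch of that pruning step, assuming each candidate already has a lower-bound estimate in bytes. The function name `prune_oversized`, the candidate names, and the memory figures are hypothetical.

```python
def prune_oversized(candidates, gpu_bytes):
    """Keep candidates whose lower-bound estimate fits in `gpu_bytes`."""
    kept = []
    for name, lower_bound_bytes in candidates:
        # If even the underestimate exceeds the budget, real usage will too.
        if lower_bound_bytes > gpu_bytes:
            continue
        kept.append(name)
    return kept


GIB = 1024 ** 3
# Hypothetical (name, estimated bytes) pairs produced by a search.
candidates = [("small", 2 * GIB), ("medium", 10 * GIB), ("huge", 40 * GIB)]
print(prune_oversized(candidates, gpu_bytes=16 * GIB))  # ['small', 'medium']
```

Passing this filter does not guarantee that a model fits. It only removes models that cannot fit.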

@Bidski

Bidski commented Oct 6, 2021

> However, I still find an underestimate to be useful. When I am automatically generating models during a hyperparameter search, I can skip over models that are 100% guaranteed to be too large for my GPUs.

Unfortunately, an underestimate doesn't suit the use case I had in mind: I wanted to know whether a particular model would fit into a particular GPU, and what batch size would result in "optimal" memory usage. Since we have no real idea how much we are underestimating by, it is impossible to answer this question.
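
The limitation can be stated more precisely: because the estimate is a lower bound on memory, solving it for the batch size gives only an upper bound on the batch size that could fit, not the batch size that will fit. The following is a minimal sketch, assuming the estimate splits into a per-sample activation cost and a fixed weight cost. The function name `max_batch_size_upper_bound` and all figures are hypothetical.

```python
def max_batch_size_upper_bound(per_sample_bytes, fixed_bytes, gpu_bytes):
    """Largest batch size whose lower-bound estimate fits in `gpu_bytes`.

    Real usage is at least the estimate, so the true maximum batch size
    is at most this value and may be much smaller.
    """
    if per_sample_bytes <= 0:
        raise ValueError("per_sample_bytes must be positive")
    available = gpu_bytes - fixed_bytes
    if available < per_sample_bytes:
        return 0  # not even one sample fits under the lower bound
    return available // per_sample_bytes


# Hypothetical figures: 3,688 bytes of activations per sample,
# 407,080 bytes of weights, and a 1,000,000-byte budget.
print(max_batch_size_upper_bound(3688, 407080, 1_000_000))  # 160
```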
