I have a Python server application that provides TensorFlow / Keras model inference services. Multiple such models can be loaded and used at the same time, for multiple different clients. A client can request to load another model, but this has no effect on the other clients (i.e. their models stay in memory and in use), so each client can ask to load a different model regardless of the state of any other client.
The logic and implementation work, but I am not sure how to correctly free memory in this setup. When a client asks for a new model to load, the previously loaded model is simply deleted from memory (via the Python del statement), and then the new model is loaded via tensorflow.keras.models.load_model().
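For illustration, a minimal sketch of that per-client flow might look like the following (the models_by_client registry, function name, and model paths are hypothetical; the original post does not show its server code):

    import gc
    import tensorflow as tf

    # Hypothetical registry mapping each client to its currently loaded model.
    models_by_client = {}

    def load_model_for_client(client_id, model_path):
        # Drop the client's previous model reference, if any.
        old_model = models_by_client.pop(client_id, None)
        if old_model is not None:
            del old_model      # remove the last Python reference to it
            gc.collect()       # encourage Python to reclaim the object

        # Load the new model; other clients' models remain untouched.
        models_by_client[client_id] = tf.keras.models.load_model(model_path)
        return models_by_client[client_id]

The open question is whether dropping the Python reference like this actually returns the model's TF-side memory.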
From what I read in the Keras documentation, one might want to clear the Keras session in order to free memory, by calling tf.keras.backend.clear_session(). However, that seems to release all TF memory, which is a problem in my case, since other Keras models for other clients are still in use at the same time, as described above.
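A hedged sketch of why a global clear is problematic here (the model paths are made up for the example, and the exact effect depends on the TF version and execution mode):

    import tensorflow as tf

    model_a = tf.keras.models.load_model("client_a_model.h5")  # hypothetical path
    model_b = tf.keras.models.load_model("client_b_model.h5")  # hypothetical path

    # Attempting to free model_b's memory this way also resets the
    # global Keras state that all loaded models share:
    del model_b
    tf.keras.backend.clear_session()

    # model_a is still referenced, but because clear_session() resets the
    # global graph/session state, inference on model_a may now fail or
    # misbehave (particularly in TF1-style graph mode), which is exactly
    # what must not happen to the other clients.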
So in other words: when loading a new TensorFlow / Keras model while other models are also in memory and in use, how can I free the TF memory of the previously loaded model without interfering with the other currently loaded models?
from How to free TF/Keras memory in Python after a model has been deleted, while other models are still in memory and in use?