I'm currently training some neural network models and I've found that, for some reason, training will sometimes fail within the first ~200 iterations with a runtime error, even though memory appears to be available. The error is:
RuntimeError: CUDA out of memory. Tried to allocate 2.00 MiB (GPU 0; 10.76 GiB total capacity; 1.79 GiB already allocated; 3.44 MiB free; 9.76 GiB reserved in total by PyTorch)
This shows that only ~1.8 GiB of GPU memory is actually allocated, even though PyTorch has reserved 9.76 GiB of the card's 10.76 GiB, so there should be plenty of memory available.
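For reference, here is a minimal sketch of how those numbers can be checked at runtime with PyTorch's memory introspection calls (assuming GPU 0, matching the error message):

import torch

# memory_allocated() is what live tensors actually occupy; memory_reserved()
# is everything PyTorch's caching allocator is holding on to (allocated blocks
# plus cached, currently-unused blocks).
device = torch.device("cuda:0")
print(f"allocated: {torch.cuda.memory_allocated(device) / 1024**3:.2f} GiB")
print(f"reserved:  {torch.cuda.memory_reserved(device) / 1024**3:.2f} GiB")

# Full per-pool breakdown, useful for spotting fragmentation:
print(torch.cuda.memory_summary(device))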
I have found that if I pick a good seed (just by random searching) and the model gets past the first few hundred iterations, it generally runs fine afterwards. It seems as though less memory is available very early in training, but I don't know how to solve this.
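To see whether reserved memory really does spike in those first few hundred iterations, something like the following per-iteration logging could be dropped into the loop (a sketch only; train_step, model and loader are hypothetical placeholders for the actual training code):

import torch

def log_cuda_memory(step, device=torch.device("cuda:0")):
    # Compare what tensors actually use (allocated) with what the caching
    # allocator holds (reserved); a growing gap between the two suggests
    # fragmentation rather than a genuine lack of memory.
    alloc_mib = torch.cuda.memory_allocated(device) / 1024**2
    reserved_mib = torch.cuda.memory_reserved(device) / 1024**2
    print(f"step {step:4d}: allocated {alloc_mib:8.1f} MiB, reserved {reserved_mib:8.1f} MiB")

# Hypothetical training loop; train_step() stands in for the real
# forward/backward/optimizer step.
# for step, batch in enumerate(loader):
#     train_step(model, batch)
#     if step % 10 == 0:
#         log_cuda_memory(step)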
from GPU Runtime Error when memory is available