Tuesday 24 November 2020

How to efficiently run multiple PyTorch Processes / Models at once? Traceback: The paging file is too small for this operation to complete

Background

I have a very small network that I want to test with different random seeds. The network barely uses 1% of my GPU's compute power, so in theory I could run 50 processes at once and try many different seeds simultaneously.

Problem

Unfortunately, I can't even import PyTorch in multiple processes. When the number of processes exceeds 4, I get a traceback complaining that the paging file is too small.

Minimal reproducible code§ - dispatcher.py

from subprocess import Popen
import sys

# launch one ml_model.py subprocess per seed
procs = []
for seed in range(50):
    procs.append(Popen([sys.executable, "ml_model.py", str(seed)]))

# wait for all subprocesses to finish
for proc in procs:
    proc.wait()

§I increased the number of seeds so people with better machines can also reproduce this.

Minimal reproducible code - ml_model.py

import torch
import time

# keep the process alive long enough for all 50 processes to be running at once
time.sleep(10)
 
 Traceback (most recent call last):
   File "ml_model.py", line 1, in <module>
     import torch
   File "C:\Users\user\AppData\Local\Programs\Python\Python38\lib\site-packages\torch\__init__.py", line 117, in <module>
     raise err
 OSError: [WinError 1455] The paging file is too small for this operation to complete. Error loading "C:\Users\user\AppData\Local\Programs\Python\Python38\lib\site-packages\torch\lib\cudnn_cnn_infer64_8.dll" or one of its dependencies.

Further Investigation

I noticed that each process loads a lot of DLLs into RAM, and when I close all other programs that use a lot of RAM I can get up to 10 processes instead of 4. So it seems to be a resource constraint.
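To make that per-process cost visible, a rough diagnostic like the sketch below could be run instead of ml_model.py. It is not part of the original question, and it assumes the third-party psutil package is installed; on Windows, psutil reports both the pagefile-backed commit charge and the resident working set of the current process right after torch is imported.

# check_commit.py - rough diagnostic (assumes `pip install psutil`)
import os

import psutil
import torch  # importing torch is what triggers the large allocation

proc = psutil.Process(os.getpid())
mem = proc.memory_info()

# On Windows, `vms` corresponds to the pagefile-backed commit charge,
# while `rss` is the working set actually resident in RAM.
print(f"PID {os.getpid()}: commit={mem.vms / 2**20:.0f} MiB, "
      f"resident={mem.rss / 2**20:.0f} MiB")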

Questions

Is there a workaround?

What's the recommended way to train many small networks with PyTorch on a single GPU?

Should I write my own CUDA kernel instead, or use a different framework to achieve this?

My goal would be to run around 50 processes at once (on a machine with 16 GB of RAM and 8 GB of GPU RAM).
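One direction sometimes suggested for this kind of goal (not part of the original question) is to load torch once and loop over the seeds inside a single process, so the CUDA/cuDNN DLLs and their pagefile commit are only paid once. A minimal sketch, assuming the real training loop can be wrapped in a hypothetical train_one_seed(seed) function and that sequential execution of the small networks is acceptable:

import torch

def train_one_seed(seed: int) -> float:
    """Hypothetical stand-in for the real training loop of one small network."""
    torch.manual_seed(seed)
    # ... build the small model, train it, and return a final metric ...
    return float(seed)  # placeholder result

if __name__ == "__main__":
    # torch (and its DLLs) are loaded once for all seeds,
    # instead of once per subprocess.
    results = {seed: train_one_seed(seed) for seed in range(50)}
    print(results)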


