Friday, 1 July 2022

GUnicorn + CUDA: Cannot re-initialize CUDA in forked subprocess

I am building an inference service with torch, gunicorn, and flask that should use CUDA. To reduce resource requirements, I use gunicorn's preload option, so the model is loaded once and shared between the worker processes. However, this leads to an issue with CUDA. The following code snippet shows a minimal reproducing example:

from flask import Flask, request
import torch

app = Flask('dummy')

model = torch.rand(500)
model = model.to('cuda:0')  # CUDA context is initialized here, in the parent


@app.route('/', methods=['POST'])
def f():
    data = request.get_json()
    x = torch.rand((data['number'], 500))
    x = x.to('cuda:0')  # fails in the forked worker, see the traceback below
    res = x * model
    return {
        "result": res.sum().item()
    }

Starting the server with CUDA_VISIBLE_DEVICES=1 gunicorn -w 3 -b $HOST_IP:8080 --preload run_server:app succeeds. However, on the first request (curl -X POST -H 'Content-Type: application/json' -d '{"number": 1}' http://$HOST_IP:8080/), the worker throws the following error:

[2022-06-28 09:42:00,378] ERROR in app: Exception on / [POST]
Traceback (most recent call last):
  File "/home/user/.local/lib/python3.6/site-packages/flask/app.py", line 2447, in wsgi_app
    response = self.full_dispatch_request()
  File "/home/user/.local/lib/python3.6/site-packages/flask/app.py", line 1952, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/home/user/.local/lib/python3.6/site-packages/flask/app.py", line 1821, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/home/user/.local/lib/python3.6/site-packages/flask/_compat.py", line 39, in reraise
    raise value
  File "/home/user/.local/lib/python3.6/site-packages/flask/app.py", line 1950, in full_dispatch_request
    rv = self.dispatch_request()
  File "/home/user/.local/lib/python3.6/site-packages/flask/app.py", line 1936, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/home/user/project/run_server.py", line 14, in f
    x = x.to('cuda:0')
  File "/home/user/.local/lib/python3.6/site-packages/torch/cuda/__init__.py", line 195, in _lazy_init
    "Cannot re-initialize CUDA in forked subprocess. " + msg)
RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method

The model is loaded in the parent process and is accessible to each forked worker process. The problem occurs when a CUDA-backed tensor is created in a worker: this tries to re-initialize the CUDA context in the worker, which fails because the context was already initialized in the parent. If we set x = data['number'] and drop the x = x.to('cuda:0') line, the inference succeeds.
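To confirm the problem is not specific to gunicorn or flask, a minimal fork-only sketch (assuming a single visible CUDA device) reproduces the same error:

import os
import torch

# Initialize the CUDA context in the parent process.
parent_tensor = torch.rand(500).to('cuda:0')

pid = os.fork()
if pid == 0:
    # Child: any CUDA call now raises
    # "RuntimeError: Cannot re-initialize CUDA in forked subprocess."
    child_tensor = torch.rand(500).to('cuda:0')
    os._exit(0)
else:
    os.waitpid(pid, 0)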

Adding torch.multiprocessing.set_start_method('spawn') or multiprocessing.set_start_method('spawn') changes nothing, because gunicorn creates its workers with os.fork() directly and never consults the multiprocessing start method; with --preload, the fork necessarily happens after the model (and hence the CUDA context) has been loaded.
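For illustration, this is roughly where such a call would go (a sketch; as explained above, it is a no-op under gunicorn):

# At the top of run_server.py -- has no effect here, because gunicorn
# forks workers via os.fork() without going through multiprocessing.
import torch.multiprocessing as mp
mp.set_start_method('spawn', force=True)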

One workaround would be to drop the --preload option, but then each worker loads its own copy of the model into memory and onto the GPU, which is exactly what I am trying to avoid.
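A direction I have considered (sketched below, untested; it assumes run_server.py keeps the model on the CPU at import time and uses gunicorn's post_fork server hook) is to keep --preload for the CPU weights and move the model to the GPU only after the fork:

# gunicorn.conf.py -- hypothetical sketch; run_server.py would omit the
# model.to('cuda:0') call at module level.
def post_fork(server, worker):
    import run_server
    # This hook runs inside the freshly forked worker, so CUDA is
    # initialized here for the first time and no error occurs.
    run_server.model = run_server.model.to('cuda:0')

The CPU weights would stay shared copy-on-write between the workers, but each worker still ends up with its own CUDA context and its own GPU copy of the model, so this only partially avoids the duplication.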

Is there any possibility to overcome this issue without loading the model separately in each worker process?


