In macOS High Sierra (Version 10.13.6), I run a Python program that does the following:
- Launches a worker process that consumes data (URL strings) from a multiprocessing.Queue.
- The worker process sends HTTP requests with the requests package, i.e., it makes requests.get() calls.
- Some data (a URL string) is fed to the queue even before the worker process is started.
A program satisfying the above conditions leads to the worker process crashing with this error:
objc[24250]: +[__NSPlaceholderDate initialize] may have been in progress in another thread when fork() was called.
objc[24250]: +[__NSPlaceholderDate initialize] may have been in progress in another thread when fork() was called. We cannot safely call it or ignore it in the fork() child process. Crashing instead. Set a breakpoint on objc_initializeAfterForkError to debug.
I have read the following threads:
- Multiprocessing causes Python to crash and gives an error may have been in progress in another thread when fork() was called
- Requests module crashes python when numpy is loaded and using process
- Rails: may have been in progress in another thread when fork() was called
These threads focus on a workaround for the user. The workaround is defining this environment variable:
OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES
In this question, I would like to understand why only certain conditions reproduce the error whereas other conditions do not, and how to resolve this issue without putting the burden of defining the environment variable OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES on the user.
Minimal example of the issue
import multiprocessing as mp
import requests

def worker(q):
    print('worker: starting ...')
    while True:
        url = q.get()
        if url is None:
            print('worker: exiting ...')
            break
        print('worker: fetching', url)
        response = requests.get(url)
        print('worker: response:', response.status_code)

def master():
    q = mp.Queue()
    p = mp.Process(target=worker, args=(q,))
    q.put('https://www.example.com/')
    p.start()
    print('master: started worker')
    q.put('https://www.example.org/')
    q.put('https://www.example.net/')
    q.put(None)
    print('master: sent data')
    print('master: waiting for worker to exit')
    p.join()
    print('master: exiting ...')

master()
Here is the output with the error:
$ python3 foo.py
master: started worker
master: sent data
master: waiting for worker to exit
worker: starting ...
worker: fetching https://www.example.com/
objc[24250]: +[__NSPlaceholderDate initialize] may have been in progress in another thread when fork() was called.
objc[24250]: +[__NSPlaceholderDate initialize] may have been in progress in another thread when fork() was called. We cannot safely call it or ignore it in the fork() child process. Crashing instead. Set a breakpoint on objc_initializeAfterForkError to debug.
master: exiting ...
Resolutions
Here are a few independent things I have seen that resolve the issue, i.e., performing only one of these resolves the issue:
- The issue seems to occur only when using the requests package. Commenting out these two lines in worker() resolves the issue.
      # response = requests.get(url)
      # print('worker: response:', response.status_code)
- The issue seems to occur only if the q.put('https://www.example.com/') statement occurs before the p.start() statement. Moving that statement after p.start() resolves the issue.
      p.start()
      print('master: started worker')
      q.put('https://www.example.com/')
- Setting the environment variable OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES resolves the issue.
      OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES python3 foo.py
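One detail that may bear on the q.put()-before-p.start() condition, offered as an observation rather than a confirmed diagnosis: multiprocessing.Queue hands items to a background feeder thread that is started on the first put(). A put() before p.start() therefore means the parent already has a second thread running when fork() is called. The sketch below only shows that thread appearing; the helper name feeder_thread_appears is made up for illustration.

```python
import multiprocessing as mp
import threading

def feeder_thread_appears():
    """Hypothetical helper: collect thread names before and after the first put()."""
    q = mp.Queue()
    before = {t.name for t in threading.enumerate()}
    q.put('https://www.example.com/')  # the first put() starts the feeder thread
    after = {t.name for t in threading.enumerate()}
    q.get()          # drain the item so the queue can shut down cleanly
    q.close()
    q.join_thread()  # wait for the feeder thread to finish
    return before, after

before, after = feeder_thread_appears()
print('new threads after put():', sorted(after - before))
```

Whether this extra thread is what trips the Objective-C fork-safety check in the child is an assumption here, not something the output above proves.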
Non-Resolution
Now, I do not want my users to have to set a variable like this in order to use my tool or API, so I tried to figure out whether setting this environment variable within my program could resolve the issue. I found that adding this to my code does not resolve the issue:
import os
os.environ['OBJC_DISABLE_INITIALIZE_FORK_SAFETY'] = 'YES'
# Does not resolve the issue!
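The in-process assignment presumably comes too late because the Objective-C runtime looks at the variable when the process starts, before any Python code runs. If that assumption holds, one heavy-handed option would be to restart the interpreter once with the variable already in its environment. This is a minimal, untested-on-this-setup sketch; the helper names env_with_flag and reexec_with_flag_if_needed are hypothetical.

```python
import os
import sys

FLAG = 'OBJC_DISABLE_INITIALIZE_FORK_SAFETY'

def env_with_flag(environ):
    """Return a copy of the given environment mapping with the flag set to YES."""
    env = dict(environ)
    env[FLAG] = 'YES'
    return env

def reexec_with_flag_if_needed():
    """Hypothetical helper: on macOS, replace this process with a fresh
    interpreter that has the flag in its environment from the very start."""
    if sys.platform == 'darwin' and os.environ.get(FLAG) != 'YES':
        # os.execve() does not return on success; the script starts over.
        os.execve(sys.executable,
                  [sys.executable] + sys.argv,
                  env_with_flag(os.environ))
```

A script run as python3 foo.py would call reexec_with_flag_if_needed() at the very top, before doing any other work. This suits an application entry point much more than a library function, since a library restarting its caller's process would be surprising.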
Questions
- Why exactly does this issue occur only under the given conditions, i.e., requests.get() and q.put() before p.start()? In other words, why does the issue disappear if one of these conditions is not met?
- If we were to expose something like the minimal example as an API function that another developer might call from their code, is there any clever way to resolve this issue in our code, so that the other developer does not have to set OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES in their shell before running their program that uses our function?
Of course, one option is to redesign the program so that we don't have to feed data into the queue before the worker process starts. The scope of this question, though, is to discuss why the issue occurs only when we feed data into the queue before the worker process starts.
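Since the error message is specifically about fork(), another direction that might be worth trying is to avoid fork() altogether by using the 'spawn' start method, which launches a fresh interpreter for the child instead of copying the parent. The sketch below assumes that worker() lives in an importable module and that the queue and process come from the same context; make_worker is a hypothetical helper name.

```python
import multiprocessing as mp

# A context object scopes the start method to our own code, so the
# caller's global multiprocessing settings are left untouched.
ctx = mp.get_context('spawn')

def make_worker(target):
    """Hypothetical helper: build a queue and an unstarted process from the
    spawn context. The child starts as a new interpreter, not a fork()."""
    q = ctx.Queue()
    p = ctx.Process(target=target, args=(q,))
    return q, p
```

With this, master() would call q, p = make_worker(worker) in place of the mp.Queue() and mp.Process() lines, and the script's entry point would need an if __name__ == '__main__': guard, because spawned children re-import the main module. Python 3.8 and later already use 'spawn' by default on macOS.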
from Worker process crashes on requests.get() when data is put into input queue before the worker process starts