As explained in the thread What is being pickled when I call multiprocessing.Process? there are circumstances where multiprocessing requires little to no data to be transferred via pickling. For example, on Unix systems, the interpreter uses fork() to create the processes, and objects which already exist when multiprocessing starts can be accessed by each process without pickling.
However, I'm trying to consider scenarios beyond "here's how it's supposed to work". For example, the code may have a bug, and an object which is supposed to be read-only is inadvertently modified, causing it to be pickled and transferred to other processes.
Is there some way to determine what, or at least how much, is pickled during multiprocessing? The information doesn't necessarily have to be available in real time, but it would be helpful if there were a way to get some statistics (e.g., the number of objects pickled) that might hint at why something is taking longer to run than expected because of unexpected pickling overhead.
I'm looking for a solution internal to the Python environment. Process tracing (e.g. Linux strace), network snooping, generalized IPC statistics, and similar solutions that might be used to count the number of bytes moving between processes aren't going to be specific enough to identify object pickling versus other types of communication.
from Determining exactly what is pickled during Python multiprocessing