Friday, 16 April 2021

Workaround for multiprocessing with local functions in Python?

Multiprocessing with locally defined functions?

I am porting over a library for a client who is very picky about external dependencies.

Most of the multiprocessing in this library relies on ProcessPool from pathos, mainly because it handles locally defined functions very easily.

I'm trying to get some of this functionality back without forcing this dependency (or having to rewrite large chunks of the library). I understand that the following code works because the function is defined at the top level of the module, where the worker processes can look it up by name:

import multiprocessing as mp


def f(x):
    return x * x


def main():
    with mp.Pool(5) as p:
        print(p.map(f, range(10)))


if __name__ == "__main__":
    main()

The following code (which is what I need to get working) fails because f is defined only inside main, so it cannot be pickled and sent to the worker processes:

import multiprocessing as mp


def main():
    def f(x):
        return x * x

    with mp.Pool(5) as p:
        print(p.map(f, range(10)))


if __name__ == "__main__":
    main()
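
To show where the failure comes from, here is a minimal sketch that skips the pool and asks pickle directly. mp.Pool has to pickle the function it sends to the workers, and pickle stores functions by qualified name, which a function nested inside another function does not have in importable form. The names top_level, make_local and can_pickle are made up for this illustration.

```python
import pickle


def top_level(x):
    return x * x


def make_local():
    # Mirrors the failing example: the function only exists inside this scope.
    def local(x):
        return x * x

    return local


def can_pickle(func):
    """Return True if pickle can serialize func, False otherwise."""
    try:
        pickle.dumps(func)
        return True
    except (pickle.PicklingError, AttributeError):
        # Which exception is raised for local functions varies between
        # Python versions, so both are caught here.
        return False


if __name__ == "__main__":
    print("top-level function picklable:", can_pickle(top_level))
    print("local function picklable:", can_pickle(make_local()))
```

Run as a script, the first check should succeed and the second should fail, which matches the behaviour of the two pool examples above.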

Does anyone know of a good workaround for this specific use case that doesn't require external dependencies? Thanks for reading.

Updates:

  • There is a workaround that uses fork, but fork is unsafe on macOS and unavailable on Windows (thanks @Monica and @user2357112).
  • @Blop provided an excellent suggestion that will work for many. In my case (not the toy example above) the objects in my generator are unmarshallable.
  • @amsh provided a workaround which seems to work for any function + generator. While it is a great option, the downside is that it requires the function to be defined at the global scope.


