Monday, 23 November 2020

How does joblib.Parallel deal with global variables?

My code looks something like this:

from glob import glob

from joblib import Parallel, delayed

# load_model and load_image are my own helpers, defined elsewhere (not shown)

# prediction model - tens of megabytes on disk
LARGE_MODEL = load_model('path/to/model')

file_paths = glob('path/to/files/*')

def do_thing(file_path):
    # run the globally loaded model on a single image
    pred = LARGE_MODEL.predict(load_image(file_path))
    return pred

Parallel(n_jobs=2)(delayed(do_thing)(fp) for fp in file_paths)

My question is whether LARGE_MODEL will be pickled and unpickled for every call to do_thing. If so, how can I make sure each worker loads or caches it once instead (if that's possible)?
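For reference, here is a minimal sketch of one way to test whether the global gets pickled, and one possible way to avoid the copy altogether. It assumes a thread-based backend is acceptable, which is only useful if predict releases the GIL for most of its work. TinyModel is a hypothetical stand-in for the real model, and the integer inputs stand in for loaded images; neither comes from my actual code.

```python
from joblib import Parallel, delayed


class TinyModel:
    """Hypothetical stand-in for LARGE_MODEL that counts pickling attempts."""

    pickle_calls = 0

    def __getstate__(self):
        # Called whenever an instance is pickled; used here only as a probe.
        TinyModel.pickle_calls += 1
        return self.__dict__.copy()

    def predict(self, x):
        return x * 2


LARGE_MODEL = TinyModel()


def do_thing(value):
    # Threads share the parent's memory, so this reads the single global instance.
    return LARGE_MODEL.predict(value)


# prefer="threads" asks joblib for a thread-based backend instead of worker processes.
results = Parallel(n_jobs=2, prefer="threads")(
    delayed(do_thing)(v) for v in range(5)
)
print(results)
print(TinyModel.pickle_calls)
```

With threads nothing is serialized, so the counter stays at zero. The same probe could be left in place while switching back to the default process-based backend to see what happens there, although a count kept in the parent process only reflects pickling that happens in the parent.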



