Monday 2 November 2020

Multiprocessing and multithreading in Python

I have a python program which 1) Reads from a very large file from Disk(~95% time) and then 2) Process and Provide a relatively small output (~5% time). This Program is to be run on TeraBytes of files .

Now i am looking to Optimize this Program by utilizing Multi Processing and Multi Threading . The platform I am running is a Virtual Machine with 4 Processors on a virtual Machine .

I plan to have a scheduler Process which will execute 4 Processes (same as processors) and then Each Process should have some threads as most part is I/O . Each Thread will process 1 file & will report result to the Main Thread which in turn will report it back to scheduler Process via IPC . Scheduler can queue these and eventually write them to disk in ordered manner

So wondering How does one decide number of Processes and Threads to create for such scenario ? Is there a Mathematical way to figure out whats the best mix .

Thankyou



from Multiprocessing and multithreading in Python

No comments:

Post a Comment