I have two dense matrices of sizes (2500, 208) and (208, 2500), and I want to compute their product. It works fine and fast in a single process, but inside a multiprocessing block the processes get stuck there for hours. I multiply even larger sparse matrices with no problem. My code looks like this:
def run_func(args):
    # Do stuff, including large sparse-matrix multiplication.
    C = np.matmul(A, B)  # or A.dot(B), or even calling the BLAS library directly: dgemm(1, A, B)
    # Execution never gets past the line above!

with Pool(processes=agents) as pool:
    result = pool.starmap(run_func, args)
Note that run_func works fine when executed in a single process. Multiprocessing on my local machine also works fine; it is only multiprocessing on the HPC that gets stuck. I allocate my resources like this:
srun -v --nodes=1 --time 7-0:0 --cpus-per-task=2 --mem-per-cpu=20G python3 -u run.py 2
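For reference, one commonly suggested mitigation, assuming (not confirmed here) that the hang comes from MKL's thread pool being oversubscribed or carried across `fork()`, is to pin BLAS to a single thread per process before numpy is imported:

```python
import os

# Sketch of a workaround, assuming the hang is caused by BLAS threading:
# pin MKL/OpenMP/OpenBLAS to one thread *before* importing numpy, so each
# worker process runs single-threaded GEMM instead of spawning its own
# thread pool on the shared node.
os.environ["MKL_NUM_THREADS"] = "1"
os.environ["OMP_NUM_THREADS"] = "1"
os.environ["OPENBLAS_NUM_THREADS"] = "1"

import numpy as np

A = np.random.rand(2500, 208)
B = np.random.rand(208, 2500)
C = A @ B          # single-threaded GEMM: slower per call, but safe under multiprocessing
print(C.shape)     # (2500, 2500)
```

Each matmul is slower this way, but with one BLAS thread per worker the total core count matches what `--cpus-per-task` actually granted.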
where the last parameter is the number of agents in the code above. Here are the BLAS/LAPACK library details on the HPC (obtained from numpy):
blas_mkl_info:
    libraries = ['mkl_rt', 'pthread']
    library_dirs = ['**/lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['**/include']
blas_opt_info:
    libraries = ['mkl_rt', 'pthread']
    library_dirs = ['**/lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['**/include']
lapack_mkl_info:
    libraries = ['mkl_rt', 'pthread']
    library_dirs = ['**/lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['**/include']
lapack_opt_info:
    libraries = ['mkl_rt', 'pthread']
    library_dirs = ['**/lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['**/include']
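A listing like the one above can be reproduced with numpy's own build-info helper:

```python
import numpy as np

# Prints the BLAS/LAPACK build configuration numpy was compiled against
# (libraries, library_dirs, define_macros, include_dirs), which is how
# you can tell whether the HPC install links MKL, OpenBLAS, etc.
np.show_config()
```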
Compared to my local machine, all Python packages and the Python version on the HPC are the same. Any leads on what is going on?
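As a diagnostic sketch (not a fix), the standard-library faulthandler module can dump the stack of a hung worker so you can see exactly which call never returns. Register it at the top of run.py, then signal the stuck process from another shell on the compute node:

```python
import faulthandler
import signal

# Dump the tracebacks of all threads to stderr whenever this process
# receives SIGUSR1, e.g. `kill -USR1 <worker-pid>` from another shell.
faulthandler.register(signal.SIGUSR1)
```

If the dump shows every worker parked inside the BLAS call, that points at a threading/fork interaction rather than at your own code.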
numpy matrix multiplication does not work when parallelized on HPC