Thursday, 25 February 2021

HDF5_USE_FILE_LOCKING issues in TensorFlow and Multiprocessing

I was working with TensorFlow and concurrent.futures on Windows 10 using Anaconda. After installing several packages I got it working. Below is the minimal working example (MWE):

import tensorflow as tf
from tensorflow import keras

import numpy as np
import concurrent.futures 
import time

def simple_model():
    model = keras.models.Sequential([
        keras.layers.Dense(units = 10, input_shape = [1]),
        keras.layers.Dense(units = 1, activation = 'sigmoid')
    ])
    model.compile(optimizer = 'sgd', loss = 'mean_squared_error')
    return model

def clone_model(model):
    model_clone = tf.keras.models.clone_model(model)
    model_clone.set_weights(model.get_weights())
    return model_clone

def work(model_path, seq):
    # Each worker process loads its own copy of the saved model from disk.
    model = tf.keras.models.load_model(model_path)
    return model.predict(seq)

def workers(model, num_of_seq = 4):
    # Build num_of_seq input sequences of 10 values each.
    sequences = np.arange(0,num_of_seq*10).reshape(num_of_seq, -1)
    # Save the model once; every worker reloads it from this path.
    model_savepath = './simple_model.h5'
    model.save(model_savepath)
    path_list = [model_savepath for _ in range(num_of_seq)]

    with concurrent.futures.ProcessPoolExecutor(max_workers=None) as executor:
        t0 = time.perf_counter()
        # Submit one prediction task per sequence.
        future_to_samples = {executor.submit(work, path, seq): seq for path, seq in zip(path_list, sequences)}
    # Collect results in completion order (not submission order).
    Seq_out = []
    for future in concurrent.futures.as_completed(future_to_samples):
        out = future.result()
        Seq_out.append(out)
    t1 = time.perf_counter()
    print(t1-t0)
    return np.reshape(Seq_out, (-1, )), t1-t0



if __name__ == '__main__':
    model = simple_model()
    num_of_seq = 400
    out = workers(model, num_of_seq=num_of_seq)
    print(out)

The aim of the MWE above is to run predictions from a saved model in parallel.

workers: saves the model to disk and stores the path in model_savepath. It then submits one task per sequence to a process pool, passing the work function the model path and the data to be predicted. Each worker process loads the model from that path (using load_model) and uses it to predict. The output of the MWE (on Windows) is:

2021-02-19 16:31:13.665341: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102]
15.456169300000003
(array([1., 1., 1., ..., 1., 1., 1.], dtype=float32), 15.456169300000003)

When I run the script on Ubuntu, it keeps running with no output until I force-quit the process. Sometimes it gives the following error:

OSError: Unable to open file (unable to lock file, errno = 37, error message = 'No locks available')
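
On Linux, errno 37 is ENOLCK, which means the operating system refused the lock request. It is not a TensorFlow-specific error. A quick way to check whether the directory holding the .h5 file supports advisory locks at all is sketched below. This is only a diagnostic sketch: can_lock is a hypothetical helper name, it is Unix-only, and it assumes HDF5 takes an flock-style lock, which may not match exactly what the installed HDF5 build does.

```python
import errno
import fcntl
import tempfile


def can_lock(directory="."):
    """Try to take an exclusive advisory lock on a temp file in `directory`.

    Returns (True, None) if the lock succeeded, otherwise (False, errno_name).
    Hypothetical helper for diagnosis only; assumes flock-style locking.
    """
    with tempfile.NamedTemporaryFile(dir=directory) as fh:
        try:
            # Non-blocking so the probe never hangs waiting for a lock.
            fcntl.flock(fh.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError as exc:
            # Translate the numeric errno (e.g. ENOLCK) into its symbolic name.
            return False, errno.errorcode.get(exc.errno, str(exc.errno))
        # Release the lock before the temp file is closed and deleted.
        fcntl.flock(fh.fileno(), fcntl.LOCK_UN)
        return True, None
```

Calling can_lock('.') from the directory where simple_model.h5 is written shows whether locking fails there independently of TensorFlow.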

What I have tried:

  1. Exported the conda environment (tf-gpu) to environment.yml and recreated it on a different Windows 10 machine. I saw the same behaviour there.
  2. Created new environments on Windows 10 (where the code works) with several different TensorFlow versions. All TF 2.x versions worked.
  3. Tried the same code in Docker by pulling TensorFlow images. Same issue.
  4. The issue appears to be with locking the HDF5 file, so I tried the approach from "Can we disable h5py file locking for python file-like object?". It did not work.
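
For reference, the workaround named in the title is the HDF5_USE_FILE_LOCKING environment variable, which HDF5 1.10 and later read when opening a file. The sketch below shows one way to set it from Python. It assumes the variable has to be set before h5py or TensorFlow is imported, and that the worker processes inherit the parent's environment. disable_hdf5_file_locking is a hypothetical helper name, and this may not resolve the hang described above.

```python
import os


def disable_hdf5_file_locking():
    """Ask the HDF5 library not to lock the files it opens (HDF5 1.10+).

    Hypothetical helper; call it before importing h5py or TensorFlow.
    """
    os.environ["HDF5_USE_FILE_LOCKING"] = "FALSE"


disable_hdf5_file_locking()

# The heavy imports would follow here, for example:
# import tensorflow as tf
```

The same effect can be had by exporting the variable in the shell before starting Python, which avoids any dependence on import order.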

Similar issues have been reported in:


