I am working with TensorFlow and concurrent.futures on Windows 10 using Anaconda. I installed several packages and got it working. Below is the MWE:
import tensorflow as tf
from tensorflow import keras
import numpy as np
import concurrent.futures
import time

def simple_model():
    model = keras.models.Sequential([
        keras.layers.Dense(units = 10, input_shape = [1]),
        keras.layers.Dense(units = 1, activation = 'sigmoid')
    ])
    model.compile(optimizer = 'sgd', loss = 'mean_squared_error')
    return model

def clone_model(model):
    model_clone = tf.keras.models.clone_model(model)
    model_clone.set_weights(model.get_weights())
    return model_clone

def work(model_path, seq):
    # model = clone_model(model)# model_list[model_id]
    # print(model)
    # import tensorflow as tf
    model = tf.keras.models.load_model(model_path)
    return model.predict(seq)

def workers(model, num_of_seq = 4):
    seqences = np.arange(0,num_of_seq*10).reshape(num_of_seq, -1)
    model_savepath = './simple_model.h5'
    model.save(model_savepath)
    path_list = [model_savepath for _ in range(num_of_seq)]
    with concurrent.futures.ProcessPoolExecutor(max_workers=None) as executor:
        t0 = time.perf_counter()
        # model_list = [clone_model(model) for _ in range(num_of_seq)]
        index_list = np.arange(1, num_of_seq)
        # [clone_model(model) for _ in range(num_of_seq)]
        # print(model_list)
        future_to_samples = {executor.submit(work, path, seq): seq for path, seq in zip(path_list,seqences)}
    Seq_out = []
    for future in concurrent.futures.as_completed(future_to_samples):
        out = future.result()
        Seq_out.append(out)
    t1 = time.perf_counter()
    print(t1-t0)
    return np.reshape(Seq_out, (-1, )), t1-t0

if __name__ == '__main__':
    model = simple_model()
    num_of_seq = 400
    # model_list = [clone_model(model) for _ in range(4)]
    out = workers(model, num_of_seq=num_of_seq)
    print(out)
The aim of the MWE above is to run predictions with a saved model in parallel.
workers: saves the model to disk and stores the path in model_savepath. It then submits one task per sequence to a ProcessPoolExecutor (num_of_seq tasks, 400 in this run), passing the model path and the data to be predicted to the work function. Each task loads the model from that path (using tf.keras.models.load_model; the clone_model variant is commented out) and uses it to predict. The output of the MWE (on Windows) is:
2021-02-19 16:31:13.665341: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102]
15.456169300000003
(array([1., 1., 1., ..., 1., 1., 1.], dtype=float32), 15.456169300000003)
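A side note on the collection loop in workers: concurrent.futures.as_completed yields futures in the order they finish, not the order they were submitted, so Seq_out is not guaranteed to line up with the input sequences. A minimal sketch of keeping results in input order follows. The names fake_predict and run_in_order are hypothetical, and a ThreadPoolExecutor stands in for the process pool so the example runs without TensorFlow (both executors share the same submit interface).

```python
import concurrent.futures

def fake_predict(seq):
    # Hypothetical stand-in for model.predict: just sums the sequence.
    return sum(seq)

def run_in_order(sequences, executor_cls=concurrent.futures.ThreadPoolExecutor):
    # Pre-allocate one slot per input so results can be placed by index.
    results = [None] * len(sequences)
    with executor_cls(max_workers=4) as executor:
        # Map each future to the index of the sequence it was given.
        future_to_index = {
            executor.submit(fake_predict, seq): i
            for i, seq in enumerate(sequences)
        }
        # Futures arrive in completion order; the index restores input order.
        for future in concurrent.futures.as_completed(future_to_index):
            results[future_to_index[future]] = future.result()
    return results

if __name__ == "__main__":
    print(run_in_order([[0, 1], [2, 3], [4, 5]]))  # prints [1, 5, 9]
```

In the MWE the dictionary future_to_samples already maps each future to its input sequence, so the same idea could be applied there by mapping to an index instead.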
When I run the script on Ubuntu, it keeps running with no output until I force quit the process. Sometimes it gives the following error:
OSError: Unable to open file (unable to lock file, errno = 37, error message = 'No locks available')
What I have tried:
- I used conda tf-gpu export > environment.yml to reproduce all the installed packages on a different Windows 10 machine and saw the same behaviour.
- Made a new environment on Windows 10 (where the code works) with several different TensorFlow versions. All TF 2.x versions worked.
- Tried the same code in Docker by pulling TensorFlow images. Same issues.
- The issue seems to be with locking the HDF5 file, so I tried the suggestion in "Can we disable h5py file locking for python file-like object?". It did not work.
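For reference, the setting named in the title, HDF5_USE_FILE_LOCKING, is an environment variable read by the HDF5 library. A minimal sketch of setting it from Python follows, assuming it is set at the top of the script before h5py or TensorFlow is imported, and assuming the installed HDF5 build honours it. The helper names report_lock_setting and check_workers are hypothetical; they only confirm that worker processes see the same value. This shows how the variable is usually applied, not a confirmed fix for the errno 37 error above.

```python
import os

# Assumption: this runs before h5py / tensorflow are imported,
# so the HDF5 library sees the value when it first opens a file.
os.environ["HDF5_USE_FILE_LOCKING"] = "FALSE"

import concurrent.futures

def report_lock_setting(_):
    # Hypothetical helper: returns the value visible in the current process.
    return os.environ.get("HDF5_USE_FILE_LOCKING")

def check_workers(n_tasks=4):
    # Child processes inherit the parent's environment, so each task
    # should report the value set above.
    with concurrent.futures.ProcessPoolExecutor(max_workers=2) as executor:
        return list(executor.map(report_lock_setting, range(n_tasks)))

if __name__ == "__main__":
    print(check_workers())
```

The variable can also be exported in the shell before launching the script, which avoids any dependence on import order.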
Similar issues have been reported in:
- Opening already opened hdf5 file in write mode, using h5py
- https://github.com/h5py/h5py/issues/1066
- https://github.com/h5py/h5py/issues/1101
from HDF5_USE_FILE_LOCKING issues in TensorFlow and Multiprocessing