I am training with TensorFlow 2.0 on multiple GPUs and get the following error. With only one GPU, the same code runs without any error. My TensorFlow version is tensorflow-gpu-2.0.0:
tensorflow.python.framework.errors_impl.CancelledError: 4 root error(s) found.
(0) Cancelled: Operation was cancelled
[[]]
(1) Out of range: End of sequence
[[]]
(2) Out of range: End of sequence
[[]]
[[metrics/accuracy/div_no_nan/ReadVariableOp_6/_154]]
(3) Out of range: End of sequence
[[]]
0 successful operations.
1 derived errors ignored. [Op:__inference_distributed_function_83325]
Function call stack:
distributed_function -> distributed_function -> distributed_function -> distributed_function
This is my code. You can run it with the environment variable CUDA_VISIBLE_DEVICES=0 or CUDA_VISIBLE_DEVICES=0,1; the two settings give different results:
import tensorflow as tf
import tensorflow_datasets as tfds

data_name = 'uc_merced'
dataset = tfds.load(data_name)
train_data, test_data = dataset['train'], dataset['train']

def parse(img_dict):
    img = tf.image.resize_with_pad(img_dict['image'], 256, 256)
    label = img_dict['label']
    return img, label

train_data = train_data.map(parse)
train_data = train_data.batch(96)
test_data = test_data.map(parse)
test_data = test_data.batch(96)

strategy = tf.distribute.MirroredStrategy()
with strategy.scope():
    model = tf.keras.applications.ResNet50(weights=None, classes=21, input_shape=(256, 256, 3))
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])

model.fit(train_data, epochs=50, verbose=2, validation_data=test_data)
model.save('model/resnet_{}.h5'.format(data_name))
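For anyone reproducing the two runs from inside Python rather than from the shell, here is a minimal sketch of switching between the single-GPU and multi-GPU configurations. The helper name `select_gpus` is hypothetical (it is not part of TensorFlow), and the sketch assumes the variable is set before TensorFlow initializes its GPUs; the safest place is before `import tensorflow`.

```python
import os


def select_gpus(device_ids):
    """Hypothetical helper: expose only the given GPU indices to CUDA.

    Returns the string written to CUDA_VISIBLE_DEVICES.
    """
    value = ",".join(str(i) for i in device_ids)
    # Set this before TensorFlow initializes its GPUs
    # (safest: before `import tensorflow`), otherwise it may have no effect.
    os.environ["CUDA_VISIBLE_DEVICES"] = value
    return value


# Configuration that trains without error in the post above.
single = select_gpus([0])

# Configuration that raises "Out of range: End of sequence" in the post above.
multi = select_gpus([0, 1])

print(single, multi)
```

Calling `select_gpus([0])` or `select_gpus([0, 1])` at the very top of the training script should be equivalent to launching it with `CUDA_VISIBLE_DEVICES=0` or `CUDA_VISIBLE_DEVICES=0,1` in the shell.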
from train with muliple gpu with tensorflow2.0 get error: Out of range: End of sequence