Monday, 6 May 2019

Tensorflow 1.13.1 tf.data map multiple images with a single row together

I'm building a tf.data dataset with multiple inputs (images plus numerical/categorical data) for a regression problem. The problem is that several images correspond to the same row in my pd.DataFrame.

So how do I ensure that each image gets mapped to the correct row, even when all the inputs are shuffled?

For example, say I have 10 rows and 100 images, with 10 images belonging to each row. After shuffling the dataset, every image must still be paired with its own row.
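A minimal NumPy sketch of the general principle, assuming the row index of each image is already known. The toy sizes, array names, and file names are hypothetical, and this is an illustration rather than the asker's pipeline: first expand the per-row data to one entry per image, then apply a single permutation to every array so the pairs move together.

```python
import numpy as np

# Hypothetical toy data: 10 rows of tabular features, 100 images,
# 10 images per row. image_row[i] is the row that image i belongs to.
n_rows, imgs_per_row, n_features = 10, 10, 3
row_features = np.arange(n_rows * n_features, dtype=np.float32).reshape(n_rows, n_features)
row_labels = np.arange(n_rows, dtype=np.float32) * 10.0
image_paths = np.array([f"folder{r}/img{j}.png"
                        for r in range(n_rows) for j in range(imgs_per_row)])
image_row = np.repeat(np.arange(n_rows), imgs_per_row)

# Expand the tabular data to one entry per image with fancy indexing,
# so every array now has 100 entries aligned by position.
features_per_image = row_features[image_row]
labels_per_image = row_labels[image_row]

# Shuffle with ONE permutation applied to all arrays; this keeps each
# image paired with the features and label of its own row.
rng = np.random.default_rng(0)
perm = rng.permutation(len(image_paths))
shuffled_paths = image_paths[perm]
shuffled_features = features_per_image[perm]
shuffled_labels = labels_per_image[perm]
shuffled_rows = image_row[perm]
```

In tf.data the same effect comes from building one dataset over the aligned arrays, e.g. `tf.data.Dataset.from_tensor_slices((paths, features, labels))`, and calling `shuffle` on that single dataset, because `shuffle` moves each tuple as one unit.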

I am using tf.data.Dataset for this. My images are also organized into directories whose folder names correspond to a value in the DataFrame, which I was thinking of using for the mapping if I knew how.

For example, folder1 would appear in the df as a row with columns like dir_name, feature1, feature2, .... Naturally, dir_name itself should not be passed to the model as a feature to fit on.
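One way to use the folder names, sketched with pandas under two assumptions: each image's folder is its immediate parent directory, and `X_train`/`y_train` have exactly one row per `dir_name`. The toy frames and paths below are hypothetical stand-ins for the real data. Merging a one-row-per-image table onto the feature and label tables yields aligned per-image arrays, after which `dir_name` can be dropped.

```python
import os

import pandas as pd

# Hypothetical stand-ins for the question's X_train / y_train and image paths.
X_train = pd.DataFrame({"dir_name": ["folder1", "folder2"],
                        "feature1": [0.5, 1.5],
                        "feature2": [3, 7]})
y_train = pd.DataFrame({"dir_name": ["folder1", "folder2"],
                        "label": [10.0, 20.0]})
paths = ["data/folder1/a.png", "data/folder2/b.png", "data/folder1/c.png"]

# One row per image; the parent folder name is the join key.
images = pd.DataFrame({"path": paths})
images["dir_name"] = images["path"].map(lambda p: os.path.basename(os.path.dirname(p)))

# Attach features and label to every image. validate="many_to_one" raises
# a MergeError if a dir_name appears more than once in X_train or y_train.
merged = (images
          .merge(X_train, on="dir_name", how="inner", validate="many_to_one")
          .merge(y_train, on="dir_name", how="inner", validate="many_to_one"))

# Aligned per-image arrays; dir_name is left out so the model never sees it.
image_paths = merged["path"].to_numpy()
features = merged[X_train.columns.difference(["dir_name"])].to_numpy(dtype="float32")
labels = merged["label"].to_numpy(dtype="float32")
```

The `validate` argument is optional; it only makes duplicate folder names fail loudly instead of silently multiplying images.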

```python
import numpy as np
import tensorflow as tf

AUTOTUNE = tf.data.experimental.AUTOTUNE

# paths, X_train, y_train and load_and_preprocess_image are defined elsewhere.

# Images
path_ds = tf.data.Dataset.from_tensor_slices(paths)
image_ds = path_ds.map(load_and_preprocess_image, num_parallel_calls=AUTOTUNE)

# Numerical & categorical features. First remove the dir_name column.
x_train_input = X_train[X_train.columns.difference(['dir_name'])]
x_train_input = np.expand_dims(x_train_input, axis=1)
text_ds = tf.data.Dataset.from_tensor_slices(x_train_input)

# Labels; y_train's columns are 'label' and 'dir_name'.
label_ds = tf.data.Dataset.from_tensor_slices(
    tf.cast(y_train['label'], tf.float32))

# Test creation of the dataset without prior shuffling.
# Note: Dataset.zip pairs elements by position and stops at the shortest
# input, so image i only meets the right row if paths is ordered to line
# up one-to-one with the rows of X_train and y_train.
xtrain_ = tf.data.Dataset.zip((image_ds, text_ds))
model_ds = tf.data.Dataset.zip((xtrain_, label_ds))

# Shuffling
BATCH_SIZE = 64

# Setting a shuffle buffer size as large as the dataset ensures that
# the data is completely shuffled.
ds = model_ds.shuffle(buffer_size=len(paths))
ds = ds.repeat()
ds = ds.batch(BATCH_SIZE)
# prefetch lets the dataset fetch batches in the background while the
# model is training. After batch(), the buffer size counts batches.
# ds = ds.prefetch(buffer_size=AUTOTUNE)
ds = ds.prefetch(buffer_size=BATCH_SIZE)
```



