I'm building my tf dataset where there are multiple inputs (images and numerical/categorical data). The problem I am having is that multiple images correspond to the same row in the pd.Dataframe I have. I am doing regression.
So how, (even when shuffling all the inputs) do I ensure that each image gets mapped to the correct row?
Again, say I have 10 rows, and 100 images, with 10 images corresponding to a particular row. Now we shuffle the dataset, and we want to make sure that the shuffled images all correspond to their respective row.
I am using tf.data.Dataset to do this. I also have a directory structure such that the folder name corresponds to an element in the DataFrame, which is what I was thinking of using if I knew how to do the mapping
i.e. folder1 would be in the df with cols like dir_name, feature1, feature2, .... Naturally, the dir_names should not be passed as data into the model to fit on.
#images
path_ds = tf.data.Dataset.from_tensor_slices(paths)
image_ds = path_ds.map(load_and_preprocess_image, num_parallel_calls=AUTOTUNE)
#numerical&categorical features. First remove the dirs
x_train_input = X_train[X_train.columns.difference(['dir_name'])]
x_train_input=np.expand_dims(x_train_input, axis=1)
text_ds = tf.data.Dataset.from_tensor_slices(x_train_input)
#labels, y_train's cols are: 'label' and 'dir_name'
label_ds = tf.data.Dataset.from_tensor_slices(
tf.cast(y_train['label'], tf.float32))
# test creation of dataset without prior shuffling.
xtrain_ = tf.data.Dataset.zip((image_ds, text_ds))
model_ds = tf.data.Dataset.zip((xtrain_, label_ds))
# Shuffling
BATCH_SIZE = 64
# Setting a shuffle buffer size as large as the dataset ensures that
# data is completely shuffled
ds = model_ds.shuffle(buffer_size=len(paths))
ds = ds.repeat()
ds = ds.batch(BATCH_SIZE)
# prefetch lets the dataset fetch batches in the background while the
# model is training
# ds = ds.prefetch(buffer_size=AUTOTUNE)
ds = ds.prefetch(buffer_size=BATCH_SIZE)
from Tensorflow 1.13.1 tf.data map multiple images with a single row together
No comments:
Post a Comment