Wednesday, 13 November 2019

Accuracy no longer improving after switching to Dataset

I recently trained a binary image classifier and ended up with a model that was around 97.8% accurate. I built the classifier by following a couple of the official TensorFlow guides.

I noticed while training (on a GTX 1080) that each epoch was taking around 30 seconds to run. Further reading suggested that a better way to feed data into a TensorFlow training run is the tf.data.Dataset API, so I updated my code to load the images into a dataset and pass that dataset to the model.fit_generator method.
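
For context, the helpers module is based on the data-loading pattern from the official guide; here is a simplified sketch of what process_path and prepare_for_training do (the image size and buffer values below are illustrative, the real ones live in settings.py):

import os
import tensorflow as tf

IMG_HEIGHT, IMG_WIDTH = 150, 150  # illustrative; the real values come from settings

def process_path(file_path):
    # The label comes from the parent directory name ('true' or 'false').
    parts = tf.strings.split(file_path, os.sep)
    label = tf.cast(parts[-2] == 'true', tf.float32)
    # Decode the PNG and scale pixel values to [0, 1].
    image = tf.io.read_file(file_path)
    image = tf.image.decode_png(image, channels=3)
    image = tf.image.convert_image_dtype(image, tf.float32)
    image = tf.image.resize(image, [IMG_HEIGHT, IMG_WIDTH])
    return image, label

def prepare_for_training(ds, batch_size=32, shuffle_buffer_size=1000):
    # Cache decoded images, shuffle, repeat indefinitely, batch and prefetch.
    ds = ds.cache()
    ds = ds.shuffle(buffer_size=shuffle_buffer_size)
    ds = ds.repeat()
    ds = ds.batch(batch_size)
    return ds.prefetch(buffer_size=tf.data.experimental.AUTOTUNE)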

Now when I train, the accuracy and loss metrics are completely static - even though the learning rate is being reduced automatically over time. The output looks something like this:

loss: 7.7125 - acc: 0.5000 - val_loss: 7.7125 - val_acc: 0.5000

Given that I'm training a binary classifier, an accuracy of 50% is no better than guessing, so I'm wondering whether there's a problem with the way I'm providing the images, or perhaps with the size of the dataset.
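
One sanity check I can run (a hypothetical snippet, dropped into the script after training_dataset is built) is to pull a single batch and inspect the shapes, pixel range and label mix:

for image_batch, label_batch in training_dataset.take(1):
    print('images:', image_batch.shape)                    # e.g. (32, height, width, 3)
    print('pixels:', float(tf.reduce_min(image_batch)),
          float(tf.reduce_max(image_batch)))               # should fall inside [0, 1]
    print('labels:', label_batch.numpy().mean())           # ~0.5 if both classes appear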

My image data is split like this:

training/
        true/  (366 images)
        false/ (354 images)

validation/
        true/  (175 images)
        false/ (885 images)

I was previously using ImageDataGenerator with various augmentations applied, which effectively increased the size of the dataset. Could my problem simply be that the dataset is too small?
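
For comparison, the old pipeline looked roughly like this (the augmentation parameters below are illustrative rather than my exact values):

from tensorflow.keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(
    rescale=1.0 / 255,       # scale pixel values to [0, 1]
    rotation_range=40,       # random rotations
    width_shift_range=0.2,   # random horizontal shifts
    height_shift_range=0.2,  # random vertical shifts
    zoom_range=0.2,          # random zooms
    horizontal_flip=True     # random flips
)

train_generator = train_datagen.flow_from_directory(
    'training/',
    target_size=(150, 150),  # illustrative size
    batch_size=32,
    class_mode='binary'      # binary labels from the two subdirectories
)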

The application code I'm using is as follows:

import math

import tensorflow as tf
import os

from tensorflow.keras.optimizers import RMSprop
from tensorflow.keras.callbacks import EarlyStopping

import helpers
import settings

AUTOTUNE = tf.data.experimental.AUTOTUNE

assert tf.test.is_built_with_cuda()
assert tf.test.is_gpu_available()

# Collect the list of training files and process their paths.
training_dataset_files = tf.data.Dataset.list_files(os.path.join(settings.TRAINING_DIRECTORY, '*', '*.png'))
training_dataset_labelled = training_dataset_files.map(helpers.process_path, num_parallel_calls=AUTOTUNE)
training_dataset = helpers.prepare_for_training(training_dataset_labelled)

# Collect the validation files.
validation_dataset_files = tf.data.Dataset.list_files(os.path.join(settings.VALIDATION_DIRECTORY, '*', '*.png'))
validation_dataset_labelled = validation_dataset_files.map(helpers.process_path, num_parallel_calls=AUTOTUNE)
validation_dataset = helpers.prepare_for_training(validation_dataset_labelled)

model = tf.keras.models.Sequential([
    # This is the first convolution
    tf.keras.layers.Conv2D(16, (3, 3), activation='relu', input_shape=(settings.TARGET_IMAGE_HEIGHT, settings.TARGET_IMAGE_WIDTH, 3)),
    tf.keras.layers.MaxPooling2D(2, 2),
    # The second convolution
    tf.keras.layers.Conv2D(32, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2, 2),
    # The third convolution
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2, 2),
    # The fourth convolution
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2, 2),
    # The fifth convolution
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2, 2),
    # Flatten the results to feed into a DNN
    tf.keras.layers.Flatten(),
    # 512 neuron hidden layer
    tf.keras.layers.Dense(512, activation='relu'),
    # A single output neuron holding a value from 0 to 1, where 0 means the 'false' class and 1 means 'true'
    tf.keras.layers.Dense(1, activation='sigmoid')
])

model.summary()

model.compile(
    loss='binary_crossentropy',
    optimizer=RMSprop(lr=0.1),
    metrics=['acc']
)

callbacks = [
    # EarlyStopping(patience=4),
    tf.keras.callbacks.ReduceLROnPlateau(
        monitor='val_acc',
        patience=2,
        verbose=1,
        factor=0.5,
        min_lr=0.00001
    ),
    tf.keras.callbacks.ModelCheckpoint(
        # Path where to save the model
        filepath=settings.CHECKPOINT_FILE,
        # The two parameters below mean that we will overwrite
        # the current checkpoint if and only if
        # the `val_loss` score has improved.
        save_best_only=True,
        monitor='val_loss',
        verbose=1
    ),
    tf.keras.callbacks.TensorBoard(
        log_dir=settings.LOG_DIRECTORY,
        histogram_freq=1
    )
]

# Derive the step counts from the number of files. Note the true division:
# math.ceil(x // y) would floor first and drop the final partial batch.
training_dataset_length = tf.data.experimental.cardinality(training_dataset_files).numpy()
steps_per_epoch = math.ceil(training_dataset_length / settings.TRAINING_BATCH_SIZE)

validation_dataset_length = tf.data.experimental.cardinality(validation_dataset_files).numpy()
validation_steps = math.ceil(validation_dataset_length / settings.VALIDATION_BATCH_SIZE)

history = model.fit_generator(
    training_dataset,
    steps_per_epoch=steps_per_epoch,
    epochs=20000,
    verbose=1,
    validation_data=validation_dataset,
    validation_steps=validation_steps,
    callbacks=callbacks,
)

model.save(settings.FULL_MODEL_FILE)
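
A detail worth noting in the steps calculation above: math.ceil only has an effect with true division, since math.ceil(x // y) floors first and silently drops the final partial batch. With my 720 training images and an assumed batch size of 32 (the real value comes from settings.py):

import math

training_images = 366 + 354  # 720 files under training/
batch_size = 32              # assumed; settings.TRAINING_BATCH_SIZE in practice

print(math.ceil(training_images // batch_size))  # 22 - the last half-full batch is lost
print(math.ceil(training_images / batch_size))   # 23 - true division keeps it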

A larger snippet of application output is as follows:

21/22 [===========================>..] - ETA: 0s - loss: 7.7125 - acc: 0.5000
Epoch 00207: val_loss did not improve from 7.71247
22/22 [==============================] - 5s 247ms/step - loss: 7.7125 - acc: 0.5000 - val_loss: 7.7125 - val_acc: 0.5000
Epoch 208/20000
21/22 [===========================>..] - ETA: 0s - loss: 7.7125 - acc: 0.5000
Epoch 00208: val_loss did not improve from 7.71247
22/22 [==============================] - 5s 248ms/step - loss: 7.7125 - acc: 0.5000 - val_loss: 7.7125 - val_acc: 0.5000
Epoch 209/20000
21/22 [===========================>..] - ETA: 0s - loss: 7.7125 - acc: 0.5000
Epoch 00209: val_loss did not improve from 7.71247
22/22 [==============================] - 6s 251ms/step - loss: 7.7125 - acc: 0.5000 - val_loss: 7.7125 - val_acc: 0.5000
Epoch 210/20000
21/22 [===========================>..] - ETA: 0s - loss: 7.7125 - acc: 0.5000
Epoch 00210: val_loss did not improve from 7.71247
22/22 [==============================] - 5s 242ms/step - loss: 7.7125 - acc: 0.5000 - val_loss: 7.7125 - val_acc: 0.5000
Epoch 211/20000
21/22 [===========================>..] - ETA: 0s - loss: 7.7125 - acc: 0.5000
Epoch 00211: val_loss did not improve from 7.71247
22/22 [==============================] - 5s 246ms/step - loss: 7.7125 - acc: 0.5000 - val_loss: 7.7125 - val_acc: 0.5000
Epoch 212/20000
21/22 [===========================>..] - ETA: 0s - loss: 7.7125 - acc: 0.5000
Epoch 00212: val_loss did not improve from 7.71247
22/22 [==============================] - 6s 252ms/step - loss: 7.7125 - acc: 0.5000 - val_loss: 7.7125 - val_acc: 0.5000
Epoch 213/20000
21/22 [===========================>..] - ETA: 0s - loss: 7.7125 - acc: 0.5000
Epoch 00213: val_loss did not improve from 7.71247
22/22 [==============================] - 5s 242ms/step - loss: 7.7125 - acc: 0.5000 - val_loss: 7.7125 - val_acc: 0.5000
Epoch 214/20000
21/22 [===========================>..] - ETA: 0s - loss: 7.7125 - acc: 0.5000
Epoch 00214: val_loss did not improve from 7.71247
22/22 [==============================] - 5s 241ms/step - loss: 7.7125 - acc: 0.5000 - val_loss: 7.7125 - val_acc: 0.5000
Epoch 215/20000
21/22 [===========================>..] - ETA: 0s - loss: 7.7125 - acc: 0.5000
Epoch 00215: val_loss did not improve from 7.71247
22/22 [==============================] - 5s 247ms/step - loss: 7.7125 - acc: 0.5000 - val_loss: 7.7125 - val_acc: 0.5000
Epoch 216/20000
21/22 [===========================>..] - ETA: 0s - loss: 7.7125 - acc: 0.5000
Epoch 00216: val_loss did not improve from 7.71247
22/22 [==============================] - 5s 248ms/step - loss: 7.7125 - acc: 0.5000 - val_loss: 7.7125 - val_acc: 0.5000
Epoch 217/20000
21/22 [===========================>..] - ETA: 0s - loss: 7.7125 - acc: 0.5000
Epoch 00217: val_loss did not improve from 7.71247
22/22 [==============================] - 5s 249ms/step - loss: 7.7125 - acc: 0.5000 - val_loss: 7.7125 - val_acc: 0.5000
Epoch 218/20000
21/22 [===========================>..] - ETA: 0s - loss: 7.7125 - acc: 0.5000
Epoch 00218: val_loss did not improve from 7.71247
22/22 [==============================] - 5s 244ms/step - loss: 7.7125 - acc: 0.5000 - val_loss: 7.7125 - val_acc: 0.5000
Epoch 219/20000
19/22 [========================>.....] - ETA: 0s - loss: 7.7125 - acc: 0.5000


