I would like to learn image segmentation in TensorFlow. I am aware that this topic has been extensively discussed in the community, bringing up great Stack Overflow posts like this and this, but I still don't understand whether I am using the correct loss function in my case.
I have two images, ground_truth and prediction, each of shape (120,160). The ground_truth image pixels can only take the value 0.0 or 1.0.
The prediction image is the output of a decoder, and its last two layers are a tf.layers.conv2d_transpose and a tf.layers.conv2d, like so:
# transforms (?,120,160,30) -> (?,120,160,15)
outputs = tf.layers.conv2d_transpose(outputs, filters=15, kernel_size=1, strides=1, padding='same')
# ReLU
outputs = activation(outputs)
# transforms (?,120,160,15) -> (?,120,160,1)
outputs = tf.layers.conv2d(outputs, filters=1, kernel_size=1, strides=1, padding='same')
The last layer does not carry an activation function, and thus its output is unbounded.
Now for the loss function: this is the part where I am confused and would appreciate some guidance. Before applying tf.nn.sigmoid_cross_entropy_with_logits, I flatten logits and labels from (?,120,160) into (?,120*160) as follows:
logits = tf.reshape(predicted, [-1, predicted.get_shape()[1] * predicted.get_shape()[2]])
labels = tf.reshape(ground_truth, [-1, ground_truth.get_shape()[1] * ground_truth.get_shape()[2]])
loss = 0.5 * tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(labels=labels,logits=logits))
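For completeness, here is a minimal standalone sketch of the same loss computed without the flattening step, since sigmoid_cross_entropy_with_logits operates element-wise; the placeholders are just stand-ins for my decoder output and mask, not my real pipeline:

import tensorflow as tf

# Stand-in placeholders with the shapes from my setup (assumed, not my real inputs)
predicted = tf.placeholder(tf.float32, [None, 120, 160, 1])
ground_truth = tf.placeholder(tf.float32, [None, 120, 160])

# sigmoid_cross_entropy_with_logits is applied per element, so labels and
# logits only need identical shapes -- no flattening required.
per_pixel_loss = tf.nn.sigmoid_cross_entropy_with_logits(
    labels=ground_truth,
    logits=tf.squeeze(predicted, axis=-1))
loss = tf.reduce_mean(per_pixel_loss)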
This setup will not converge. If I swap my loss function for a simple tf.losses.mean_squared_error, training converges nicely. What am I doing wrong? I am aware that I could use softmax_cross_entropy_with_logits (or its sparse version?) for my two classes {0.0, 1.0} by reshaping the labels into (?*120*160,2), but I don't see how this would affect the result at all. A rough sketch of what I mean is below.
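Rough sketch of that two-class variant, assuming the last conv layer were changed to filters=2 so each pixel gets one logit per class; the placeholders and the name logits_2class are hypothetical stand-ins:

import tensorflow as tf

# Assumed shapes: two logits per pixel, mask values in {0.0, 1.0}
logits_2class = tf.placeholder(tf.float32, [None, 120, 160, 2])
ground_truth = tf.placeholder(tf.float32, [None, 120, 160])

# Flatten to one row per pixel and feed integer class indices to the
# sparse variant of the softmax cross entropy.
flat_logits = tf.reshape(logits_2class, [-1, 2])
flat_labels = tf.cast(tf.reshape(ground_truth, [-1]), tf.int32)
loss = tf.reduce_mean(
    tf.nn.sparse_softmax_cross_entropy_with_logits(labels=flat_labels,
                                                   logits=flat_logits))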
from TensorFlow image segmentation: MSE loss converges but sigmoid cross entropy loss does not