I have an Android app that was modeled after the Tensorflow Android demo for classifying images,
https://github.com/tensorflow/tensorflow/tree/master/tensorflow/examples/android
The original app uses a TensorFlow graph (.pb) file (Inception v3, I think) to classify a generic set of images.
I then trained my own graph for my own images, following the instructions in the TensorFlow for Poets blog post,
https://petewarden.com/2016/02/28/tensorflow-for-poets/
and this worked very well in the Android app after changing the settings in
ClassifierActivity
private static final int INPUT_SIZE = 299;
private static final int IMAGE_MEAN = 128;
private static final float IMAGE_STD = 128.0f;
private static final String INPUT_NAME = "Mul";
private static final String OUTPUT_NAME = "final_result";
private static final String MODEL_FILE = "file:///android_asset/optimized_graph.pb";
private static final String LABEL_FILE = "file:///android_asset/retrained_labels.txt";
To port the app to iOS, I then used the iOS camera demo, https://github.com/tensorflow/tensorflow/tree/master/tensorflow/examples/ios/camera
and used the same graph file, changing the settings in
CameraExampleViewController.mm
// If you have your own model, modify this to the file name, and make sure
// you've added the file to your app resources too.
static NSString* model_file_name = @"tensorflow_inception_graph";
static NSString* model_file_type = @"pb";
// This controls whether we'll be loading a plain GraphDef proto, or a
// file created by the convert_graphdef_memmapped_format utility that wraps a
// GraphDef and parameter file that can be mapped into memory from file to
// reduce overall memory usage.
const bool model_uses_memory_mapping = false;
// If you have your own model, point this to the labels file.
static NSString* labels_file_name = @"imagenet_comp_graph_label_strings";
static NSString* labels_file_type = @"txt";
// These dimensions need to match those the model was trained with.
const int wanted_input_width = 299;
const int wanted_input_height = 299;
const int wanted_input_channels = 3;
const float input_mean = 128.0f;
const float input_std = 128.0f;
const std::string input_layer_name = "Mul";
const std::string output_layer_name = "final_result";
After this the app works on iOS, however...
The Android app performs much better than the iOS app at recognizing the trained images. If I fill the camera's viewport with the image, both perform similarly. But normally the image to detect covers only part of the viewport; on Android this doesn't seem to matter much, but on iOS it matters a lot, to the point where iOS cannot classify the image.
My guess is that Android is cropping its camera viewport down to a central 299x299 area, whereas iOS is scaling its whole camera viewport down to the 299x299 area.
Can anyone confirm this? And does anyone know how to fix the iOS demo so that it better detects images that only partly fill the viewport (i.e. make it crop)?
In the Android demo, ClassifierActivity.onPreviewSizeChosen() has
rgbFrameBitmap = Bitmap.createBitmap(previewWidth, previewHeight, Config.ARGB_8888);
croppedBitmap = Bitmap.createBitmap(INPUT_SIZE, INPUT_SIZE, Config.ARGB_8888);
frameToCropTransform =
    ImageUtils.getTransformationMatrix(
        previewWidth, previewHeight,
        INPUT_SIZE, INPUT_SIZE,
        sensorOrientation, MAINTAIN_ASPECT);
cropToFrameTransform = new Matrix();
frameToCropTransform.invert(cropToFrameTransform);
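If I read ImageUtils.getTransformationMatrix correctly, with MAINTAIN_ASPECT it scales by the larger of the two scale factors and centres the result, which effectively centre-crops the preview. Here is my own sketch of that math (not code from either demo; the variable names are the ones used in the iOS snippet further down, and the comments assume a 640x480 preview):
// My sketch of the aspect-maintaining transform I believe Android applies:
// scale by the larger factor so the 299x299 target is fully covered,
// then centre, which crops away the excess. Needs <algorithm> for std::max.
const float scale_x = wanted_input_width / (float)image_width;     // e.g. 299/640
const float scale_y = wanted_input_height / (float)fullHeight;     // e.g. 299/480
const float scale = std::max(scale_x, scale_y);                    // "fill", not "fit"
const float scaled_width = image_width * scale;                    // ~398 for 640x480
const float scaled_height = fullHeight * scale;                    // 299 for 640x480
const float crop_x = (scaled_width - wanted_input_width) / 2.0f;   // trimmed from each side
const float crop_y = (scaled_height - wanted_input_height) / 2.0f; // 0 here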
and on iOS it has,
CameraExampleViewController.runCNNOnFrame()
const int sourceRowBytes = (int)CVPixelBufferGetBytesPerRow(pixelBuffer);
const int image_width = (int)CVPixelBufferGetWidth(pixelBuffer);
const int fullHeight = (int)CVPixelBufferGetHeight(pixelBuffer);
CVPixelBufferLockFlags unlockFlags = kNilOptions;
CVPixelBufferLockBaseAddress(pixelBuffer, unlockFlags);
unsigned char *sourceBaseAddr =
    (unsigned char *)(CVPixelBufferGetBaseAddress(pixelBuffer));
int image_height;
unsigned char *sourceStartAddr;
if (fullHeight <= image_width) {
  image_height = fullHeight;
  sourceStartAddr = sourceBaseAddr;
} else {
  image_height = image_width;
  const int marginY = ((fullHeight - image_width) / 2);
  sourceStartAddr = (sourceBaseAddr + (marginY * sourceRowBytes));
}
const int image_channels = 4;
assert(image_channels >= wanted_input_channels);
tensorflow::Tensor image_tensor(
    tensorflow::DT_FLOAT,
    tensorflow::TensorShape(
        {1, wanted_input_height, wanted_input_width, wanted_input_channels}));
auto image_tensor_mapped = image_tensor.tensor<float, 4>();
tensorflow::uint8 *in = sourceStartAddr;
float *out = image_tensor_mapped.data();
for (int y = 0; y < wanted_input_height; ++y) {
  float *out_row = out + (y * wanted_input_width * wanted_input_channels);
  for (int x = 0; x < wanted_input_width; ++x) {
    const int in_x = (y * image_width) / wanted_input_width;
    const int in_y = (x * image_height) / wanted_input_height;
    tensorflow::uint8 *in_pixel =
        in + (in_y * image_width * image_channels) + (in_x * image_channels);
    float *out_pixel = out_row + (x * wanted_input_channels);
    for (int c = 0; c < wanted_input_channels; ++c) {
      out_pixel[c] = (in_pixel[c] - input_mean) / input_std;
    }
  }
}
CVPixelBufferUnlockBaseAddress(pixelBuffer, unlockFlags);
I think the issue is here,
tensorflow::uint8 *in_pixel =
    in + (in_y * image_width * image_channels) + (in_x * image_channels);
float *out_pixel = out_row + (x * wanted_input_channels);
My understanding is that this just scales down to the 299 size by picking every nth pixel (nearest-neighbour sampling) instead of properly scaling the original image down to 299, which leads to a poor-quality scaled image and poor image recognition.
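By "properly scaling" I mean something that averages the neighbouring source pixels rather than picking a single one, e.g. bilinear interpolation. A rough, untested sketch of what that loop could look like (my own code, not from the demo; it reuses the demo's variables, needs <algorithm> for std::min, and assumes packed rows of image_width * image_channels bytes, like the original loop does):
// Untested sketch: bilinear sampling instead of nearest-neighbour.
for (int y = 0; y < wanted_input_height; ++y) {
  float *out_row = out + (y * wanted_input_width * wanted_input_channels);
  for (int x = 0; x < wanted_input_width; ++x) {
    // Map the 299x299 destination pixel back into source coordinates.
    const float src_x = (x * (float)image_width) / wanted_input_width;
    const float src_y = (y * (float)image_height) / wanted_input_height;
    const int x0 = std::min((int)src_x, image_width - 1);
    const int y0 = std::min((int)src_y, image_height - 1);
    const int x1 = std::min(x0 + 1, image_width - 1);
    const int y1 = std::min(y0 + 1, image_height - 1);
    const float dx = src_x - x0;
    const float dy = src_y - y0;
    float *out_pixel = out_row + (x * wanted_input_channels);
    for (int c = 0; c < wanted_input_channels; ++c) {
      // Weighted average of the four neighbouring source pixels.
      const float p00 = in[(y0 * image_width + x0) * image_channels + c];
      const float p10 = in[(y0 * image_width + x1) * image_channels + c];
      const float p01 = in[(y1 * image_width + x0) * image_channels + c];
      const float p11 = in[(y1 * image_width + x1) * image_channels + c];
      const float value = p00 * (1 - dx) * (1 - dy) + p10 * dx * (1 - dy) +
                          p01 * (1 - dx) * dy + p11 * dx * dy;
      out_pixel[c] = (value - input_mean) / input_std;
    }
  }
}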
The solution I went with, though, was to first scale the pixelBuffer to size 299 before it reaches that loop. I tried this,
UIImage *uiImage = [self uiImageFromPixelBuffer: pixelBuffer];
float scaleFactor = (float)wanted_input_height / (float)fullHeight;
float newWidth = image_width * scaleFactor;
NSLog(@"width: %d, height: %d, scale: %f, height: %f", image_width, fullHeight, scaleFactor, newWidth);
CGSize size = CGSizeMake(wanted_input_width, wanted_input_height);
UIGraphicsBeginImageContext(size);
[uiImage drawInRect:CGRectMake(0, 0, newWidth, size.height)];
UIImage *destImage = UIGraphicsGetImageFromCurrentImageContext();
UIGraphicsEndImageContext();
pixelBuffer = [self pixelBufferFromCGImage: destImage.CGImage];
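Another variant I have considered but not tried yet is to crop the centre square of the frame before scaling it to 299x299, which (if my guess above is right) would be closer to what Android does. A rough, untested sketch using Core Graphics:
// Untested: crop the centre square of the frame, then scale that square to
// 299x299, instead of squeezing the whole frame in.
CGImageRef fullImage = uiImage.CGImage;
const size_t cropSize = MIN(CGImageGetWidth(fullImage), CGImageGetHeight(fullImage));
CGRect cropRect = CGRectMake((CGImageGetWidth(fullImage) - cropSize) / 2,
                             (CGImageGetHeight(fullImage) - cropSize) / 2,
                             cropSize, cropSize);
CGImageRef croppedImage = CGImageCreateWithImageInRect(fullImage, cropRect);
UIGraphicsBeginImageContext(CGSizeMake(wanted_input_width, wanted_input_height));
[[UIImage imageWithCGImage:croppedImage]
    drawInRect:CGRectMake(0, 0, wanted_input_width, wanted_input_height)];
UIImage *croppedScaled = UIGraphicsGetImageFromCurrentImageContext();
UIGraphicsEndImageContext();
CGImageRelease(croppedImage);
pixelBuffer = [self pixelBufferFromCGImage: croppedScaled.CGImage];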
and to convert the image back to a pixel buffer,
- (CVPixelBufferRef) pixelBufferFromCGImage: (CGImageRef) image
{
  NSDictionary *options = @{
    (NSString*)kCVPixelBufferCGImageCompatibilityKey : @YES,
    (NSString*)kCVPixelBufferCGBitmapContextCompatibilityKey : @YES,
  };
  CVPixelBufferRef pxbuffer = NULL;
  CVReturn status = CVPixelBufferCreate(
      kCFAllocatorDefault, CGImageGetWidth(image), CGImageGetHeight(image),
      kCVPixelFormatType_32ARGB, (__bridge CFDictionaryRef)options, &pxbuffer);
  if (status != kCVReturnSuccess) {
    NSLog(@"Operation failed");
  }
  NSParameterAssert(status == kCVReturnSuccess && pxbuffer != NULL);

  CVPixelBufferLockBaseAddress(pxbuffer, 0);
  void *pxdata = CVPixelBufferGetBaseAddress(pxbuffer);

  // Draw the CGImage directly into the pixel buffer's memory.
  CGColorSpaceRef rgbColorSpace = CGColorSpaceCreateDeviceRGB();
  CGContextRef context = CGBitmapContextCreate(
      pxdata, CGImageGetWidth(image), CGImageGetHeight(image), 8,
      4 * CGImageGetWidth(image), rgbColorSpace, kCGImageAlphaNoneSkipFirst);
  NSParameterAssert(context);

  // Flip vertically and horizontally (equivalent to a 180-degree rotation) before drawing.
  CGContextConcatCTM(context, CGAffineTransformMakeRotation(0));
  CGAffineTransform flipVertical =
      CGAffineTransformMake(1, 0, 0, -1, 0, CGImageGetHeight(image));
  CGContextConcatCTM(context, flipVertical);
  CGAffineTransform flipHorizontal =
      CGAffineTransformMake(-1.0, 0.0, 0.0, 1.0, CGImageGetWidth(image), 0.0);
  CGContextConcatCTM(context, flipHorizontal);

  CGContextDrawImage(context,
                     CGRectMake(0, 0, CGImageGetWidth(image), CGImageGetHeight(image)),
                     image);
  CGColorSpaceRelease(rgbColorSpace);
  CGContextRelease(context);

  CVPixelBufferUnlockBaseAddress(pxbuffer, 0);
  return pxbuffer;
}
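One thing I am unsure about with this conversion: if I remember correctly, the camera demo configures its video output for kCVPixelFormatType_32BGRA, while the buffer created above is kCVPixelFormatType_32ARGB, so the channel order seen by the classification code may not be what it expects. My own untested variant that would produce a BGRA buffer instead changes the buffer and context creation to roughly:
// Untested variant: create the buffer as BGRA to match what the camera
// delivers, and honour the buffer's own row stride.
CVReturn status = CVPixelBufferCreate(
    kCFAllocatorDefault, CGImageGetWidth(image), CGImageGetHeight(image),
    kCVPixelFormatType_32BGRA, (__bridge CFDictionaryRef)options, &pxbuffer);
// lock the buffer and get pxdata as before, then:
CGContextRef context = CGBitmapContextCreate(
    pxdata, CGImageGetWidth(image), CGImageGetHeight(image), 8,
    CVPixelBufferGetBytesPerRow(pxbuffer), rgbColorSpace,
    kCGImageAlphaNoneSkipFirst | kCGBitmapByteOrder32Little);  // BGRA in memory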
- (UIImage*) uiImageFromPixelBuffer: (CVPixelBufferRef) pixelBuffer {
  CIImage *ciImage = [CIImage imageWithCVPixelBuffer: pixelBuffer];
  CIContext *temporaryContext = [CIContext contextWithOptions:nil];
  CGImageRef videoImage = [temporaryContext
      createCGImage:ciImage
           fromRect:CGRectMake(0, 0,
                               CVPixelBufferGetWidth(pixelBuffer),
                               CVPixelBufferGetHeight(pixelBuffer))];
  UIImage *uiImage = [UIImage imageWithCGImage:videoImage];
  CGImageRelease(videoImage);
  return uiImage;
}
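As an aside, another conversion route I considered (but have not tried) is to stay in Core Image and render a scaled CIImage straight into a target pixel buffer, avoiding the UIImage/CGImage round trip entirely. A rough, untested sketch:
// Untested sketch: scale with Core Image and render directly into a 299x299
// BGRA pixel buffer, skipping the UIImage/CGImage round trip above.
CIImage *ciImage = [CIImage imageWithCVPixelBuffer:pixelBuffer];
CGFloat scale = (CGFloat)wanted_input_height / CVPixelBufferGetHeight(pixelBuffer);
CIImage *scaledImage =
    [ciImage imageByApplyingTransform:CGAffineTransformMakeScale(scale, scale)];
NSDictionary *options = @{
  (NSString*)kCVPixelBufferCGImageCompatibilityKey : @YES,
  (NSString*)kCVPixelBufferCGBitmapContextCompatibilityKey : @YES,
};
CVPixelBufferRef resizedBuffer = NULL;
CVPixelBufferCreate(kCFAllocatorDefault, wanted_input_width, wanted_input_height,
                    kCVPixelFormatType_32BGRA,
                    (__bridge CFDictionaryRef)options, &resizedBuffer);
CIContext *ciContext = [CIContext contextWithOptions:nil];
[ciContext render:scaledImage toCVPixelBuffer:resizedBuffer];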
I'm not sure that going through UIImage/CGImage is the best way to resize, but it worked. However, it seemed to make image classification even worse, not better...
Any ideas, or does anyone see issues with the image conversion/resize?
from How to improve accuracy of Tensorflow camera demo on iOS for retrained graph