Wednesday, 17 March 2021

Android: locating words on the screen. Google ML Kit bounding boxes are off a bit

I'm trying to find certain words on the phone screen and then display a bounding box around them if they are present. I follow these steps:

  1. Capture entire screen contents (with MediaProjection API).
  2. Pass this screenshot to a TextRecognizer object from the Google ML Kit.
  3. Check the detected words; in case of a match, use the Rect returned by ML Kit to draw on the screen.

It almost works. Here is a screenshot of the detection finding and highlighting the word hello in the notepad app:

screenshot

As you can see, the semi-transparent yellow boxes are not quite on the hellos.

Here are the relevant code samples. Passing the screenshot bitmap to the ML Kit:

InputImage image = InputImage.fromBitmap(screenshotBitmap, 0);
//I checked: image, screen, and overlay view dimensions are exactly the same.
TextRecognizer recognizer = TextRecognition.getClient();
recognizer.process(image)
          .addOnSuccessListener(this::processText);

The processText method which gets the recognized words:

for (Text.Element element : getElements()) {
    String elementText = element.getText();
    Rect bounds = element.getBoundingBox(); // Getting the bounding box (may be null)
    if (bounds != null && elementText.equalsIgnoreCase("hello")) { // "hello" is hardcoded for now
        addHighlightCard(bounds.left, bounds.top, bounds.width(), bounds.height());
    }
}

And finally, the addHighlightCard method, which creates and positions the views you see in the screenshot. It uses a fullscreen overlay with a RelativeLayout, because this layout lets me specify the exact location and size of child views.

public void addHighlightCard(int x, int y, int width, int height) {
    View highlightCard = inflater.inflate(R.layout.highlight_card, overlayRoot, false);
    RelativeLayout.LayoutParams params = new RelativeLayout.LayoutParams(width, height);
    params.leftMargin = x;
    params.topMargin = y;
    highlightCard.setLayoutParams(params);
    overlayRoot.addView(highlightCard, params);
}

As you can see, there is no scaling going on whatsoever: I capture the whole screen, and I use a layout that fills the whole screen (even the toolbar). So I thought the coordinates returned by ML Kit should be directly usable for drawing on the screen. But clearly I'm wrong. It seems the image is getting scaled down somewhere, but I can't figure out where.
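
To test the scaling theory, it may help to spell out the mapping from image coordinates to overlay coordinates explicitly and compare its output with what actually gets drawn. The sketch below is plain Java with no Android dependencies. BoxMapper, mapToOverlay, and all of its parameters are hypothetical names, not part of ML Kit or the Android SDK. It assumes the screenshot covers the whole screen and that the overlay's top-left corner on screen is known (for example from View.getLocationOnScreen). Whether a scale or an offset is the real cause here is not established.

```java
// Hypothetical helper for checking coordinate assumptions; not an ML Kit API.
final class BoxMapper {
    private BoxMapper() {}

    /**
     * Maps a box from image pixel coordinates to overlay view coordinates.
     * First scales by screen size / image size, then subtracts the overlay's
     * top-left position on screen. Returns {left, top, width, height}.
     */
    static int[] mapToOverlay(int left, int top, int width, int height,
                              int imageW, int imageH,
                              int screenW, int screenH,
                              int overlayOriginX, int overlayOriginY) {
        if (imageW <= 0 || imageH <= 0) {
            throw new IllegalArgumentException("image dimensions must be positive");
        }
        // Scale factors are 1.0 when the image and the screen have the same size.
        double scaleX = (double) screenW / imageW;
        double scaleY = (double) screenH / imageH;

        int x = (int) Math.round(left * scaleX) - overlayOriginX;
        int y = (int) Math.round(top * scaleY) - overlayOriginY;
        int w = (int) Math.round(width * scaleX);
        int h = (int) Math.round(height * scaleY);
        return new int[] {x, y, w, h};
    }
}
```

If the image and screen sizes really are identical and the overlay origin is (0, 0), the function returns its input unchanged, so any visible drift would have to come from somewhere else. Logging the values passed to addHighlightCard next to the output of this mapping is one way to narrow that down.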


