Monday, 22 February 2021

Deep learning solution for digit recognition in natural scenes

I am working on a problem where I want to automatically read the number shown in images like the following:

[Two example images of the numbers to be read]

As can be seen, the images are quite challenging: the digit strokes are not always connected lines, and the contrast also varies a lot. My first attempt used pytesseract after some preprocessing. I also created a StackOverflow post about it here [https://ift.tt/3sb2lJU].

While this approach works fine on an individual image, it does not generalize, because the preprocessing needs too much manual tuning per image. The best solution I have so far is to iterate over hyperparameters such as the threshold value, the filter size for erosion/dilation, etc. However, this is computationally expensive!
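For reference, the brute-force search described above could look roughly like the sketch below. It is only an illustration under several assumptions: the function names (`binarize`, `dilate`, `sweep`) are hypothetical, the pure-Python thresholding and dilation stand in for what OpenCV would normally do, erosion is left out for brevity, and `ocr` is any callable that turns a binary image into text (in practice a thin wrapper around pytesseract). Tallying the results as votes is an added idea, not something from the original attempt.

```python
from collections import Counter
from itertools import product


def binarize(image, threshold):
    """Return a 0/1 image: 1 where the grayscale pixel is above threshold."""
    return [[1 if px > threshold else 0 for px in row] for row in image]


def dilate(binary, size):
    """Dilate a 0/1 image with a size x size square window (size should be odd)."""
    h, w, r = len(binary), len(binary[0]), size // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # A pixel becomes 1 if any pixel in its neighbourhood is 1.
            out[y][x] = max(
                binary[yy][xx]
                for yy in range(max(0, y - r), min(h, y + r + 1))
                for xx in range(max(0, x - r), min(w, x + r + 1))
            )
    return out


def sweep(image, ocr, thresholds, kernel_sizes):
    """Run OCR once per parameter combination and tally digit-only results.

    `ocr` is an assumed callable (binary image -> string), not a real API.
    """
    votes = Counter()
    for threshold, size in product(thresholds, kernel_sizes):
        text = ocr(dilate(binarize(image, threshold), size)).strip()
        if text.isdigit():
            votes[text] += 1
    return votes
```

The cost is easy to see here: the OCR engine is called `len(thresholds) * len(kernel_sizes)` times per image, and every extra hyperparameter multiplies that count again. `sweep(...).most_common(1)` would give the reading that most parameter combinations agree on.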

Therefore, I have come to believe that the solution I am looking for must be deep-learning-based. I have two ideas here:

  • Using a network pre-trained on a similar task
  • Splitting the input images into separate digits and training / fine-tuning a network myself, MNIST-style

Regarding the first approach, I have not found anything good yet. Does anyone have a suggestion?

Regarding the second approach, I would first need a method to automatically generate images of the separate digits. I guess this should also be deep-learning-based. Afterward, I could maybe achieve good results with some data augmentation.
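As a point of comparison before reaching for a learned segmenter, a classical baseline for the splitting step is a column projection: mark every column that contains foreground pixels and cut wherever there is a wide enough run of empty columns. The sketch below is only that baseline, not a deep-learning method. The names `split_digits` and `crop` and the parameters `max_gap` and `min_width` are hypothetical, and the input is assumed to be an already binarized image (a list of rows of 0/1), which may be hard to obtain for images like these. `max_gap` is there because the strokes are not always connected, so small gaps inside a digit should not cause a cut.

```python
def split_digits(binary, max_gap=0, min_width=1):
    """Return (start, end) column ranges, end exclusive, one per digit candidate.

    Empty-column runs of up to `max_gap` columns are bridged, so a digit
    with broken strokes is not cut in two.
    """
    width = len(binary[0])
    # A column counts as "inked" if any row has a foreground pixel in it.
    inked = [any(row[x] for row in binary) for x in range(width)]

    spans = []
    start = None  # first column of the span currently being built
    gap = 0       # number of consecutive empty columns seen inside the span
    for x, has_ink in enumerate(inked):
        if has_ink:
            if start is None:
                start = x
            gap = 0
        elif start is not None:
            gap += 1
            if gap > max_gap:
                # The last inked column was x - gap, so the span ends after it.
                spans.append((start, x - gap + 1))
                start, gap = None, 0
    if start is not None:
        spans.append((start, width - gap))
    return [(a, b) for a, b in spans if b - a >= min_width]


def crop(binary, span):
    """Cut one column range out of the image, keeping all rows."""
    a, b = span
    return [row[a:b] for row in binary]
```

Each cropped span could then be resized to a fixed shape and fed to an MNIST-style classifier. Touching or overlapping digits would defeat this baseline, which is where a learned approach may be needed.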

Does anyone have ideas? :)


