Monday 15 March 2021

How to OCR low quality code picture with pytesseract

I have a set of pictures (sample) of the same formatted code, I've tried every thing but nothing works well. I tried blurring, hsv, threshing, etc. can you help me out?

import pytesseract
import cv2
imgr = cv2.imread("a.png")
img = cv2.resize(imgr, (int(imgr.shape[1] * 3), int(imgr.shape[0] * 3)), interpolation=cv2.INTER_AREA)
img = cv2.blur(img, (7, 7))
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
h, s, v = cv2.split(hsv)
cv2.imshow("", v)
cv2.waitKey(0)
p = pytesseract.image_to_string(v)
print(p)
thresh = cv2.threshold(v, 170, 255, cv2.THRESH_BINARY)[1]
cv2.imshow("", thresh)
cv2.waitKey(0)
print(pytesseract.image_to_string(thresh))
ation


from How to OCR low quality code picture with pytesseract

No comments:

Post a Comment