Monday 31 August 2020

Sorting contours based on precedence in Python, OpenCV

I am trying to sort contours by reading order, left-to-right and top-to-bottom, the same way you would write anything: start at the top left and continue from there.
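A minimal sketch (plain Python, no OpenCV) of why a simple top-edge sort is not enough: characters on the same line rarely have identical top edges, so sorting bounding boxes lexicographically by `(y, x)` scrambles them. The `(x, y, w, h)` values and the helper name `naive_reading_order` are made up for illustration.

```python
# Hypothetical bounding boxes (x, y, w, h) for one line of text:
# the characters sit on the same line but their top edges differ slightly.
boxes = [(10, 52, 20, 30), (40, 48, 22, 31), (75, 50, 18, 29)]


def naive_reading_order(boxes):
    # Sort by top edge first, then by left edge (plain lexicographic sort).
    return sorted(boxes, key=lambda b: (b[1], b[0]))


order = naive_reading_order(boxes)
print([b[0] for b in order])  # → [40, 75, 10], not the left-to-right [10, 40, 75]
```

This is why some notion of "same line" (a tolerance) is needed before ordering by `x`.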

This is what I have achieved so far:

import os

import cv2
from PIL import Image


def get_contour_precedence(contour, cols):
    # Snap the top edge (y) of the bounding box down to a band of height
    # tolerance_factor, then order by band first and by x within a band
    tolerance_factor = 61
    origin = cv2.boundingRect(contour)
    return ((origin[1] // tolerance_factor) * tolerance_factor) * cols + origin[0]


image = cv2.imread("C:/Users/XXXX/PycharmProjects/OCR/raw_dataset/23.png", 0)

# with THRESH_OTSU the threshold is computed automatically (130 is ignored)
ret, thresh1 = cv2.threshold(image, 130, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

# find the outer contours of the binarized image
contours, hierarchy = cv2.findContours(thresh1.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
# sort the contours top-to-bottom, then left-to-right; sorted() is used
# because newer OpenCV versions return a tuple, which has no .sort()
contours = sorted(contours, key=lambda cnt: get_contour_precedence(cnt, thresh1.shape[1]))

# make sure the output folder for the crops exists
os.makedirs("bounding boxes", exist_ok=True)

# initialize the list of contour bounding boxes and associated
# characters that we'll be OCR'ing
chars = []
inc = 0
# loop over the contours
for c in contours:
    inc += 1

    # compute the bounding box of the contour
    (x, y, w, h) = cv2.boundingRect(c)

    label = str(inc)
    # image is single-channel, so only the first value of (0, 255, 0) is
    # used: boxes and labels are drawn in black
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.putText(image, label, (x - 2, y - 2),
                cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)
    print('x=', x)
    print('y=', y)
    print('x+w=', x + w)
    print('y+h=', y + h)
    # crop slightly inside the box so the drawn border is mostly excluded
    crop_img = image[y + 2:y + h - 1, x + 2:x + w - 1]
    name = os.path.join("bounding boxes", 'Image_%d.png' % inc)
    cv2.imshow("cropped", crop_img)
    print(name)
    crop_img = Image.fromarray(crop_img)
    crop_img.save(name)
    cv2.waitKey(0)

cv2.imshow('mat', image)
cv2.waitKey(0)
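The key returned by `get_contour_precedence` snaps each box's top edge down to a multiple of `tolerance_factor` and then orders by band and `x`. The sketch below reproduces that formula on made-up `(x, y, w, h)` tuples (the helper name `precedence`, the width `cols` and all box values are hypothetical) to show one way a fixed tolerance can misorder a line: when a band boundary (here `y = 61`) falls between the tops of characters on the same line, part of that line is sorted as if it were a separate row.

```python
def precedence(box, cols, tolerance_factor=61):
    """Same formula as get_contour_precedence, applied to an (x, y, w, h) box."""
    x, y, _, _ = box
    # Snap the top edge down to the start of its tolerance_factor-high band,
    # then weight the band by the image width so rows dominate columns.
    row_band = (y // tolerance_factor) * tolerance_factor
    return row_band * cols + x


cols = 500  # hypothetical image width in pixels
# Hypothetical boxes: two lines of text, tops varying by a few pixels.
line_1 = [(10, 58, 20, 30), (40, 63, 22, 31), (75, 60, 18, 29)]
line_2 = [(20, 130, 20, 30), (50, 128, 20, 30)]

ordered = sorted(line_1 + line_2, key=lambda b: precedence(b, cols))
print([b[0] for b in ordered])  # → [10, 75, 40, 20, 50]
```

The box at `x = 40` has its top at `y = 63`, just past the band boundary, so it is placed after `x = 75` even though it belongs between them. Changing `tolerance_factor` only moves the boundaries, which is consistent with a single value not working for every image.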

Input image 1:

(image: input image 1)

Output for image 1 (working):

(image: output image 1)

Input image 2:

(image: input image 2)

Output for image 2:

(image: output image 2)

Input image 3:

(image: input image 3)

Output for image 3:

(image: output image 3)

As you can see, the labels 1, 2, 3, 4 do not come out in the order I expect for every image, as shown in output image 3.

How can I adjust this to make it work, or should I write a custom function instead?

NOTE: I have multiple images like the input images shown in my question. The content is the same, but the text varies between them, so a single tolerance factor does not work for all of them. Adjusting it manually for each image is not a good option.
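One possible way around a hand-tuned tolerance, sketched here as an assumption rather than a tested fix, is to group boxes into lines by their vertical centres, with the grouping threshold derived from the median box height of each image, and only then sort each line by `x`. The function name `sort_reading_order`, the `overlap_ratio` parameter and the sample boxes are hypothetical, and the approach assumes text lines do not overlap vertically (skewed or touching lines would need more work).

```python
from statistics import median


def sort_reading_order(boxes, overlap_ratio=0.5):
    """Sort (x, y, w, h) boxes top-to-bottom, then left-to-right.

    A box joins the current line when its vertical centre is within
    overlap_ratio * (median box height) of the line's mean centre, so the
    tolerance scales with each image instead of being a fixed constant.
    """
    if not boxes:
        return []
    threshold = overlap_ratio * median([b[3] for b in boxes])
    # Visit boxes in order of their vertical centre.
    by_centre = sorted(boxes, key=lambda b: b[1] + b[3] / 2)
    lines = []
    current = [by_centre[0]]
    for box in by_centre[1:]:
        line_centre = sum(b[1] + b[3] / 2 for b in current) / len(current)
        if abs((box[1] + box[3] / 2) - line_centre) <= threshold:
            current.append(box)  # close enough: same text line
        else:
            lines.append(current)  # start a new text line
            current = [box]
    lines.append(current)
    # Within each line, order boxes left-to-right.
    return [box for line in lines for box in sorted(line, key=lambda b: b[0])]


# Hypothetical boxes: two lines whose tops vary by a few pixels.
sample = [(10, 58, 20, 30), (40, 63, 22, 31), (75, 60, 18, 29),
          (20, 130, 20, 30), (50, 128, 20, 30)]
print([b[0] for b in sort_reading_order(sample)])  # → [10, 40, 75, 20, 50]
```

With the question's code, the input would be `[cv2.boundingRect(c) for c in contours]`, and the crop loop could iterate over the returned boxes directly. Using the median height makes the threshold robust to a few unusually tall or short contours, such as punctuation.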


