Friday, 2 September 2022

Extracting data from tables without any grid lines and border from scanned image of document

Extracting table data from digital PDFs have been simple using camelot and tabula. However, the solution doesn't work with scanned images of the document pages specifically when the table doesn't have borders and inner grids. I have been trying to generate vertical and horizontal lines using OpenCV. However, since the scanned images will have slight rotation angles, it is difficult to proceed with the approach.

How can we utilize OpenCV to generate grids (horizontal and vertical lines) and borders for the scanned document page which contains table data (along with paragraphs of text)? If this is feasible, how to nullify the rotation angle of the scanned image?



from Extracting data from tables without any grid lines and border from scanned image of document

No comments:

Post a Comment