I am working on a project where I have processed and stored single-page medical reports with labelled categories. The user will input one document, and I have to classify which category it belongs to.
I have converted all documents to grayscale images and stored them for comparison purposes.
I have a dataset of images with the following columns:
- image_path: the path to the image
- histogram_value: the histogram of the image, calculated using the cv2.calcHist function
- np_avg: the average value of all pixels of the image, calculated using np.average
- category: the category of the image
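A minimal sketch of how the two feature columns could be computed for one grayscale image. It uses np.histogram as a NumPy-only stand-in for cv2.calcHist (for a uint8 image, cv2.calcHist([gray], [0], None, [256], [0, 256]) should give the same 256 bin counts). The function name image_features, the normalisation step, and the synthetic page are assumptions for illustration, not part of my actual dataset code.

```python
import numpy as np

def image_features(gray):
    """Return (histogram, mean intensity) for a 2-D uint8 grayscale image."""
    # 256-bin intensity histogram; np.histogram stands in for cv2.calcHist here.
    hist, _ = np.histogram(gray, bins=256, range=(0, 256))
    # Assumption: normalise to sum to 1 so pages of different sizes are comparable.
    hist = hist.astype(np.float64) / hist.sum()
    # Same quantity as the np_avg column.
    return hist, float(np.average(gray))

# Synthetic stand-in for a scanned page: white background with a dark band of "text".
page = np.full((100, 80), 255, dtype=np.uint8)
page[10:20, :] = 0

hist, avg = image_features(page)
```

Note that both features ignore where the ink sits on the page: two reports with the same amount of dark pixels but different layouts would get nearly identical values.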
I am planning to use these two methods:
- Calculate histogram_value of the input image and find the nearest 10 matching images
- Calculate np_avg of the input image and find the nearest 10 matching images
- Take the intersection of both result sets
- If more than one image is found, do template matching to find the best fit
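The candidate-selection steps above can be sketched roughly as follows. This assumes the histograms are stored as equal-length vectors and compared with Euclidean distance, which is only one possible choice of metric; the names nearest_k and shortlist and the random toy data are hypothetical.

```python
import numpy as np

def nearest_k(values, query, k=10):
    """Indices of the k rows of `values` closest to `query` (Euclidean distance)."""
    values = np.asarray(values, dtype=np.float64).reshape(len(values), -1)
    query = np.asarray(query, dtype=np.float64).reshape(1, -1)
    dist = np.linalg.norm(values - query, axis=1)
    return np.argsort(dist, kind="stable")[:k]

def shortlist(hists, avgs, q_hist, q_avg, k=10):
    """Intersect the top-k by histogram with the top-k by mean intensity."""
    by_hist = nearest_k(hists, q_hist, k)
    by_avg = set(nearest_k(avgs, q_avg, k).tolist())
    # Keep the histogram ranking order for the candidates that survive.
    return [int(i) for i in by_hist if int(i) in by_avg]

# Toy data standing in for the stored dataset: 50 normalised histograms and means.
rng = np.random.default_rng(0)
hists = rng.random((50, 256))
hists /= hists.sum(axis=1, keepdims=True)
avgs = rng.uniform(0, 255, size=50)

# Query with the features of stored image 7, so it must appear in the shortlist.
candidates = shortlist(hists, avgs, hists[7], avgs[7], k=10)
```

The intersection can be empty when the two rankings disagree, so a fallback (for example, using the histogram ranking alone) would be needed. The surviving candidates would then go to the template-matching step, for which OpenCV provides cv2.matchTemplate.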
I have very little knowledge of the image processing domain. Will the above mechanism be reliable for my purpose?
I checked SO and found a few questions on the same topic, but they have a very different problem and desired outcome. This question looks similar to my situation, but it's very generic and I am not sure it will work in my scenario.
from Finding Similar Document