I am using Perceptual hashing technique to find near-duplicate and exact-duplicate images. The code is working perfectly for finding exact-duplicate images. However, finding near-duplicate and slightly modified images seems to be difficult. As the difference score between their hashing is generally similar to the hashing difference of completely different random images.
To tackle this, I tried to reduce the pixelation of the near-duplicate images to 50x50 pixel and make them black/white, but I still don't have what I need (small difference score).
This is a sample of a near duplicate image pair:
Image 1 (a1.jpg):
Image 2 (b1.jpg):
The difference between the hashing score of these images is : 24
When pixeled (50x50 pixels), they look like this:
rs_a1.jpg
rs_b1.jpg
The hashing difference score of the pixeld images is even bigger! : 26
Below two more examples of near duplicate image pairs as requested by @ann zen:
Pair 1
Pair 2
The code I use to reduce the image size is this :
from PIL import Image
with Image.open(image_path) as image:
reduced_image = image.resize((50, 50)).convert('RGB').convert("1")
And the code for comparing two image hashing:
from PIL import Image
import imagehash
with Image.open(image1_path) as img1:
hashing1 = imagehash.phash(img1)
with Image.open(image2_path) as img2:
hashing2 = imagehash.phash(img2)
print('difference : ', hashing1-hashing2)
from Find near duplicate and faked images
No comments:
Post a Comment