Friday, 14 October 2022

How to copy only non-duplicate files whilst maintaining folder structure?

I am trying to find duplicates between two folders and copy only the unique image files to the 'dest' folder. I can copy all the non-duplicates using the code below, but it doesn't maintain the source directory structure: every file lands directly in 'dest'. I think os.walk yields 3-tuples (dirpath, subdirs, files), but I can't see how they are linked, so I'm not sure how to reconstruct the sub-directory for each file.

Example:

import shutil, os
from difPy import dif
source = input('Input source folder:')
dest = input('Input backup \\ destination folder:')

ext = ('.jpg','.jpeg','.gif','.JPG','.JPEG','.GIF')

search = dif(source, dest)
result = search.result
result


dupes = []
srcfiles = []
filecount = []
failed = []
removed = []

for i in result.values():
    dupes.append(i['location'])

for dirpath, subdirs, files in os.walk(source):
    for x in files:
        if x.endswith(ext):
            srcfiles.append(os.path.join(dirpath, x))

for f in srcfiles:
    if f not in dupes:
        shutil.copy(f, dest)
        print('File copied successfully - '+f)
        filecount.append(f)
    else:
        print('File not copied successfully !!!! - '+f)
        failed.append(f)
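
One possible way to keep the sub-folders, offered as a sketch rather than a definitive fix: in each tuple from os.walk, dirpath is the folder that the names in files live in, so os.path.relpath(dirpath, source) gives the sub-directory to recreate under the destination. The function name copy_preserving_structure and its parameters are hypothetical. It also assumes the duplicate paths are written the same way as the paths os.walk builds (same absolute or relative form), which may not hold for every difPy version.

```python
import os
import shutil


def copy_preserving_structure(source, dest, skip_paths, extensions):
    """Copy files under `source` into `dest`, keeping relative sub-folders.

    `skip_paths` is an iterable of full paths to leave out (e.g. duplicates).
    Returns the list of destination paths that were written.
    """
    # Normalise paths so membership tests are not thrown off by "./" or "//".
    skip = {os.path.normpath(p) for p in skip_paths}
    extensions = tuple(e.lower() for e in extensions)
    copied = []
    for dirpath, _subdirs, files in os.walk(source):
        # Position of the current folder relative to the top of the tree.
        rel_dir = os.path.relpath(dirpath, source)
        for name in files:
            src_file = os.path.normpath(os.path.join(dirpath, name))
            if not name.lower().endswith(extensions) or src_file in skip:
                continue
            # Recreate the same sub-folder under dest before copying.
            target_dir = os.path.normpath(os.path.join(dest, rel_dir))
            os.makedirs(target_dir, exist_ok=True)
            copied.append(shutil.copy2(src_file, target_dir))
    return copied
```

With the variables from the code above, this would be called as copy_preserving_structure(source, dest, dupes, ext). Sub-folders are only created when at least one file is copied into them, so folders that contain nothing but duplicates do not appear in the destination.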

I have also tried using the shutil.copytree function with an ignore list, but it requires a new destination folder and I can't get the ignore function to work.

shutil.copytree example:

df = []
for i in result.values():
    df.append(i['filename'])

def ignorelist(source, df):
    return [f for f in df if os.path.isfile(os.path.join(source, f))]

shutil.copytree(source, destnew, ignore=ignorelist)
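
For reference, shutil.copytree calls the ignore function once per directory with two arguments, the directory being visited and the list of names inside it, and expects back the subset of names to skip. The sketch below builds such a function with a closure. The name make_ignore is hypothetical, and the code assumes the duplicates are given as full paths (like the 'location' values used earlier) rather than bare filenames, so that same-named files in different folders are not confused.

```python
import os
import shutil


def make_ignore(skip_paths, extensions):
    """Build an ignore callable for shutil.copytree.

    Files listed in `skip_paths` (full paths) and files whose extension is
    not in `extensions` are skipped; directories are always kept.
    """
    skip = {os.path.normpath(p) for p in skip_paths}
    extensions = tuple(e.lower() for e in extensions)

    def ignore(directory, names):
        ignored = []
        for name in names:
            full = os.path.normpath(os.path.join(directory, name))
            if os.path.isdir(full):
                continue  # never ignore folders, so the walk continues
            if full in skip or not name.lower().endswith(extensions):
                ignored.append(name)
        return ignored

    return ignore
```

It could then be used as shutil.copytree(source, dest, ignore=make_ignore(dupes, ext), dirs_exist_ok=True). The dirs_exist_ok argument needs Python 3.8 or later and lets copytree write into a folder that already exists. Unlike the os.walk approach, copytree recreates every sub-folder, including ones that end up empty.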

