Wednesday, 2 June 2021

remove corrupted xlsx files from a given directory

Update

There are some .xlsx files in an specific directory that are corrupted, because the windows message when trying to open the workbook is the following:

Excel cannot open the file 'filename.xlsx' because the file format or file extension is not valid. Verify that the file has not been corrupted and that the file extension matches the format of the file.`

I am wondering if it is possible to detect these corrupted files and remove it from directory.

My trial:

############### path settlement and file names ##########
path_reportes = os.path.join(os.getcwd(), 'Reports', 'xlsx_folder')
file_names = os.listdir(path_reportes)
overall_df = dict()

############## concatenate all reports ##################

for file_name in file_names:

    data_file_path = os.path.join(path_reportes, file_name)
    """
    try open spreadsheets, save them and store them in a dictionary key
    except when the file is corrupted, if so, remove it from the 
    folder
    """
    try:
     # Start by opening the spreadsheet and selecting the main sheet
        workbook = openpyxl.load_workbook(filename=data_file_path)
        sheet = workbook.active
    
     # Save the spreadsheet
        workbook.save(filename=data_file_path)
        df_report_dict = pd.read_excel(data_file_path, sheet_name=None, engine='openpyxl')
    
        for key in df_report_dict:
            
            df_report_dict[key]['report_name'] = file_name
            
            try:
                  overall_df[key] = overall_df[key].append(df_report_dict[key], ignore_index=True)
            except:
                  overall_df[key] = df_report_dict[key]
                
                
    # when file corrupted then remove it from the folder             
    except BadZipFile:
                   os.remove(data_file_path)
            

Which throws the next error:

NameError: name 'BadZipFile' is not defined

Is it possible to detect corrupted files? How could I handle them?



from remove corrupted xlsx files from a given directory

No comments:

Post a Comment