Saturday, 12 June 2021

Writing Pandas DataFrame to Excel: How to auto-adjust column widths

I am trying to write a series of pandas DataFrames to an Excel worksheet such that:

  1. The existing contents of the worksheet are not overwritten or erased, and
  2. The Excel column widths are adjusted to fit the lengths of the column entries (so that I don't have to do this manually in Excel).

For 1), I found an excellent solution in the form of a helper function written by @MaxU in "How to write to an existing excel file without overwriting data (using pandas)?". For 2), I found what looked like a good solution in a Towards Data Science article (the URL is in the code below). But when I put these two solutions together, the column widths don't change at all. Here's my full code:

import pandas as pd
import os
from openpyxl import load_workbook

def append_df_to_excel(filename, df, sheet_name='Sheet1', startrow=None,
                       truncate_sheet=False, 
                       **to_excel_kwargs):
    """
    Append a DataFrame [df] to existing Excel file [filename]
    into [sheet_name] Sheet.
    If [filename] doesn't exist, then this function will create it.

    @param filename: File path or existing ExcelWriter
                     (Example: '/path/to/file.xlsx')
    @param df: DataFrame to save to workbook
    @param sheet_name: Name of sheet which will contain DataFrame.
                       (default: 'Sheet1')
    @param startrow: upper left cell row to dump data frame.
                     Per default (startrow=None) calculate the last row
                     in the existing DF and write to the next row...
    @param truncate_sheet: truncate (remove and recreate) [sheet_name]
                           before writing DataFrame to Excel file
    @param to_excel_kwargs: arguments which will be passed to `DataFrame.to_excel()`
                            [can be a dictionary]
    @return: None

    Usage examples:

    >>> append_df_to_excel('d:/temp/test.xlsx', df)

    >>> append_df_to_excel('d:/temp/test.xlsx', df, header=None, index=False)

    >>> append_df_to_excel('d:/temp/test.xlsx', df, sheet_name='Sheet2',
                           index=False)

    >>> append_df_to_excel('d:/temp/test.xlsx', df, sheet_name='Sheet2', 
                           index=False, startrow=25)

    (c) [MaxU](https://stackoverflow.com/users/5741205/maxu?tab=profile)
    """
    # Excel file doesn't exist - saving and exiting
    if not os.path.isfile(filename):
        df.to_excel(
            filename,
            sheet_name=sheet_name, 
            startrow=startrow if startrow is not None else 0, 
            **to_excel_kwargs)
        return
    
    # ignore [engine] parameter if it was passed
    if 'engine' in to_excel_kwargs:
        to_excel_kwargs.pop('engine')

    writer = pd.ExcelWriter(filename, engine='openpyxl', mode='a')

    # try to open an existing workbook
    writer.book = load_workbook(filename)
    
    # get the last row in the existing Excel sheet
    # if it was not specified explicitly
    if startrow is None and sheet_name in writer.book.sheetnames:
        startrow = writer.book[sheet_name].max_row

    # truncate sheet
    if truncate_sheet and sheet_name in writer.book.sheetnames:
        # index of [sheet_name] sheet
        idx = writer.book.sheetnames.index(sheet_name)
        # remove [sheet_name]
        writer.book.remove(writer.book.worksheets[idx])
        # create an empty sheet [sheet_name] using old index
        writer.book.create_sheet(sheet_name, idx)
    
    # copy existing sheets
    writer.sheets = {ws.title:ws for ws in writer.book.worksheets}

    if startrow is None:
        startrow = 0

    # write out the new sheet
    df.to_excel(writer, sheet_name, startrow=startrow, **to_excel_kwargs)


    # Now attempt to adjust the column widths as necessary so that all the
    # cell contents are visible in Excel. The code below is taken from
    # https://towardsdatascience.com/how-to-auto-adjust-the-width-of-excel-columns-with-pandas-excelwriter-60cee36e175e.
    for column in df:
        column_width = max(df[column].astype(str).map(len).max(), len(column))
        col_idx = df.columns.get_loc(column)
        writer.sheets[sheet_name].set_column(col_idx, col_idx, column_width)

    writer.save()

I then tested the function:

df = pd.DataFrame({'A_Very_Long_Column_Name': [10, 20, 30, 20, 15, 30, 45]})
append_df_to_excel("C:/Users/Leonidas/Documents/test.xlsx", df, "Sheet1")

A new Excel workbook named test.xlsx is created along with a sheet named Sheet1, and the contents of df are written to Sheet1, but the column widths are completely unaffected.
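One detail in the code above may explain this first run: when the file does not exist yet, the function calls df.to_excel and returns in the "saving and exiting" branch, so the width loop further down is never reached. Here is a stripped-down stand-in for that control flow. The name append_like and the log list are hypothetical, and a plain text file takes the place of the workbook, so this only illustrates the branching and not the Excel handling:

```python
import os
import tempfile


def append_like(path, log):
    """Hypothetical stand-in mirroring the control flow of append_df_to_excel."""
    if not os.path.isfile(path):
        # First run: create the file, then return before any width code runs.
        with open(path, "w") as handle:
            handle.write("data")
        log.append("created")
        return
    # Later runs: only here would the width adjustment be attempted.
    log.append("adjust widths")


log = []
with tempfile.TemporaryDirectory() as folder:
    target = os.path.join(folder, "test.txt")
    append_like(target, log)  # file missing: early return
    append_like(target, log)  # file present: reaches the width step
print(log)
```

If that reading is right, the width code only gets exercised from the second call onwards, which would fit with the error showing up on the second run.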

And strangely, when I try to execute the function a second time (without changing the arguments), I get an error:

runcell(2, 'C:/Users/Leonidas/Documents/write_to_excel2.py')
Traceback (most recent call last):

  File "C:\Users\Leonidas\Documents\write_to_excel2.py", line 125, in <module>
    append_df_to_excel("C:/Users/Leonidas/Documents/test.xlsx", df,

  File "C:\Users\Leonidas\Documents\write_to_excel2.py", line 100, in append_df_to_excel
    writer.sheets[sheet_name].set_column(col_idx, col_idx, column_width)

AttributeError: 'Worksheet' object has no attribute 'set_column'

I'm pretty confused at this point. Any suggestions for how to fix the code would be greatly appreciated.
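For context on the AttributeError: set_column is a method of XlsxWriter worksheets, whereas the writer in the function above is created with engine='openpyxl', so writer.sheets[sheet_name] is an openpyxl Worksheet. In openpyxl, column widths are set through worksheet.column_dimensions[letter].width. Below is a minimal sketch of that approach, not a tested fix for the helper above. The function names compute_column_widths and apply_column_widths and the padding value are my own assumptions, and each row is assumed to have no more values than there are headers:

```python
def compute_column_widths(header, rows, padding=2):
    """Return one width per column: longest header or cell text, plus padding."""
    widths = [len(str(name)) for name in header]
    for row in rows:
        for i, value in enumerate(row):
            widths[i] = max(widths[i], len(str(value)))
    return [width + padding for width in widths]


def apply_column_widths(worksheet, widths, first_col=1):
    """Set widths on an openpyxl worksheet; first_col is 1-based (1 means 'A')."""
    # Imported here so the width calculation above has no openpyxl dependency.
    from openpyxl.utils import get_column_letter

    for offset, width in enumerate(widths):
        letter = get_column_letter(first_col + offset)
        worksheet.column_dimensions[letter].width = width


# Width calculation for the test column from the question (hypothetical rows).
widths = compute_column_widths(["A_Very_Long_Column_Name"], [[10], [20], [30]])
print(widths)
```

With a DataFrame, the inputs could be df.columns and df.values.tolist(), and the worksheet would be writer.sheets[sheet_name], with the call made before the workbook is saved. Since to_excel writes the index into column A by default, first_col=2 would be needed unless index=False is passed. Character counts are only an approximation of Excel's width units, hence the padding.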



from Writing Pandas DataFrame to Excel: How to auto-adjust column widths
