Sunday, 11 April 2021

Why does pandas recognize dictionary values as a list of column names when used for indexing, but not for assignment?

Earlier in my code, I rename some columns using a dictionary:

cols_dict = {
    'Long_column_Name': 'first_column',
    'Other_Long_Column_Name': 'second_column',
    'AnotherLongColName': 'third_column'
}
for key, val in cols_dict.items():
    df.rename(columns={key: val}, inplace=True)

(I know the loop isn't necessary here; in my actual code, I have to search the columns of each dataframe in a list of dataframes for a substring match on the dictionary key.)
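
For context, the substring-matching rename might look roughly like the sketch below. The helper name `rename_by_substring` and the sample frames are hypothetical, and it assumes string column names. If a column happened to contain more than one key, the last matching key in the dictionary would win.

```python
import pandas as pd

cols_dict = {
    'Long_column_Name': 'first_column',
    'Other_Long_Column_Name': 'second_column',
    'AnotherLongColName': 'third_column',
}


def rename_by_substring(frames, mapping):
    """Hypothetical helper: in each frame, rename (in place) any column
    whose name contains one of the mapping's keys to that key's value."""
    for frame in frames:
        # Build one {old_name: new_name} dict per frame (assumes str column names)
        renames = {
            col: new_name
            for key, new_name in mapping.items()
            for col in frame.columns
            if key in col
        }
        frame.rename(columns=renames, inplace=True)
    return frames


# Made-up frames whose column names only partially match the keys
frames = [
    pd.DataFrame({'2021_Long_column_Name': [1], 'AnotherLongColName_v2': [2]}),
    pd.DataFrame({'Other_Long_Column_Name (units)': [3]}),
]
rename_by_substring(frames, cols_dict)
print([list(f.columns) for f in frames])
# [['first_column', 'third_column'], ['second_column']]
```

Building a single mapping per frame also means `rename()` is called once per dataframe rather than once per key.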

Later, I do some cleanup with applymap(), indexing with the dictionary values, and it works fine:

pibs[cols_dict.values()].applymap(
    lambda x: np.nan if ':' in str(x) else x
)

but when I try to assign the result back to the same columns, I get a KeyError:

pibs[cols_dict.values()] = pibs[cols_dict.values()].applymap(
    lambda x: np.nan if ':' in str(x) else x
)
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
~/.local/lib/python3.7/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   3079             try:
-> 3080                 return self._engine.get_loc(casted_key)
   3081             except KeyError as err:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: dict_values(['first_column', 'second_column', 'third_column'])
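
The traceback shows the whole `dict_values` object being passed to `get_loc` as if it were a single column label. A quick check of how the view looks to generic type tests is sketched below. `pd.api.types.is_list_like` and `pd.api.types.is_hashable` are public pandas helpers; the idea that selection relies on a generic list-like test while assignment checks for specific container types is only my guess about pandas internals, not something I've confirmed, and it may differ between versions.

```python
import numpy as np
import pandas as pd

cols_dict = {
    'Long_column_Name': 'first_column',
    'Other_Long_Column_Name': 'second_column',
    'AnotherLongColName': 'third_column',
}
vals = cols_dict.values()

# Iterable and not a string, so pandas' generic list-like test accepts it
print(pd.api.types.is_list_like(vals))  # True
# ...but it is none of the concrete container types pandas commonly special-cases
print(isinstance(vals, (list, np.ndarray, pd.Index, pd.Series)))  # False
# A values view is also hashable, so it can be looked up as one label
print(pd.api.types.is_hashable(vals))  # True
```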

The code runs fine if I convert the dictionary values to a list:

pibs[list(cols_dict.values())] = ...

So I'm wondering why I can select columns with the dictionary values and run applymap() on the result, but can't select columns with the same dictionary values when I try to assign that result back to the dataframe.

Put simply: why does pandas recognize cols_dict.values() as a list of column names when it's used for selection, but not when it's used as the key in an assignment?
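
For anyone who wants to reproduce this, here is a self-contained sketch of the list-based version. The `pibs` frame is made-up sample data and `blank_colons` is a name I've introduced for the lambda. It also swaps `applymap()` for `Series.map` applied column by column, since `DataFrame.applymap` was deprecated in pandas 2.1 in favor of `DataFrame.map`; on older versions the original `applymap()` call should behave the same.

```python
import numpy as np
import pandas as pd

cols_dict = {
    'Long_column_Name': 'first_column',
    'Other_Long_Column_Name': 'second_column',
    'AnotherLongColName': 'third_column',
}

# Made-up stand-in for the real dataframe
pibs = pd.DataFrame({
    'first_column': [1, '2:3', 4],
    'second_column': ['a', 'b', 'c:d'],
    'third_column': [5.0, 6.0, 7.0],
    'other': ['x:y', 'z', 'w'],
})


def blank_colons(x):
    """Return NaN for any value whose string form contains ':'."""
    return np.nan if ':' in str(x) else x


# Materialize the view as a real list before using it as the assignment key
cols = list(cols_dict.values())
# Element-wise cleanup via Series.map, one selected column at a time
pibs[cols] = pibs[cols].apply(lambda s: s.map(blank_colons))
print(pibs)
```

The `other` column also contains a colon but is left untouched, since only the renamed columns are selected.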


