Tuesday, 8 November 2022

Does a column slice of a pandas DataFrame with columns of different data types create a view or a copy?

I have some dataframes as follows:

import pandas as pd

df = pd.DataFrame([[1, 2.0], [3, 4.0]], index=['row1', 'row2'],
        columns=['a', 'b'])
df2 = df.iloc[:, :]
df3 = df.iloc[:1, :]
df4 = df.iloc[:, :1]

Column a is int while column b is float.
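The dtypes can be confirmed directly. This is a minimal check that rebuilds the same frame so it runs on its own; the names a_is_int and b_is_float are only illustrative.

```python
import pandas as pd

df = pd.DataFrame([[1, 2.0], [3, 4.0]], index=["row1", "row2"],
                  columns=["a", "b"])

# One dtype per column: "a" holds integers, "b" holds floats.
print(df.dtypes)

# Illustrative flags, checked with the public pandas.api.types helpers.
a_is_int = pd.api.types.is_integer_dtype(df["a"])
b_is_float = pd.api.types.is_float_dtype(df["b"])
print(a_is_int, b_is_float)
```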

Question: are df2, df3, and df4 views or copies?

Test 1:

print(df._is_view, df._is_copy)
print(df2._is_view, df2._is_copy)
print(df3._is_view, df3._is_copy)
print(df4._is_view, df4._is_copy)
False None
False None
False <weakref at 0x7fed1113de90; to 'DataFrame' at 0x7fed11aa80a0>
True <weakref at 0x7fed114d65c0; to 'DataFrame' at 0x7fed11aa9ab0>

According to _is_view, df2 and df3 are not views, but df4 is.

Why?

Test 2:

df2.loc['row1', 'a'] = 10
print(df)
df3.loc['row1', 'a'] = 100
print(df)
df4.loc['row1', 'a'] = 1000
print(df)

       a    b
row1  10  2.0
row2   3  4.0
        a    b
row1  100  2.0
row2    3  4.0
        a    b
row1  100  2.0
row2    3  4.0

/tmp/ipykernel_2006744/1832530048.py:5: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df4.loc['row1', 'a'] = 1000

This shows that df is updated whenever df2 or df3 is modified, so df2 and df3 appear to be views.

Updating df4 does not propagate to df, so df4 seems to be a copy.

Why do these results contradict _is_view?
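Since _is_view and _is_copy are private attributes, another way to probe this is to ask NumPy whether the underlying buffers overlap. The sketch below is only an illustration: shares_memory is a hypothetical helper, not a pandas API, and the printed values depend on the pandas version and on whether Copy-on-Write is enabled, so no expected output is given for the slices. Note also that sharing memory right after slicing does not by itself mean that a later write will propagate; with Copy-on-Write enabled, a slice can share memory until one side is written to.

```python
import numpy as np
import pandas as pd


def shares_memory(parent, child, column):
    """Hypothetical helper: True if `column` is backed by overlapping
    memory in both frames."""
    return np.shares_memory(parent[column].to_numpy(),
                            child[column].to_numpy())


df = pd.DataFrame([[1, 2.0], [3, 4.0]], index=["row1", "row2"],
                  columns=["a", "b"])

# The three slices from the question, plus an explicit deep copy
# as a baseline that never shares memory with df.
candidates = {
    "df.iloc[:, :]": df.iloc[:, :],
    "df.iloc[:1, :]": df.iloc[:1, :],
    "df.iloc[:, :1]": df.iloc[:, :1],
    "df.copy()": df.copy(),
}

for label, frame in candidates.items():
    print(label, shares_memory(df, frame, "a"))
```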

Question 2:

The SettingWithCopyWarning raised when setting a value on df4 mentions "a copy of a slice". What does this refer to?

Does "a slice" refer to df4? If so, what is "a copy of a slice", given that I am using .loc?
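For comparison, this is what happens when the column slice is copied explicitly before writing to it. It is a minimal sketch and does not explain the wording of the warning; it only shows that an explicit .copy() makes the intent unambiguous. The names caught and copy_warnings are illustrative, and the warning is matched by class name because its import location differs between pandas versions.

```python
import warnings

import pandas as pd

df = pd.DataFrame([[1, 2.0], [3, 4.0]], index=["row1", "row2"],
                  columns=["a", "b"])

# Take the same column slice as df4, but detach it from df explicitly.
df4 = df.iloc[:, :1].copy()

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # record every warning that is emitted
    df4.loc["row1", "a"] = 1000

# Keep only SettingWithCopyWarning entries, matched by class name.
copy_warnings = [w for w in caught
                 if w.category.__name__ == "SettingWithCopyWarning"]

print(len(copy_warnings))    # number of SettingWithCopyWarning entries
print(df.loc["row1", "a"])   # value in the original frame
print(df4.loc["row1", "a"])  # value in the explicit copy
```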


