Sunday, 1 July 2018

Pandas groupby apply vs transform with specific functions

I don't understand which functions are acceptable for groupby + transform operations. Often, I end up guessing, testing, and reverting until something works, but I feel there should be a systematic way to determine whether a given function will work.

Here's a minimal example. First let's use groupby + apply with set:

import pandas as pd

df = pd.DataFrame({'a': [1,2,3,1,2,3,3], 'b':[1,2,3,1,2,3,3], 'type':[1,0,1,0,1,0,1]})

g = df.groupby(['a', 'b'])['type'].apply(set)

print(g)

a  b
1  1    {0, 1}
2  2    {0, 1}
3  3    {0, 1}

This works fine, but I want the resulting set, calculated groupwise, in a new column of the original dataframe. So I try to use transform:

df['g'] = df.groupby(['a', 'b'])['type'].transform(set)

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
---> 23 df['g'] = df.groupby(['a', 'b'])['type'].transform(set)

TypeError: int() argument must be a string, a bytes-like object or a number, not 'set'

This is the error I see in Pandas v0.19.0. In v0.23.0, I see TypeError: 'set' type is unordered. Of course, I can build an index from the grouping columns and map it onto the grouped result to get what I want:

g = df.groupby(['a', 'b'])['type'].apply(set)
df['g'] = df.set_index(['a', 'b']).index.map(g.get)

print(df)

   a  b  type       g
0  1  1     1  {0, 1}
1  2  2     0  {0, 1}
2  3  3     1  {0, 1}
3  1  1     0  {0, 1}
4  2  2     1  {0, 1}
5  3  3     0  {0, 1}
6  3  3     1  {0, 1}

But I thought the benefit of transform was to avoid such an explicit mapping. Where did I go wrong?
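
For contrast, here is a minimal sketch of a case where transform does behave as I expect, next to a join-based alternative to the explicit index mapping. It assumes the same df as above; the column name n_types is only an illustrative choice, and I am not claiming this is the recommended approach. A reducer that returns one scalar per group (here 'nunique') is broadcast back to every row of the group, while the set-valued result is aggregated once and then joined on the group keys.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 1, 2, 3, 3],
                   'b': [1, 2, 3, 1, 2, 3, 3],
                   'type': [1, 0, 1, 0, 1, 0, 1]})

grouped = df.groupby(['a', 'b'])['type']

# A reducer returning one scalar per group is broadcast by transform
# to every row belonging to that group.
# 'n_types' is a hypothetical column name used only for illustration.
df['n_types'] = grouped.transform('nunique')

# For the set-valued result, aggregate once per group, name the Series,
# then join it back onto the original rows using the group keys.
sets = grouped.apply(set).rename('g')
result = df.join(sets, on=['a', 'b'])

print(result)
```

The join keeps the original row order and index, so it gives the same 'g' column as the index.map version without building a MultiIndex by hand.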

Versions I am using:



