Monday, 13 September 2021

Unclear why groupby with single group produces row DataFrame

Here are two groupby operations on a pandas.DataFrame:

import pandas


d = pandas.DataFrame({"a": [1, 2, 3, 4, 5, 6],
                      "b": [1, 2, 4, 3, -1, 5]})

grp1 = pandas.Series([1, 1, 1, 1, 1, 1])
ans1 = d.groupby(grp1).apply(lambda x: x.a * x.b.iloc[0])

grp2 = pandas.Series([1, 1, 1, 2, 2, 2])
ans2 = d.groupby(grp2).apply(lambda x: x.a * x.b.iloc[0])

print(ans1.reset_index(drop=True))
# a  0  1  2  3  4  5
# 0  1  2  3  4  5  6

print(ans2.reset_index(drop=True))
# 0     1
# 1     2
# 2     3
# 3    12
# 4    15
# 5    18
# Name: a, dtype: int64

I want the output in the format of ans2. When the grouping Series has more than one group (as in grp2), the output is a Series, as expected. However, when the grouping Series has only one group (as in grp1), the output is a DataFrame with a single row. Why is this?

How can I ensure that the output will always be like ans2 regardless of the number of groups in the grouping Series? Is there a quicker/better approach than

  1. Checking if the output is a DataFrame and coercing into a Series
  2. Checking if the grouping Series has only one group and avoiding groupby if that's the case
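
One possible alternative to both workarounds, sketched here under the assumption that the lambda only ever needs the first b of each group: broadcast that value back onto the original rows with transform and do the multiplication outside of apply. The helper name scale_by_first_b is hypothetical and only used for illustration. Note that transform("first") picks the first non-null value per group, which matches x.b.iloc[0] only when the first b of each group is not NaN.

```python
import pandas


def scale_by_first_b(df, grp):
    """Multiply column a by the first b of each row's group (hypothetical helper)."""
    # transform returns a Series aligned with df's index,
    # regardless of how many groups there are.
    first_b = df.groupby(grp)["b"].transform("first")
    return (df["a"] * first_b).rename("a")


d = pandas.DataFrame({"a": [1, 2, 3, 4, 5, 6],
                      "b": [1, 2, 4, 3, -1, 5]})

# Single group: still a Series, not a one-row DataFrame.
out1 = scale_by_first_b(d, pandas.Series([1, 1, 1, 1, 1, 1]))

# Two groups: same shape and type as before.
out2 = scale_by_first_b(d, pandas.Series([1, 1, 1, 2, 2, 2]))

print(out1)
print(out2)
```

Because the result of transform always has the same index as the input, the output type does not depend on the number of groups, so neither of the two checks listed above should be needed in this sketch.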


