Here are two groupby operations on a pandas.DataFrame:
import pandas
d = pandas.DataFrame({"a": [1, 2, 3, 4, 5, 6],
                      "b": [1, 2, 4, 3, -1, 5]})
grp1 = pandas.Series([1, 1, 1, 1, 1, 1])
ans1 = d.groupby(grp1).apply(lambda x: x.a * x.b.iloc[0])
grp2 = pandas.Series([1, 1, 1, 2, 2, 2])
ans2 = d.groupby(grp2).apply(lambda x: x.a * x.b.iloc[0])
print(ans1.reset_index(drop=True))
# a 0 1 2 3 4 5
# 0 1 2 3 4 5 6
print(ans2.reset_index(drop=True))
# 0 1
# 1 2
# 2 3
# 3 12
# 4 15
# 5 18
# Name: a, dtype: int64
I want the output in the format of ans2. If the grouping Series has more than one group (as in grp2), the output format is fine. However, when the grouping Series has only one group (as in grp1), the output is a DataFrame with a single row. Why is this?
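One way to see what differs between the two cases is to look at the index of the Series that the lambda returns for each group. In the sketch below, `f` and `result_indexes` are hypothetical helper names introduced only for illustration. With grp1 there is a single result, so every result trivially has the same index; with grp2 the two results have different indexes. My understanding (not confirmed by the question) is that `apply` stacks same-indexed Series results as rows of a DataFrame, and concatenates differently indexed results into one long Series, which would explain the two output shapes.

```python
import pandas

d = pandas.DataFrame({"a": [1, 2, 3, 4, 5, 6],
                      "b": [1, 2, 4, 3, -1, 5]})

def f(x):
    # Same computation as the lambda in the question (hypothetical name)
    return x.a * x.b.iloc[0]

def result_indexes(grp):
    # Collect the index of the Series that f returns for each group
    return [list(f(g).index) for _, g in d.groupby(grp)]

idx1 = result_indexes(pandas.Series([1, 1, 1, 1, 1, 1]))
idx2 = result_indexes(pandas.Series([1, 1, 1, 2, 2, 2]))
print(idx1)  # [[0, 1, 2, 3, 4, 5]]
print(idx2)  # [[0, 1, 2], [3, 4, 5]]
```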
How can I ensure that the output will always be like ans2 regardless of the number of groups in the grouping Series? Is there a quicker/better approach than
- Checking if the output is a DataFrame and coercing into a Series
- Checking if the grouping Series has only one group and avoiding groupby if that's the case
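A minimal sketch of two alternatives that sidestep the reshaping, assuming the goal is always "each row's `a` times the first `b` of its group". The function names `scale_by_first_b` and `scale_by_first_b_concat` are hypothetical. The first uses `transform("first")`, which broadcasts a per-group value back onto the original rows; the second runs the per-group function manually and joins the pieces with `pandas.concat`, which returns a Series when given Series.

```python
import pandas

d = pandas.DataFrame({"a": [1, 2, 3, 4, 5, 6],
                      "b": [1, 2, 4, 3, -1, 5]})
grp1 = pandas.Series([1, 1, 1, 1, 1, 1])
grp2 = pandas.Series([1, 1, 1, 2, 2, 2])

def scale_by_first_b(df, grp):
    # transform("first") returns a Series aligned with df's index, holding
    # each group's first non-null b on every row of that group. Unlike
    # iloc[0], "first" skips NaN values.
    first_b = df.groupby(grp)["b"].transform("first")
    return df["a"] * first_b

def scale_by_first_b_concat(df, grp):
    # Apply the per-group computation by hand; concatenating Series always
    # yields a Series, so one group is not a special case.
    parts = [g["a"] * g["b"].iloc[0] for _, g in df.groupby(grp)]
    return pandas.concat(parts)

print(scale_by_first_b(d, grp1).tolist())         # [1, 2, 3, 4, 5, 6]
print(scale_by_first_b(d, grp2).tolist())         # [1, 2, 3, 12, 15, 18]
print(scale_by_first_b_concat(d, grp1).tolist())  # [1, 2, 3, 4, 5, 6]
print(scale_by_first_b_concat(d, grp2).tolist())  # [1, 2, 3, 12, 15, 18]
```

The `transform` version keeps the original row order. The `concat` version orders rows by group key, so if groups are interleaved in the data, calling `.sort_index()` on its result restores the original order.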
from Unclear why groupby with single group produces row DataFrame