This answer provides a solution to get a rolling sum of a column grouped by another column based on a date window. To reproduce it here:
import datetime

import pandas as pd

df = pd.DataFrame(
    {
        'ID': {0: 10001, 1: 10001, 2: 10001, 3: 10001, 4: 10002, 5: 10002, 6: 10002},
        'Date': {
            0: datetime.datetime(2019, 7, 1),
            1: datetime.datetime(2019, 5, 1),
            2: datetime.datetime(2019, 6, 25),
            3: datetime.datetime(2019, 5, 27),
            4: datetime.datetime(2019, 6, 29),
            5: datetime.datetime(2019, 7, 18),
            6: datetime.datetime(2019, 7, 15)
        },
        'Amount': {0: 50, 1: 15, 2: 10, 3: 20, 4: 25, 5: 35, 6: 40},
    }
)
# Per ID: sort by date, then take a 28-day rolling sum
amounts = df.groupby(["ID"]).apply(lambda g: g.sort_values('Date').rolling('28d', on='Date').sum())
# Look up each row's rolling sum by its Date
df['amount_4wk_rolling'] = df["Date"].map(amounts.set_index('Date')['Amount'])
Output:
+-------+------------+--------+--------------------+
| ID | Date | Amount | amount_4wk_rolling |
+-------+------------+--------+--------------------+
| 10001 | 01/07/2019 | 50 | 60 |
| 10001 | 01/05/2019 | 15 | 15 |
| 10001 | 25/06/2019 | 10 | 10 |
| 10001 | 27/05/2019 | 20 | 35 |
| 10002 | 29/06/2019 | 25 | 25 |
| 10002 | 18/07/2019 | 35 | 100 |
| 10002 | 15/07/2019 | 40 | 65 |
+-------+------------+--------+--------------------+
However, if two of the dates are the same then I get the error:
pandas.errors.InvalidIndexError: Reindexing only valid with uniquely valued Index objects
This makes sense, as I can see that the final line uses Date to set an index, which is no longer unique once dates repeat. However, as I don't really understand what that final line does, I'm a little stumped on how to develop an alternative solution.
Could someone help out?
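For reference, here is a minimal sketch of one possible direction, not a definitive fix. The final line above builds a Series of rolling sums indexed by Date and then looks up each row's Date in it, and that lookup needs unique dates. The sketch below avoids the lookup by relying on the original row labels instead, assuming those labels are unique (as with the default index). The sample data is a shortened, made-up variant of the frame above with one repeated date.

```python
import datetime

import pandas as pd

# Made-up sample: rows 2 and 3 share the same ID and Date.
df = pd.DataFrame({
    'ID': [10001, 10001, 10001, 10001, 10002, 10002],
    'Date': [
        datetime.datetime(2019, 7, 1),
        datetime.datetime(2019, 5, 1),
        datetime.datetime(2019, 5, 27),
        datetime.datetime(2019, 5, 27),  # duplicate date
        datetime.datetime(2019, 6, 29),
        datetime.datetime(2019, 7, 15),
    ],
    'Amount': [50, 15, 20, 5, 25, 40],
})

pieces = []
for _, group in df.groupby('ID'):
    # Sorting keeps the original row labels, so results can be matched back later.
    ordered = group.sort_values('Date', kind='mergesort')
    # Rolling 28-day sum of Amount; the result is indexed by the original row labels.
    pieces.append(ordered.rolling('28d', on='Date')['Amount'].sum())

# Column assignment aligns on row labels, so Date never has to be unique.
df['amount_4wk_rolling'] = pd.concat(pieces)
print(df)
```

How rows that share a timestamp are treated inside the window may depend on the pandas version, so it is worth checking the values on those rows against what you expect.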