I'd like to produce a summary dataframe after grouping by date. I want to have a column that shows the mean of a given column as it is and the mean of that same column after filtering for instances that are greater than 0. I figured out how I can do this (below), but it requires doing two separate groupby calls, renaming the columns, and then joining them back together. I fell like one should be able to do this all in one call. I was trying to use eval to do this but kept getting an error and being told to use apply, that I couldn't use eval on a groupby object.
Code which gets me what I want but doesn't seem very efficient:
# Sample data
data = pd.DataFrame(
{"year" : [2013, 2013, 2013, 2014, 2014, 2014],
"month" : [1, 2, 3, 1, 2, 3],
"day": [1, 1, 1, 1, 1, 1],
"delay": [0, -4, 50, -60, 9, 10]})
subset = (data
.groupby(['year', 'month', 'day'])['delay']
.mean()
.reset_index()
.rename(columns = {'delay' : 'avg_delay'})
)
subset_1 = (data[data.delay > 0]
.groupby(['year', 'month', 'day'])['delay']
.mean()
.reset_index()
.rename(columns = {'delay' : 'avg_delay_pos'})
)
combined = pd.merge(subset, subset_1, how='left', on=['year', 'month', 'day']
from Perform multiple operations in a single groupby call with pandas?
No comments:
Post a Comment