Friday, 14 June 2019

Perform multiple operations in a single groupby call with pandas?

I'd like to produce a summary dataframe after grouping by date. I want to have a column that shows the mean of a given column as it is and the mean of that same column after filtering for instances that are greater than 0. I figured out how I can do this (below), but it requires doing two separate groupby calls, renaming the columns, and then joining them back together. I fell like one should be able to do this all in one call. I was trying to use eval to do this but kept getting an error and being told to use apply, that I couldn't use eval on a groupby object.

Code which gets me what I want but doesn't seem very efficient:

# Sample data

data = pd.DataFrame(
          {"year" : [2013, 2013, 2013, 2014, 2014, 2014],
           "month" : [1, 2, 3, 1, 2, 3],
           "day": [1, 1, 1, 1, 1, 1],
           "delay": [0, -4, 50, -60, 9, 10]})

subset = (data
          .groupby(['year', 'month', 'day'])['delay']
          .mean()
          .reset_index()
          .rename(columns = {'delay' : 'avg_delay'})
         )

subset_1 = (data[data.delay > 0]
          .groupby(['year', 'month', 'day'])['delay']
          .mean()
          .reset_index()
          .rename(columns = {'delay' : 'avg_delay_pos'})
         )

combined = pd.merge(subset, subset_1, how='left', on=['year', 'month', 'day']



from Perform multiple operations in a single groupby call with pandas?

No comments:

Post a Comment