Monday, 8 November 2021

Pandas Weighted Stats

I have a dataframe that looks like the one below.

import pandas as pd

d = {'location': ['a', 'a', 'b', 'b'], 'value': [1, 5, 3, 7], 'weight': [0.9, 0.1, 0.8, 0.2]}
df = pd.DataFrame(data=d)
df
  location value weight
0     a     1     0.9
1     a     5     0.1
2     b     3     0.8
3     b     7     0.2

I currently have code that computes the grouped median, standard deviation, skew and quantiles for the unweighted data:

df = df[['location','value']]

df1 = df.groupby('location').agg(['median','skew','std']).reset_index()

df2 = df.groupby('location').quantile([0.1, 0.9, 0.25, 0.75, 0.5]).unstack(level=1).reset_index()

dfs = df1.merge(df2, how = 'left', on = 'location')

And the result is the following:

  location   value
             median skew      std  0.1  0.9 0.25 0.75  0.5
0      a         3  NaN  2.828427  1.4  4.6  2.0  4.0  3.0
1      b         5  NaN  2.828427  3.4  6.6  4.0  6.0  5.0

I would like to produce the same result data frame as the one above, but with weighted statistics computed using the weight column. How can I go about doing this?

One more important consideration: value is often null even though it has a weight associated with it.
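For reference, here is a minimal sketch of one way the weighted statistics could be computed per group. It rests on several assumptions that are not part of the question: the weights are treated as relative frequencies (so the standard deviation and skew are the population forms, not the sample-corrected ones pandas returns), quantiles are taken without interpolation as the smallest value whose cumulative weight share reaches q, and rows with a null value are dropped together with their weight. The helper names weighted_quantile and weighted_stats and the q-prefixed column names are hypothetical.

```python
import numpy as np
import pandas as pd


def weighted_quantile(values, weights, q):
    """Smallest value whose cumulative share of the total weight reaches q."""
    v = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    order = np.argsort(v)
    v, w = v[order], w[order]
    cum_share = np.cumsum(w) / w.sum()
    idx = np.searchsorted(cum_share, q, side="left")
    return v[min(idx, len(v) - 1)]


def weighted_stats(group, quantiles=(0.1, 0.9, 0.25, 0.75, 0.5)):
    """Weighted median, skew, std and quantiles for one group (as a dict)."""
    # Assumption: a null value contributes nothing, so its weight is dropped too.
    g = group.dropna(subset=["value", "weight"])
    v = g["value"].to_numpy(dtype=float)
    w = g["weight"].to_numpy(dtype=float)

    names = ["median", "skew", "std"] + [f"q{q}" for q in quantiles]
    if len(v) == 0 or w.sum() == 0:
        return {name: np.nan for name in names}

    mean = np.average(v, weights=w)
    std = np.sqrt(np.average((v - mean) ** 2, weights=w))  # population form
    skew = np.average((v - mean) ** 3, weights=w) / std**3 if std > 0 else np.nan

    out = {"median": weighted_quantile(v, w, 0.5), "skew": skew, "std": std}
    for q in quantiles:
        out[f"q{q}"] = weighted_quantile(v, w, q)
    return out


# Same data as in the question, plus one row with a null value and a weight.
df = pd.DataFrame({
    "location": ["a", "a", "a", "b", "b"],
    "value": [1, 5, np.nan, 3, 7],
    "weight": [0.9, 0.1, 0.5, 0.8, 0.2],
})

records = [
    {"location": loc, **weighted_stats(g)}
    for loc, g in df.groupby("location")
]
result = pd.DataFrame(records)
print(result)
```

Building the result from a list of dicts keeps one flat row per location. If a different quantile definition (for example, with linear interpolation) or sample-corrected moments are needed, only the two helper functions would have to change.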



