Monday, 24 May 2021

Difference in differences DID in pandas with pivot table

Difference in differences (DID) is a statistical technique that calculates the effect of a treatment on an outcome by comparing the average change over time in the outcome variable for the treatment group [1]. I have this dataset where after means the months where the treatment was introduced and campaign explains if the treatment was applied, on a set of stores, and I want to observe the effect of the campaign on sales:

df = pd.DataFrame()
df['year'] = [2011,2011,2011,2011,2011,2011,2011,2011,2011,2011,2011,2011]
df['month'] = [1,2,3,4,5,6,1,2,3,4,5,6]
df['after'] = [0,0,0,1,1,1,0,0,0,1,1,1]
df['campaign'] = [0,0,0,0,0,0,1,1,1,1,1,1]
df['sales'] = [10000,11000,12000,10500,10000,9500,7000,8000,5000,6000,6000,7000]


    year    month   after   campaign    sales
0   2011    1       0       0           10000
1   2011    2       0       0           11000
2   2011    3       0       0           12000
3   2011    4       1       0           10500
4   2011    5       1       0           10000
5   2011    6       1       0           9500
6   2011    1       0       1           7000
7   2011    2       0       1           8000
8   2011    3       0       1           5000
9   2011    4       1       1           6000
10  2011    5       1       1           6000
11  2011    6       1       1           7000

df_pv = df.pivot_table(index='campaign',columns = 'after', values = 'sales', aggfunc = np.sum)
df_pv['diff'] = df_pv[1]-df_pv[0]
df_pv['diff_pct'] = ((df_pv[1] - df_pv[0])/df_pv[0]).round(3)*100
df_pv

   after    0       1       diff    diff_pct
campaign                
0          33000    30000   -3000   -9.1
1          20000    19000   -1000   -5.0

Even when there was a sales decrease for both stores QoQ (quarter over quarter), we can say that the campaign has some success, because the DID calculated (-5% - (-9.1%)) = 4.1% shows that the campaign had some effect.

What I want the difference of this two components to obtain 4.1 and put it in an additional row. I can obtain DID but how can I put the results in a new row?

DID = df_pv.iloc[1,3] - df_pv.iloc[0,3]

Expected table:
   after    0       1       diff    diff_pct
campaign                
0          33000    30000   -3000   -9.1
1          20000    19000   -1000   -5.0
Total      53000    49000          
                            DID      4.1

As you see, the table I guess got a transformation, is it possible to add the extra line to the table and put the text DID and the value?

Please, do you know what can I do? And if you know about DID, do you know a package in Python that makes such calculation and how to apply to this case? Thank you in advance for the help.

[1] Reference: Wikipedia (https://en.wikipedia.org/wiki/Difference_in_differences)



from Difference in differences DID in pandas with pivot table

No comments:

Post a Comment