Difference in differences (DID) is a statistical technique that calculates the effect of a treatment on an outcome by comparing the average change over time in the outcome variable for the treatment group [1]. I have this dataset where after means the months where the treatment was introduced and campaign explains if the treatment was applied, on a set of stores, and I want to observe the effect of the campaign on sales:
df = pd.DataFrame()
df['year'] = [2011,2011,2011,2011,2011,2011,2011,2011,2011,2011,2011,2011]
df['month'] = [1,2,3,4,5,6,1,2,3,4,5,6]
df['after'] = [0,0,0,1,1,1,0,0,0,1,1,1]
df['campaign'] = [0,0,0,0,0,0,1,1,1,1,1,1]
df['sales'] = [10000,11000,12000,10500,10000,9500,7000,8000,5000,6000,6000,7000]
year month after campaign sales
0 2011 1 0 0 10000
1 2011 2 0 0 11000
2 2011 3 0 0 12000
3 2011 4 1 0 10500
4 2011 5 1 0 10000
5 2011 6 1 0 9500
6 2011 1 0 1 7000
7 2011 2 0 1 8000
8 2011 3 0 1 5000
9 2011 4 1 1 6000
10 2011 5 1 1 6000
11 2011 6 1 1 7000
df_pv = df.pivot_table(index='campaign',columns = 'after', values = 'sales', aggfunc = np.sum)
df_pv['diff'] = df_pv[1]-df_pv[0]
df_pv['diff_pct'] = ((df_pv[1] - df_pv[0])/df_pv[0]).round(3)*100
df_pv
after 0 1 diff diff_pct
campaign
0 33000 30000 -3000 -9.1
1 20000 19000 -1000 -5.0
Even when there was a sales decrease for both stores QoQ (quarter over quarter), we can say that the campaign has some success, because the DID calculated (-5% - (-9.1%)) = 4.1% shows that the campaign had some effect.
What I want the difference of this two components to obtain 4.1 and put it in an additional row. I can obtain DID but how can I put the results in a new row?
DID = df_pv.iloc[1,3] - df_pv.iloc[0,3]
Expected table:
after 0 1 diff diff_pct
campaign
0 33000 30000 -3000 -9.1
1 20000 19000 -1000 -5.0
Total 53000 49000
DID 4.1
As you see, the table I guess got a transformation, is it possible to add the extra line to the table and put the text DID and the value?
Please, do you know what can I do? And if you know about DID, do you know a package in Python that makes such calculation and how to apply to this case? Thank you in advance for the help.
[1] Reference: Wikipedia (https://en.wikipedia.org/wiki/Difference_in_differences)
from Difference in differences DID in pandas with pivot table
No comments:
Post a Comment