I have a dataframe with the following columns :
Date_2 Date_1 is_B
02/08/2019 01/09/2019 1
02/08/2019 01/09/2019 1
02/08/2019 01/09/2019 0
02/08/2019 01/09/2019 0
. . .
. . .
. . .
31/08/2019 01/09/2019 0
31/08/2019 01/09/2019 0
31/08/2019 01/09/2019 0
31/08/2019 01/09/2019 0
31/08/2019 01/09/2019 0
31/08/2019 01/09/2019 1
31/08/2019 01/09/2019 1
I want to generate another dataframe df2 such that the output looks like the following :
Date_1 Total_count Total(is_b = 1) num_2 num_3 num_5 num_20
01/09/2019 493 147 26 30 32 59
Total_Count = total entries for Date_1 in the dataframe
Total(is_b = 1) = total entries for Date_1 where is_b = 1
num_2 = total entries for Date_1 for 2 days where Date_2 = (Date_1 - 1 to Date_1 - 2){Both included as well}
num_3 = total entries for Date_1 for 3 days where Date_2 = (Date_1 - 3 to Date_1 - 5){Both included as well}
num_5 = total entries for Date_1 for 5 days where Date_2 = (Date_1 - 6 to Date_1 - 10){Both included as well}
num_20 = total entries for Date_1 for 20 days where Date_2 = (Date_1 - 11 to Date_1 - 30){Both included as well}
I was able to generate first 2 columns easily using :
df.groupby('Date_1')['Date_1'].count()
df.loc[df.isBooked == 1].groupby('Date_1')['Date_1'].count()
I am not sure how to calculate the other columns :
I did try this :
df.loc[(df.isBooked == 1) & (df.Booking_Date = Flight_Date - 1) & (df.Booking_Date = Flight_Date - 2)].groupby('Flight_Date')['Flight_Date'].count().reset_index(name='num_2')
But this is an invalid syntax altogether.
Can anyone help me with generating the columns num_2, num_3, num_5, num_20.
from Generate dataframe columns based on constraints in current dataframe
No comments:
Post a Comment