Monday, 18 November 2019

Generate dataframe columns based on constraints in current dataframe

I have a dataframe with the following columns :

Date_2      Date_1     is_B
02/08/2019  01/09/2019  1
02/08/2019  01/09/2019  1
02/08/2019  01/09/2019  0
02/08/2019  01/09/2019  0
.   .   .
.   .   .
.   .   .
31/08/2019  01/09/2019  0
31/08/2019  01/09/2019  0
31/08/2019  01/09/2019  0
31/08/2019  01/09/2019  0
31/08/2019  01/09/2019  0
31/08/2019  01/09/2019  1
31/08/2019  01/09/2019  1

I want to generate another dataframe df2 such that the output looks like the following :

Date_1  Total_count Total(is_b = 1) num_2   num_3   num_5   num_20
01/09/2019  493      147            26      30      32       59

Total_Count = total entries for Date_1 in the dataframe 
Total(is_b = 1) = total entries for Date_1 where is_b = 1
num_2 = total entries for Date_1 for 2 days where Date_2 = (Date_1 - 1 to Date_1 - 2){Both included as well}
num_3 = total entries for Date_1 for 3 days where Date_2 = (Date_1 - 3 to Date_1 - 5){Both included as well}
num_5 = total entries for Date_1 for 5 days where Date_2 = (Date_1 - 6 to Date_1 - 10){Both included as well}
num_20 = total entries for Date_1 for 20 days where Date_2 = (Date_1 - 11 to Date_1 - 30){Both included as well}

I was able to generate first 2 columns easily using :

df.groupby('Date_1')['Date_1'].count()

df.loc[df.isBooked == 1].groupby('Date_1')['Date_1'].count()

I am not sure how to calculate the other columns :

I did try this :

df.loc[(df.isBooked == 1) & (df.Booking_Date = Flight_Date - 1) & (df.Booking_Date = Flight_Date - 2)].groupby('Flight_Date')['Flight_Date'].count().reset_index(name='num_2')

But this is an invalid syntax altogether.

Can anyone help me with generating the columns num_2, num_3, num_5, num_20.



from Generate dataframe columns based on constraints in current dataframe

No comments:

Post a Comment