I have the following dataframe in python pandas which is over a million rows,
ID BOOKING_TIME ENTRY_TIME
23239 1/1/2020 0:00 1/1/2020 0:40
51042 1/1/2020 0:11 1/1/2020 0:42
73373 1/1/2020 0:15 1/1/2020 0:56
14222 1/1/2020 0:22 1/1/2020 1:00
27116 1/1/2020 0:55 1/1/2020 1:15
....
The two columns have been merged from 2 different sql tables.
The same user could make a booking further down the table again.
Each row shows a user creating a booking (entering the queue to enter the venue) and the time the user enters the venue.
I am trying to create a column that shows the number of people that are currently queuing based on the time a user makes a new booking.
When a booking is made, an entry under BOOKING_TIME is recorded and they join a queue.
When they enter the venue, the ENTRY_TIME is recorded and they leave the queue.
I want to do a running count of the people that are in the queue. Meaning that in each row, if a booking time is hit, it does +1 in the IN_QUEUE
column and once the entry time (for any of the cases in the previous timestamps) is hit, I want to -1 from the IN_QUEUE
.
It will create a new column like this. (Ignore the working column, that is just to show the process)
ID BOOKING_TIME ENTRY_TIME IN_QUEUE (working)
23239 1/1/2020 0:00 1/1/2020 0:40 1 +1
51042 1/1/2020 0:11 1/1/2020 0:42 2 +1
73373 1/1/2020 0:15 1/1/2020 0:56 3 +1
14222 1/1/2020 0:22 1/1/2020 1:00 4 +1
27116 1/1/2020 0:55 1/1/2020 1:15 3 -1 +1
....
Row 5 will -1
first because before 0:55am on 1/1/2020, the person from the first row entered the venue at 0:40am.
I think I have to use an apply function with a conditional function on every row but I am not sure how it would work.
I was thinking maybe for each row, count all rows where BOOKING_TIME < CURRENT BOOKING_TIME and ENTRY_TIME > CURRENT BOOKING_TIME, but I'm not sure how to code it or if my logic is even correct.
I saw this post Pandas: conditional rolling count but I cant figure out how to make it work with my case.
from Conditional running count in pandas based on conditions in 2 columns (counting number of people in a queue based on timestamps)
No comments:
Post a Comment