Hemant Vishwakarma: Adding rows to a pandas dataframe with respect to existing entities within the dataframe

Friday, 3 June 2022

Adding rows to a pandas dataframe with respect to existing entities within the dataframe

I have a dataframe, filled with values that look like this. I only have rows for months where entities generated revenue, but entities may have had lifetimes longer than what is depicted by the revenue they're showing.

entity	revenue_generated	first_purchase	months_since_first_purchase
A	20	2022-01	0
A	60	2022-01	2
A	80	2022-01	3
A	15	2022-01	5
B	30	2022-03	0
B	10	2022-03	1
B	12	2022-03	2
G	25	2022-01	0
G	19	2022-01	1
G	90	2022-01	2

For a quick explanation of a lifetime in this context, Entity A first purchased during January 2022. 5 months since their first purchase, where they generated $15 is the maximum of the lifetime. In other words, since I am writing this in June 2022, we have no visibility of their purchase history in July 2022. So their maximum potential lifetime as a customer is 5 months, (representing the 6 observable months of Jan, Feb, March, April, May and June and starting the count from 0)

For simplicity's sake, let's say B first purchased in March 2022, so their maximum is 3. So A's maximum potential lifetime is represented in the dataset, but B's and G's are not.

G also made their first purchased in January 2022. So their maximum months_since_first_purchase value is also 5, but they did not generate any revenue during that month so they are not represented.

I want to change the dataset so it includes all of the months_since_first_purchase for each entity and includes their revenue in that month as 0. So, my goal dataset (with additions emphasised) is:

entity	revenue_generated	first_purchase	months_since_first_purchase
A	20	2022-01	0
A	0	2022-01	1
A	60	2022-01	2
A	80	2022-01	3
A	0	2022-01	4
A	15	2022-01	5
B	30	2022-03	0
B	10	2022-03	1
B	12	2022-03	2
B	0	2022-03	3
G	25	2022-01	0
G	19	2022-01	1
G	90	2022-01	2
G	0	2022-01	3
G	0	2022-01	4
G	0	2022-01	5

I currently have this implemented in a for loop where I iterate over a set of the entities, and build a new dataframe for each of them and concatenate it to a new main dataframe but this is quite slow. Is there a more pythonic way of approaching this problem with Pandas that doesn't involve iterating and rebuilding a new dataframe?

from Adding rows to a pandas dataframe with respect to existing entities within the dataframe

Hemant Vishwakarma

Friday, 3 June 2022

Adding rows to a pandas dataframe with respect to existing entities within the dataframe

No comments:

Post a Comment