Hemant Vishwakarma: Tracking claims using date/timestamp columns and creating a final count using pandas

I have an issue where I need to track the progression of patients insurance claim statuses based on the dates of those statuses. I also need to create a count of status based on certain conditions.

DF:

ClaimID	New	Accepted	Denied	Expired	Group
001	2021-01-01T09:58:35:335Z	2021-01-01T10:05:43:000Z			A
002	2021-01-01T06:30:30:000Z	2021-03-01T04:11:45:000Z	2021-03-01T04:11:53:000Z		A
003	2021-02-14T14:23:54:154Z	2021-02-15T11:11:56:000Z		2021-02-15T11:15:00:000Z	A
004	2021-02-14T15:36:05:335Z	2021-02-14T17:15:30:000Z			A
005	2021-02-14T15:56:59:009Z	2021-03-01T10:05:43:000Z			A

In the above dataset, we have 6 columns. ClaimID is simple and just indicates the ID of the claim. New, Accepted, Denied, Pending, and Expired indicate the status of the claim and the day/time those statuses were set.

What I need to do is get a count of how many claims are New on each day and how many move out of new into a new status. For example, There are 2 new claims on 2021-01-01. On that same day 1 moved to Accepted about 7 minutes later. Thus on 2021-01-01 the table of counts would read:

DF_Count:

Date	New	Accepted	Denied	Pending	Expired
2021-01-01	2	1	0	0	0
2021-01-02	1	0	0	0	0
2021-01-03	1	0	0	0	0
2021-01-04	1	0	0	0	0
2021-01-05	1	0	0	0	0
....	....	....	....	....	....
2021-02-14	4	2	0	0	0
2021-02-15	2	3	0	0	1
2021-02-16	2	2	0	0	0

Few Conditions:

If a claim moves from one status to the other on the same day (even if they are a minutes/hours apart) it would not be subtracted from the original status until the next day. This can be seen on 2021-01-01 where claim 001 moves from new to accepted on the same day but the claim is not subtracted from new until 2021-01-02.
Until something happens to a claim, it should remain in its original status. Claim 002 will remain in new until 2021-03-01 when it is approved.
If a claim changes status on a later date than its original status, it will be subtracted on that later date. For this, see status 003. It is new on 2/14 but accepted on 2/15. This is why New goes down by 2 on 2/15 (the other claim is the is 004 which is new and accepted on the same day)
For certain statuses, I do not need to look at all columns. For example, For new I only look at the dates inside Accepted and Denied. Not Pending and Expired. When I do these same steps for approved, I no longer need to look at new, just the other columns. How would I do that?
In the final DF_count table, the dates should start from the earliest date in 'New' and end on todays date.
The code needs to be grouped by the Group Column as well. For example, patients in group B (not pictured) will have to have the same start and end date but for their own claims.
I need to do this separately for all of the statuses. Not just new.

Current Solution:

My current solution has been to create an dataset with just dates from the min New Date to todays date. Then for each column, what I do is use the .loc method to find dates that are greater than New in each of the other columns. For example, in the code below I look for all cases where new is equal to approved.

df1 = df.loc[(df['New'] == df['Approved']) & 
((df['Expired'].isnull()) | (df['Expired'] >= df['Accepted'])) &
((df['Pending'].isnull()) | (df['Pending'] >= df['Accepted'])) &
((df['Denied'].isnull()) | (df['Denied'] >= df['Accepted']))]

newtoaccsday = df1.loc[:, ('Group', 'Accepted')]
newtoappsday['Date'] = newtoappsday['Accepted'] 
newtoappsday = newtoappsday.reset_index(drop = True)
newtoappsday= newtoappsday.groupby(['Date', 'Group'], as_index = False)['Approved'].value_counts()
newtoappsday.drop(columns = {'Accepted'}, inplace = True)
newtoappsday.rename(columns = {'count': 'NewAppSDay'}, inplace = True)
newtoappsday['Date'] = newtoappsday['Date'] + timedelta(1)
df_count= df_count.merge(newtoappsday, how = 'left', on = ['Date', 'Group']).fillna(0)


--After doing the above steps for all conditions (where new goes to accepted on a later date etc.) I will do the final calculation for new:

df_count['New'] = df_count.eval('New = New - (NewAccSDay + NewAccLater + NewDenSDay + NewDenLater + NewExpLater + NewPendSDay + NewPendLater)').groupby(['Tier2_ID', 'ClaimType'])['New'].cumsum()

Any and all help would be greatly appreciated. My method above is extremely inefficient and leading to some errors. Do I need to write a for loop for this? What is the best way to go about this.

from Tracking claims using date/timestamp columns and creating a final count using pandas

Hemant Vishwakarma

Wednesday, 23 November 2022

Tracking claims using date/timestamp columns and creating a final count using pandas

No comments:

Post a Comment