I have an issue where I need to track the progression of patients insurance claim statuses based on the dates of those statuses. I also need to create a count of status based on certain conditions.
DF:
ClaimID | New | Accepted | Denied | Pending | Expired | Group |
---|---|---|---|---|---|---|
001 | 2021-01-01T09:58:35:335Z | 2021-01-01T10:05:43:000Z | A | |||
002 | 2021-01-01T06:30:30:000Z | 2021-03-01T04:11:45:000Z | 2021-03-01T04:11:53:000Z | A | ||
003 | 2021-02-14T14:23:54:154Z | 2021-02-15T11:11:56:000Z | 2021-02-15T11:15:00:000Z | A | ||
004 | 2021-02-14T15:36:05:335Z | 2021-02-14T17:15:30:000Z | A | |||
005 | 2021-02-14T15:56:59:009Z | 2021-03-01T10:05:43:000Z | A |
In the above dataset, we have 6 columns. ClaimID is simple and just indicates the ID of the claim. New, Accepted, Denied, Pending, and Expired indicate the status of the claim and the day/time those statuses were set.
What I need to do is get a count of how many claims are New on each day and how many move out of new into a new status. For example, There are 2 new claims on 2021-01-01. On that same day 1 moved to Accepted about 7 minutes later. Thus on 2021-01-01 the table of counts would read:
DF_Count:
Date | New | Accepted | Denied | Pending | Expired |
---|---|---|---|---|---|
2021-01-01 | 2 | 1 | 0 | 0 | 0 |
2021-01-02 | 1 | 0 | 0 | 0 | 0 |
2021-01-03 | 1 | 0 | 0 | 0 | 0 |
2021-01-04 | 1 | 0 | 0 | 0 | 0 |
2021-01-05 | 1 | 0 | 0 | 0 | 0 |
.... | .... | .... | .... | .... | .... |
2021-02-14 | 4 | 2 | 0 | 0 | 0 |
2021-02-15 | 2 | 3 | 0 | 0 | 1 |
2021-02-16 | 2 | 2 | 0 | 0 | 0 |
Few Conditions:
- If a claim moves from one status to the other on the same day (even if they are a minutes/hours apart) it would not be subtracted from the original status until the next day. This can be seen on 2021-01-01 where claim 001 moves from new to accepted on the same day but the claim is not subtracted from new until 2021-01-02.
- Until something happens to a claim, it should remain in its original status. Claim 002 will remain in new until 2021-03-01 when it is approved.
- If a claim changes status on a later date than its original status, it will be subtracted on that later date. For this, see status 003. It is new on 2/14 but accepted on 2/15. This is why New goes down by 2 on 2/15 (the other claim is the is 004 which is new and accepted on the same day)
- For certain statuses, I do not need to look at all columns. For example, For new I only look at the dates inside Accepted and Denied. Not Pending and Expired. When I do these same steps for approved, I no longer need to look at new, just the other columns. How would I do that?
- In the final DF_count table, the dates should start from the earliest date in 'New' and end on todays date.
- The code needs to be grouped by the Group Column as well. For example, patients in group B (not pictured) will have to have the same start and end date but for their own claims.
- I need to do this separately for all of the statuses. Not just new.
Current Solution:
My current solution has been to create an dataset with just dates from the min New Date to todays date. Then for each column, what I do is use the .loc method to find dates that are greater than New in each of the other columns. For example, in the code below I look for all cases where new is equal to approved.
df1 = df.loc[(df['New'] == df['Approved']) &
((df['Expired'].isnull()) | (df['Expired'] >= df['Accepted'])) &
((df['Pending'].isnull()) | (df['Pending'] >= df['Accepted'])) &
((df['Denied'].isnull()) | (df['Denied'] >= df['Accepted']))]
newtoaccsday = df1.loc[:, ('Group', 'Accepted')]
newtoappsday['Date'] = newtoappsday['Accepted']
newtoappsday = newtoappsday.reset_index(drop = True)
newtoappsday= newtoappsday.groupby(['Date', 'Group'], as_index = False)['Approved'].value_counts()
newtoappsday.drop(columns = {'Accepted'}, inplace = True)
newtoappsday.rename(columns = {'count': 'NewAppSDay'}, inplace = True)
newtoappsday['Date'] = newtoappsday['Date'] + timedelta(1)
df_count= df_count.merge(newtoappsday, how = 'left', on = ['Date', 'Group']).fillna(0)
--After doing the above steps for all conditions (where new goes to accepted on a later date etc.) I will do the final calculation for new:
df_count['New'] = df_count.eval('New = New - (NewAccSDay + NewAccLater + NewDenSDay + NewDenLater + NewExpLater + NewPendSDay + NewPendLater)').groupby(['Tier2_ID', 'ClaimType'])['New'].cumsum()
Any and all help would be greatly appreciated. My method above is extremely inefficient and leading to some errors. Do I need to write a for loop for this? What is the best way to go about this.
from Tracking claims using date/timestamp columns and creating a final count using pandas
No comments:
Post a Comment