Friday, 25 June 2021

Checking for common interactors within a time range in python dataframe

I have the following dataset which can be recreated using the following code:

   import pandas as pd
    
    data3 =  [[20210308, 'A','B',100], 
              [20210307, 'B','A',95],
            [20210307, 'B','A',65],
            [20210305, 'A','C',50],
            [20210304, 'D','E',25],
            [20210303, 'E','D',200],
            [20210201,'E','B',10 ], 
            [20210101, 'X','X',50]] 
    
    df3 = pd.DataFrame(data3, columns = ['Date_1', 'Interactor_A','Interactor_B','Value']) 
    
    df3['Date_2'] = pd.to_datetime(df3['Date_1'], format='%Y%m%d')

The final dataset looks like this, I made the last column for date time in case some date time functions were required for the solution:

     Date_1 Interactor_A Interactor_B  Value     Date_2
0  20210308            A            B    100 2021-03-08
1  20210307            B            A     95 2021-03-07
2  20210307            B            A     65 2021-03-07
3  20210305            A            C     50 2021-03-05
4  20210304            D            E     25 2021-03-04
5  20210303            E            D    200 2021-03-03
6  20210201            E            B     10 2021-02-01
7  20210101            X            X     50 2021-01-01

What I'm trying to do is detect if two people interact with each other within a 3 day period and the value of their interaction is within a 90% to 110% range. That is does A interact with B as Interactor_A and does B interact back with A as Interactor_A and their interaction happens within a 3 day time window and is the value of their interactions within a 90% to 110% range?

By this logic, only the first two rows would be selected as A interacts with B and then B interacts back with A and the value of their interaction is within a 90% to 110% band because 95/100 or 100/95 is within a 90% to 110% band. Similarly the D-E and E-D interaction won't be selected because 25/200 and 200/25 are both outside of a 90% to 110% band. The B-A interaction in Index 2 also won't be selected because it's not in the 90% to 110% band even though it's in the 3 day window.

I'm trying a bunch of solutions via SQL:

  1. I tried an inner join on the table to itself but that just gave me results similar to the last row where X interacted with X
  2. I tried a self join that aggregated over a 3 day window but that only aggregates A's values over a 3 day period in total, doesn't capture any interactions.

Is there a way in Python that can only return the first two rows (A-B and B-A in Index 0 and 1) because they satisfy the condition of interacting with each other within a 3 day time window and the value of their interaction is within a 90% to 110% band?



from Checking for common interactors within a time range in python dataframe

No comments:

Post a Comment