I have the following dataset which can be recreated using the following code:
import pandas as pd
data3 = [[20210308, 'A','B',100],
[20210307, 'B','A',95],
[20210307, 'B','A',65],
[20210305, 'A','C',50],
[20210304, 'D','E',25],
[20210303, 'E','D',200],
[20210201,'E','B',10 ],
[20210101, 'X','X',50]]
df3 = pd.DataFrame(data3, columns = ['Date_1', 'Interactor_A','Interactor_B','Value'])
df3['Date_2'] = pd.to_datetime(df3['Date_1'], format='%Y%m%d')
The final dataset looks like this, I made the last column for date time in case some date time functions were required for the solution:
Date_1 Interactor_A Interactor_B Value Date_2
0 20210308 A B 100 2021-03-08
1 20210307 B A 95 2021-03-07
2 20210307 B A 65 2021-03-07
3 20210305 A C 50 2021-03-05
4 20210304 D E 25 2021-03-04
5 20210303 E D 200 2021-03-03
6 20210201 E B 10 2021-02-01
7 20210101 X X 50 2021-01-01
What I'm trying to do is detect if two people interact with each other within a 3 day period and the value of their interaction is within a 90% to 110% range. That is does A interact with B as Interactor_A and does B interact back with A as Interactor_A and their interaction happens within a 3 day time window and is the value of their interactions within a 90% to 110% range?
By this logic, only the first two rows would be selected as A interacts with B and then B interacts back with A and the value of their interaction is within a 90% to 110% band because 95/100 or 100/95 is within a 90% to 110% band. Similarly the D-E and E-D interaction won't be selected because 25/200 and 200/25 are both outside of a 90% to 110% band. The B-A interaction in Index 2 also won't be selected because it's not in the 90% to 110% band even though it's in the 3 day window.
I'm trying a bunch of solutions via SQL:
- I tried an inner join on the table to itself but that just gave me results similar to the last row where X interacted with X
- I tried a self join that aggregated over a 3 day window but that only aggregates A's values over a 3 day period in total, doesn't capture any interactions.
Is there a way in Python that can only return the first two rows (A-B and B-A in Index 0 and 1) because they satisfy the condition of interacting with each other within a 3 day time window and the value of their interaction is within a 90% to 110% band?
from Checking for common interactors within a time range in python dataframe
No comments:
Post a Comment