I have 2 dataframes with strings in the cells:
df1
ID t1 t2 t3
0 x1 y1 z1
1 x2 y2 z2
2 x3 y3 z3
3 x4 y4 z4
4 x1 y5 z5
df2
ID t1 t2 t3
0 x3 y3 z3
1 x4 y4 z4
2 x1 y1 z1
3 x2 y2 z2
4 x1 y7 z5
I found that I can find the rows that are identical in both dataframes with:
#exactly the same t1, t2, and t3
pd.merge(df1, df2, on=['t1', 't2', 't3'], how='inner')
This will find an exact match between the rows (where t1 in df1 equals t1 in df2, etc.).
How can I find a semi match between the 2 dataframes for a specific column? That is, rows where only the specified column may differ, returned in addition to the exact matches. For example, if I specify t2, a match requires t1 in df1 = t1 in df2 and t3 in df1 = t3 in df2, while t2 in df1 may differ from t2 in df2 (for example, row ID=4 in the 2 dataframes would match this, in addition to the exact matches).
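One possible approach, offered as a minimal sketch rather than a definitive answer: merge on every column except the one allowed to differ, and let pandas suffix the two copies of that column. The helper name semi_match, its free_col parameter, the _df1/_df2 suffixes and the t2_differs flag column are all hypothetical names chosen for illustration. The sketch assumes both dataframes have the same columns and that ID is the index, not a data column.

```python
import pandas as pd

# Sample data from the question; ID is the default integer index.
df1 = pd.DataFrame({
    "t1": ["x1", "x2", "x3", "x4", "x1"],
    "t2": ["y1", "y2", "y3", "y4", "y5"],
    "t3": ["z1", "z2", "z3", "z4", "z5"],
})
df2 = pd.DataFrame({
    "t1": ["x3", "x4", "x1", "x2", "x1"],
    "t2": ["y3", "y4", "y1", "y2", "y7"],
    "t3": ["z3", "z4", "z1", "z2", "z5"],
})


def semi_match(left, right, free_col):
    """Hypothetical helper: match rows on all columns except free_col."""
    # Assumption: left and right share the same column names.
    key_cols = [c for c in left.columns if c != free_col]
    merged = pd.merge(left, right, on=key_cols, how="inner",
                      suffixes=("_df1", "_df2"))
    # Flag rows where the free column differs (False means an exact match).
    merged[f"{free_col}_differs"] = (
        merged[f"{free_col}_df1"] != merged[f"{free_col}_df2"]
    )
    return merged


result = semi_match(df1, df2, "t2")
print(result)
```

With the sample data, the result contains the four exact matches plus the pair from row ID=4 (y5 in df1, y7 in df2), which is the only row flagged as differing. Note that if the key columns contain duplicate combinations, an inner merge pairs every left row with every matching right row, so the output can have more rows than either input.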
from Compare 2 DataFrames for semi matching rows