I have one data frame as: (df1)
df1_ID Col1_df1 Col2_df1 Col3_df1
ABC-001 a.102_103i k159* Test1
DEF-002 a.36-89E k188 Test2
GHI-003 ab.23<<X e542m Test3
df2:
df2_ID1 df2_ID2 Count Count_A Count_B To_Check
ABC-001 10 0 10 FIRSTLINE:a.102_103i:ANYTHING:EXTRA
DEF-002;GHI-003 20 2 18 SECONDLINE:ab.23<<X:ANYTHING:EXTRA
ABC-001;DEF-002 15 3 12 THIRDLINE:a.105:a.36-89D:ANYTHING:k188:EXTRA
RESULT (DF3):
df1_ID Col1_df1 Col2_df1 Col3_df1 df2_ID1 df2_ID2 Count Count_A Count_B To_Check
ABC-001 a.102_103i k159* Test1 ABC-001 10 0 10 FIRSTLINE:a.102_103i:ANYTHING:EXTRA:k159*
DEF-002 a.36-89 k188 Test2 ABC-001;DEF-002 15 3 12 THIRDLINE:a.105:a.36-89D:ANYTHING:k188:EXTRA
GHI-003 ab.23<<X e542m Test3 DEF-002;GHI-003 20 2 18 SECONDLINE:ab.23<<X:ANYTHING:EXTRA
I want to check if the values of Col1_df1 and Col2_df1 are present in the df2 column of To_Check. If the value exists from either Col1_df1 and Col2_df1 AND df1_ID is present either in the df2_ID1 or in df2_ID2, then merge that row of df2 to df1. If it doesn't match, then it should be BLANK.
An extension of this question:
Vlookup function / merge Pandas but not exact match
But in this question, we were just dealing with strings. In my data, we have special characters also.
This syntax also doesn't seem to work when try to find the value that exists within df2:
df1 = df1.assign(result=df1['Col1_df1'].isin(df2['To_Check']))
Wrote another syntax also but not working either:
output = open("output.csv", "a")
with open("df1.csv", "r") as df1:
first_line = df2.readline()
output.write(first_line)
with open("df2.csv", "r") as df2:
second_first = df2.readline()
output.write(second_first)
for line_df1 in df1:
df1_names = [x for x in line_df1.split(',')]
for line_df2 in df2:
df2_names = [x for x in line_df2.split(',')]
check1 = any(df1_names[1] in string for string in df2_names[6])
print(check1)
The check1 is always False although the value exists.
Thanks for your help in advance.
*UPDATE
data_1={'df1_ID':['ABC-001','DEF-002','GHI-003']
,'Col1_df1':['a.102_103i','a.36-89E','ab.23<<X']
,'Col2_df1':['k159*','k188','e542m']
,'Col3_df1':['Test1','Test2','Test3']}
data_2={'df2_ID1':['','DEF-002;GHI-003','ABC-001;DEF-002']
,'df2_ID2':['ABC-001','','']
,'Count':['10','20','15']
,'Count_A':['0','2','3']
,'To_Check':['FIRSTLINE:a.102_103i:ANYTHING:EXTRA','SECONDLINE:ab.23<<X:ANYTHING:EXTRA','THIRDLINE:a.105:a.36-89D:ANYTHING:k188:EXTRA']}
from Compare multiple columns with special characters and merge dataframes
No comments:
Post a Comment