Monday 31 August 2020

Iterating two dataframes and providing minimum rank on multiple condition

I want to iterate through two dataframes, one being large (with multiple columns and non null cols value) and one being small (with some common cols and null value in it).

The large dataframe is actual customer data with all attributes and other one is ranking data. I am trying to stamp minimum rank to all customer comparing it with ranking data.

The Lager dataframe look like this -

CUST_ID,DTL1,DTL2,DTLS3,AGE_BAND,SCORE,STATE,ATTR_1,ATTR_2,ATTR_3
1,xx,xx,xx,A1,S1,MH,1,1,6
2,xx,xx,xx,A1,S2,MH,1,2,7
3,xx,xx,xx,A2,S3,GJ,2,2,7
4,xx,xx,xx,A3,S1,RJ,1,2,6
5,xx,xx,xx,A2,S1,GJ,2,1,6
6,xx,xx,xx,A3,S3,RJ,1,2,7

and the ranking data -

Rank,AGE_BAND,SCORE,STATE,ATTR_1,ATTR_2
1,A1,S1,MH,Null,Null
2,A1,Null,MH,Null,1
3,Null,S1,GJ,Null,1
4,Null,S1,GJ,2,Null

Here, If we see, then cust_1 satisfy for both the rank- 1 and 2, but we would go for minimum i.e. 1. Same goes for Cust_5, rank 3.

I tried creating a nested for loop; the outer loop iterating through the large dataframe and the inner loop iterating through the small dataframe however I am having difficulties.

I'm looking for a way to identify that the "col_name" and "value" in my small dataframe that matches my large dataframe when not null and then assigning minimum rank to it.

I am trying to write some thing like below:

for cust in Data_Cust.iterrows():       
    for rank in Data_rank.iterrows():       #if we can eliminate columns where its value are null for individual rank
        if rank.col_name == cust.col_name && rank.col_value == cust.value  ##something from which we can match col/val name with col/value name of both dfs
        
            #create a list and appended all eligible ranks
            #selecting minimun rank at the end
            #appending list and min rank in Data_cust 


from Iterating two dataframes and providing minimum rank on multiple condition

No comments:

Post a Comment