Thursday, 13 June 2019

Compare multiple columns of a dataframe and store the result in a new column

I have data which looks like this(I've set 'rule_id' as the index):

rule_id  a   b   c  d
50378    2   0   0  5
50402    12  9   6  0
52879    0   4   3  2

After using this code:

    coeff = df.T

    # compute the coefficients
    for name, s in coeff.items():
        top = 100              # start at 100
        r = []
        for i, v in enumerate(s):
            if v == 0:         # reset to 100 on a 0 value
                top=100
            else:
                top = top/2    # else half the previous value
            r.append(top)
        coeff.loc[:, name] = r # set the whole column in one operation

    # transpose back to have a companion dataframe for df
    coeff = coeff.T

    # build a new column from 2 consecutive ones, using the coeff dataframe
    def build_comp(col1, col2, i):


        conditions = [(df[col1] == 0) & (df[col2] == 0), (df[col1] != 0) & (df[col2] == 0), (df[col1] == df[col2]),
                      (df[col1] != 0) & (df[col2] != 0)]

        choices = [np.nan , 100 , coeff[col1] , df[col2]/df[col1]*coeff[col1]+coeff[col1]]

        df['comp{}'.format(i)] = np.select(conditions , choices)

    old = df.columns[0]          # store name of first column

    #Ok, enumerate all the columns (except first one)
    for i, col in enumerate(df.columns[1:], 1):
        build_comp(old, col, i)
        old = col                # keep current column name for next iteration
 # special processing for last comp column
df['comp{}'.format(i+1)] = np.where(df[col] == 0, np.nan, 100)

my data looks like this:

rule_id  a   b   c  d  comp1  comp2  comp3  comp4
50378    2   0   0  5   100    NaN    NaN    100
50402    12  9   6  0   87.5   41.66  100    NaN
52879    0   4   3  2   NaN    87.5  41.66  100 

So 'df' here is the dataframe which stores my data which I have mentioned above. Look at the first row . According to my code , if two columns are compared and the first column has a non-zero value(2) and the second column has 0 , then 100 should be updated in the new column , which I am able to achieve , if there is comparison between more than one non-zero value (look at row 2) , then the comparison is like this:

9/12 *50 +50 = 87.5

then

6/9 * 25 + 25 = 41.66

which I am able to achieve but the third comparison between column 'c' and 'd' which is between value 6 and 0 should be:

0/6 *12.5 + 12.5 = 12.5

which I am having problem in achieving. So instead of 100 in row 2 comp3 , the value should be 12.5. Same goes for the last row too where values are 4 ,3 and 2

This is the result I want:

rule_id  a   b   c  d  comp1  comp2  comp3  comp4
50378    2   0   0  5   100    NaN    NaN    100
50402    12  9   6  0   87.5   41.66  12.5   NaN
52879    0   4   3  2   NaN    87.5  41.66   12.5 



from Compare multiple columns of a dataframe and store the result in a new column

No comments:

Post a Comment