I have data which looks like this(I've set 'rule_id' as the index):
rule_id a b c d
50378 2 0 0 5
50402 12 9 6 0
52879 0 4 3 2
After using this code:
coeff = df.T
# compute the coefficients
for name, s in coeff.items():
top = 100 # start at 100
r = []
for i, v in enumerate(s):
if v == 0: # reset to 100 on a 0 value
top=100
else:
top = top/2 # else half the previous value
r.append(top)
coeff.loc[:, name] = r # set the whole column in one operation
# transpose back to have a companion dataframe for df
coeff = coeff.T
# build a new column from 2 consecutive ones, using the coeff dataframe
def build_comp(col1, col2, i):
conditions = [(df[col1] == 0) & (df[col2] == 0), (df[col1] != 0) & (df[col2] == 0), (df[col1] == df[col2]),
(df[col1] != 0) & (df[col2] != 0)]
choices = [np.nan , 100 , coeff[col1] , df[col2]/df[col1]*coeff[col1]+coeff[col1]]
df['comp{}'.format(i)] = np.select(conditions , choices)
old = df.columns[0] # store name of first column
#Ok, enumerate all the columns (except first one)
for i, col in enumerate(df.columns[1:], 1):
build_comp(old, col, i)
old = col # keep current column name for next iteration
# special processing for last comp column
df['comp{}'.format(i+1)] = np.where(df[col] == 0, np.nan, 100)
my data looks like this:
rule_id a b c d comp1 comp2 comp3 comp4
50378 2 0 0 5 100 NaN NaN 100
50402 12 9 6 0 87.5 41.66 100 NaN
52879 0 4 3 2 NaN 87.5 41.66 100
So 'df' here is the dataframe which stores my data which I have mentioned above. Look at the first row . According to my code , if two columns are compared and the first column has a non-zero value(2) and the second column has 0 , then 100 should be updated in the new column , which I am able to achieve , if there is comparison between more than one non-zero value (look at row 2) , then the comparison is like this:
9/12 *50 +50 = 87.5
then
6/9 * 25 + 25 = 41.66
which I am able to achieve but the third comparison between column 'c' and 'd' which is between value 6 and 0 should be:
0/6 *12.5 + 12.5 = 12.5
which I am having problem in achieving. So instead of 100 in row 2 comp3 , the value should be 12.5. Same goes for the last row too where values are 4 ,3 and 2
This is the result I want:
rule_id a b c d comp1 comp2 comp3 comp4
50378 2 0 0 5 100 NaN NaN 100
50402 12 9 6 0 87.5 41.66 12.5 NaN
52879 0 4 3 2 NaN 87.5 41.66 12.5
from Compare multiple columns of a dataframe and store the result in a new column
No comments:
Post a Comment