Friday, 14 September 2018

Python: how to replace NaN with conditions in a dataframe?

I have a dataframe df that holds the edge list of a network, together with the values of the nodes themselves, like the following:

df
    node_i    node_j    value_i   value_j
0    3         4          89         33
1    3         2          89         NaN
2    3         5          89         69
3    0         2          45         NaN
4    0         3          45         89
5    1         2          109        NaN
6    1         8          109        NaN

I want to add a column w that equals value_j when that value is present. If value_j is NaN, I would like to set w to the average of the non-NaN values of the nodes adjacent to node_i. If every node adjacent to node_i has a NaN value, set w = 1.

So the final dataframe should look like the following:

df
    node_i    node_j    value_i   value_j      w
0    3         4          89         33       33
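1    3         2          89         NaN      51      # average of node 3's non-NaN neighbours: (33 + 69) / 2
2    3         5          89         69       69
3    0         2          45         NaN      89      # average of node 0's non-NaN neighbours: 89
4    0         3          45         89       89
5    1         2          109        NaN       1      # all neighbours of node 1 are NaN
6    1         8          109        NaN       1      # all neighbours of node 1 are NaN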
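A vectorized sketch of the rule above, assuming the sample data shown in the question. `groupby('node_i')['value_j'].transform('mean')` puts on every row the mean of the non-NaN value_j entries sharing that row's node_i (NaN when all of them are NaN), so two chained `fillna` calls cover the three cases. The name `neighbour_mean` is just an illustrative helper, not part of the original code.

```python
import numpy as np
import pandas as pd

# Sample edge list from the question
df = pd.DataFrame({
    'node_i': [3, 3, 3, 0, 0, 1, 1],
    'node_j': [4, 2, 5, 2, 3, 2, 8],
    'value_i': [89, 89, 89, 45, 45, 109, 109],
    'value_j': [33, np.nan, 69, np.nan, 89, np.nan, np.nan],
})

# Per-row mean of the non-NaN value_j entries that share the same node_i;
# NaN when every value_j for that node_i is NaN
neighbour_mean = df.groupby('node_i')['value_j'].transform('mean')

# 1) keep value_j if present, 2) else use the neighbour mean, 3) else 1
df['w'] = df['value_j'].fillna(neighbour_mean).fillna(1)
print(df)
```

Because `fillna` aligns the filler Series on the index, no explicit loop or `apply` is needed, and w comes out as a float column.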

I am currently using a loop like the following, but I would like to use apply instead:

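import numpy as np
import pandas as pd

nodes = pd.unique(df['node_i'])
df['w'] = 0.0
for i in nodes:
    mask = df['node_i'] == i                # rows whose source node is i
    avg_w = df.loc[mask, 'value_j'].mean()  # mean of the non-NaN neighbour values
    if np.isnan(avg_w):
        df.loc[mask, 'w'] = 1               # every neighbour of i is NaN
    else:
        # keep value_j where present, fill the NaNs with the neighbour average
        df.loc[mask, 'w'] = df.loc[mask, 'value_j'].fillna(avg_w)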
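Since the question asks for an apply-style solution, here is a sketch that moves the per-node logic of the loop into a function and runs it once per node_i group. It uses `groupby(...).transform` rather than `groupby(...).apply` because transform always returns a result indexed like the original rows, so it can be assigned straight to a new column. `fill_neighbours` is a hypothetical helper name, and the sample data is assumed from the question.

```python
import numpy as np
import pandas as pd

# Sample edge list from the question
df = pd.DataFrame({
    'node_i': [3, 3, 3, 0, 0, 1, 1],
    'node_j': [4, 2, 5, 2, 3, 2, 8],
    'value_i': [89, 89, 89, 45, 45, 109, 109],
    'value_j': [33, np.nan, 69, np.nan, 89, np.nan, np.nan],
})


def fill_neighbours(values: pd.Series) -> pd.Series:
    """Fill NaNs in one node_i group's value_j with the group mean, or 1."""
    avg = values.mean()  # skips NaN; NaN if the whole group is NaN
    if np.isnan(avg):
        return pd.Series(1.0, index=values.index)
    return values.fillna(avg)


# transform calls fill_neighbours once per node_i group and aligns the
# returned Series back onto the original rows
df['w'] = df.groupby('node_i')['value_j'].transform(fill_neighbours)
print(df)
```

This still calls a Python function per group, so for large edge lists the `fillna`-based vectorized form is likely to be faster.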



from Python: how to replace NaN with conditions in a dataframe?
