Tuesday, 13 April 2021

Label Propagation in networks using Scikit and networkx

I have never used label propagation before, in Python or otherwise, but I now need to check whether it is suitable for my problem. I have a dataset like the following:

        User                  Connection         Net_value   Score
        xxx.dean.martin       vera.miles         16.2        12.0
        xxx.dean.martin       christopher.sole   12.4        12.0
        xxx.dean.martin       elis.con           5.3         12.0
        xxx.catherine.rice    vera.miles         10.0        NaN
        xxx.vera.miles        NaN                16.2        0.0

where Net_value depends on the User/Connection relationship, while Score depends only on the User. I would like to build a graph where Users are the source nodes and Connections are the targets. This means that, for example, xxx.dean.martin is linked to vera.miles. Net_value gives the weight of the edge, while Score should be a value assigned to the node (i.e., xxx.dean.martin). Score can take a value between 0 and 20.0. As shown in the example, some values are missing (NaN), so I would like to use label propagation to assign Scores where they are missing. Looking at the last example,

      `xxx.vera.miles        NaN                16.2        0.0`

I would expect links between vera.miles, dean.martin and catherine.rice, but I am getting a NaN value for the Connection. Also, since both vera.miles (0.0) and dean.martin (12.0) have Score values, I would expect catherine.rice to be assigned the average of the two (or some other value derived from them).
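To make the intended graph concrete, here is a minimal sketch in plain Python that turns the sample rows into weighted edges and per-node Scores. It rests on two assumptions that are not stated in the data: that the `xxx.` prefix can be stripped so `xxx.vera.miles` and `vera.miles` are the same node, and that a row with a missing Connection contributes a Score but no edge. The names `rows`, `normalize`, `edges` and `scores` are made up for illustration.

```python
import math

# Rows copied from the sample table: (User, Connection, Net_value, Score).
rows = [
    ("xxx.dean.martin", "vera.miles", 16.2, 12.0),
    ("xxx.dean.martin", "christopher.sole", 12.4, 12.0),
    ("xxx.dean.martin", "elis.con", 5.3, 12.0),
    ("xxx.catherine.rice", "vera.miles", 10.0, math.nan),
    ("xxx.vera.miles", None, 16.2, 0.0),
]


def normalize(name):
    # Assumption: "xxx.dean.martin" and "dean.martin" refer to the same person.
    return name[4:] if name.startswith("xxx.") else name


edges = []   # (user, connection, weight) triples
scores = {}  # node -> known Score; nodes with a missing Score are left out

for user, connection, net_value, score in rows:
    node = normalize(user)
    if not math.isnan(score):
        scores[node] = score
    if connection is not None:
        # Assumption: a row without a Connection adds no edge.
        edges.append((node, normalize(connection), net_value))
```

With the names normalized, vera.miles ends up linked to both dean.martin and catherine.rice through their rows, which is the link the last row of the table fails to show. If networkx is used, the same triples can be loaded with `G.add_weighted_edges_from(edges)` and the known Scores attached with `nx.set_node_attributes(G, scores, "score")`.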

There may be other ways to assign these values, but label propagation is the only graph-based approach I can think of.
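One thing to keep in mind is that scikit-learn's `LabelPropagation` is a classifier for discrete labels on feature vectors, so it may not fit a continuous Score directly. As a rough alternative, the sketch below propagates numeric Scores by repeated weighted averaging over neighbours, keeping the known Scores fixed. It is only an illustration under assumptions: edges are treated as undirected, Net_value is used as the averaging weight, and the function name `propagate_scores` and its `iterations` parameter are hypothetical.

```python
from collections import defaultdict

# Weighted edges and known Scores from the sample data, with the "xxx."
# prefix already stripped (an assumption about how names should match).
edges = [
    ("dean.martin", "vera.miles", 16.2),
    ("dean.martin", "christopher.sole", 12.4),
    ("dean.martin", "elis.con", 5.3),
    ("catherine.rice", "vera.miles", 10.0),
]
known = {"dean.martin": 12.0, "vera.miles": 0.0}


def propagate_scores(edges, known, iterations=50):
    """Fill missing Scores with the weighted mean of neighbouring Scores."""
    # Assumption: edges are undirected, so each one is stored both ways.
    neighbors = defaultdict(dict)
    for u, v, w in edges:
        neighbors[u][v] = w
        neighbors[v][u] = w

    # Unknown nodes start at the mean of the known Scores.
    start = sum(known.values()) / len(known)
    scores = {node: known.get(node, start) for node in neighbors}

    for _ in range(iterations):
        updated = {}
        for node, nbrs in neighbors.items():
            if node in known:
                # Known Scores are clamped and never overwritten.
                updated[node] = known[node]
            else:
                total = sum(nbrs.values())
                updated[node] = sum(w * scores[m] for m, w in nbrs.items()) / total
        scores = updated
    return scores


scores = propagate_scores(edges, known)
```

On this sample, catherine.rice has vera.miles as its only neighbour, so it takes vera.miles's Score of 0.0, while christopher.sole and elis.con end up at approximately 12.0 from dean.martin. Getting an average of vera.miles and dean.martin for catherine.rice would require an additional edge between catherine.rice and dean.martin, or a scheme where known Scores are not clamped.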

Example of the expected output as a dataset (which should come from a graph visualization):

        User                  Connection         Net_value   Score
        xxx.dean.martin       vera.miles         16.2        12.0
        xxx.dean.martin       christopher.sole   12.4        12.0
        xxx.dean.martin       elis.con           5.3         12.0
        xxx.catherine.rice    vera.miles         10.0        0.0
        xxx.vera.miles        dean.martin        16.2        6.0
        xxx.vera.miles        catherine.rice     16.2        0.0


