I have never used label propagation before, in Python or otherwise, and I need to check whether it is suitable for my problem. I have a dataset like the following:
User Connection Net_value Score
xxx.dean.martin vera.miles 16.2 12.0
xxx.dean.martin christopher.sole 12.4 12.0
xxx.dean.martin elis.con 5.3 12.0
xxx.catherine.rice vera.miles 10.0 NaN
xxx.vera.miles NaN 16.2 0.0
where Net_value depends on the User/Connection relationship, while Score depends only on User. I would like to build a graph where Users are the source nodes and Connections are the targets. This means that, for example, xxx.dean.martin is linked to vera.miles. Net_value gives the weight of the edge, while Score should be a value assigned to the node (i.e., xxx.dean.martin). Score can take a value between 0 and 20.0. As shown in the example, some values are missing (NaN), so I would like to use label propagation to assign Scores where they are missing. Looking at the last row,
`xxx.vera.miles NaN 16.2 0.0`
I would expect links between vera.miles, dean.martin and catherine.rice, but I am getting a NaN value. Also, since Score values are known for both vera.miles (0.0) and dean.martin (12.0), I would expect catherine.rice to be assigned the average of the two (or some other value derived from them).
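A minimal sketch of how the graph described above could be assembled, using plain dictionaries so it runs without extra libraries. It rests on two assumptions that are not confirmed by the data: that the `xxx.` prefix in User is only a naming artifact (so `xxx.vera.miles` and `vera.miles` are the same node), and that edges are undirected. The helper names `normalize` and `build_graph` are hypothetical, and `None` stands in for the NaN Connection.

```python
import math

# Rows copied from the question: (User, Connection, Net_value, Score).
# None stands in for the NaN Connection in the last row.
rows = [
    ("xxx.dean.martin", "vera.miles", 16.2, 12.0),
    ("xxx.dean.martin", "christopher.sole", 12.4, 12.0),
    ("xxx.dean.martin", "elis.con", 5.3, 12.0),
    ("xxx.catherine.rice", "vera.miles", 10.0, math.nan),
    ("xxx.vera.miles", None, 16.2, 0.0),
]


def normalize(name):
    """Drop the 'xxx.' prefix so both columns use one naming scheme (assumption)."""
    return name[4:] if name.startswith("xxx.") else name


def build_graph(rows):
    """Return (edges, scores): a weighted adjacency dict and the known node scores."""
    edges = {}   # node -> {neighbour: Net_value}
    scores = {}  # node -> Score, only for nodes whose Score is not NaN
    for user, connection, net_value, score in rows:
        user = normalize(user)
        edges.setdefault(user, {})
        if not math.isnan(score):
            scores[user] = score
        if connection is None:
            continue  # a row without a Connection contributes a node but no edge
        connection = normalize(connection)
        edges.setdefault(connection, {})
        # Undirected edge (assumption): store the weight in both directions.
        edges[user][connection] = net_value
        edges[connection][user] = net_value
    return edges, scores


edges, scores = build_graph(rows)
print(sorted(edges["vera.miles"]))
print(scores)
```

With the names normalized, vera.miles is linked to both dean.martin and catherine.rice even though her own row has no Connection. If networkx is preferred, the same triples could be passed to `Graph.add_weighted_edges_from`.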
Maybe there is some other way to assign these values, but within graph theory this is the only approach I can think of.
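One possible alternative, sketched here under stated assumptions: scikit-learn's `LabelPropagation` is a classifier for discrete labels, so a continuous Score in the range 0 to 20.0 would have to be binned first. A simpler option is to fill each missing Score with the Net_value-weighted average of its neighbours' Scores while keeping known Scores fixed, and to repeat until nothing changes. The function name `propagate` and its parameters are hypothetical, and the graph below assumes undirected edges with the `xxx.` prefix dropped.

```python
# Weighted adjacency built from the question's rows (undirected, prefix dropped).
edges = {
    "dean.martin": {"vera.miles": 16.2, "christopher.sole": 12.4, "elis.con": 5.3},
    "vera.miles": {"dean.martin": 16.2, "catherine.rice": 10.0},
    "christopher.sole": {"dean.martin": 12.4},
    "elis.con": {"dean.martin": 5.3},
    "catherine.rice": {"vera.miles": 10.0},
}
# Scores that are present in the data; every other node starts unknown.
known = {"dean.martin": 12.0, "vera.miles": 0.0}


def propagate(edges, known, n_iter=100, tol=1e-9):
    """Fill unknown scores with the weighted mean of already-scored neighbours."""
    scores = dict(known)
    for _ in range(n_iter):
        changed = False
        for node, neighbours in edges.items():
            if node in known:
                continue  # known scores stay clamped
            scored = [(scores[n], w) for n, w in neighbours.items() if n in scores]
            total_weight = sum(w for _, w in scored)
            if total_weight == 0:
                continue  # no scored neighbour yet, try again next pass
            new_value = sum(s * w for s, w in scored) / total_weight
            old_value = scores.get(node)
            scores[node] = new_value
            if old_value is None or abs(new_value - old_value) > tol:
                changed = True
        if not changed:
            break
    return scores


result = propagate(edges, known)
for node in sorted(result):
    print(node, round(result[node], 2))
```

On this small graph catherine.rice receives the Score of vera.miles, because vera.miles is her only neighbour; a value between 0.0 and 12.0 would require a direct edge to dean.martin or a scheme that also updates known nodes.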
Example of the desired output as a dataset (which should come from a graph visualization):
User Connection Net_value Score
xxx.dean.martin vera.miles 16.2 12.0
xxx.dean.martin christopher.sole 12.4 12.0
xxx.dean.martin elis.con 5.3 12.0
xxx.catherine.rice vera.miles 10.0 0.0
xxx.vera.miles dean.martin 16.2 6.0
xxx.vera.miles catherine.rice 16.2 0.0
from Label Propagation in networks using Scikit and networkx