I am calculating triad census as follows for my undirected network.
import networkx as nx
G = nx.Graph()
G.add_edges_from(
[('A', 'B'), ('A', 'C'), ('D', 'B'), ('E', 'C'), ('E', 'F'),
('B', 'H'), ('B', 'G'), ('B', 'F'), ('C', 'G')])
from itertools import combinations
#print(len(list(combinations(G.nodes, 3))))
triad_class = {}
for nodes in combinations(G.nodes, 3):
n_edges = G.subgraph(nodes).number_of_edges()
triad_class.setdefault(n_edges, []).append(nodes)
print(triad_class)
It works fine with small networks. However, now I have a bigger network with approximately 4000-8000 nodes. When I try to run my existing code with a network of 1000 nodes, it takes days to run. Is there a more efficient way of doing this?
My current network is mostly sparse. i.e. there are only few connections among the nodes. In that case, can I leave the unconnected nodes and do the computation first and later add the unconnceted nodes to the output?
I am also happy to get approximate answers without calculating every combination.
Example of triad census:
Triad census is dividing the triads (3 nodes) in to the four categories shown in the below figure.
For example consider the network below.
The triad census of the four classes are;
{3: [('A', 'B', 'C')],
2: [('A', 'B', 'D'), ('B', 'C', 'D'), ('B', 'D', 'E')],
1: [('A', 'B', 'E'), ('A', 'B', 'F'), ('A', 'B', 'G'), ('A', 'C', 'D'), ('A', 'C', 'E'), ('A', 'C', 'F'), ('A', 'C', 'G'), ('A', 'D', 'E'), ('A', 'F', 'G'), ('B', 'C', 'E'), ('B', 'C', 'F'), ('B', 'C', 'G'), ('B', 'D', 'F'), ('B', 'D', 'G'), ('B', 'F', 'G'), ('C', 'D', 'E'), ('C', 'F', 'G'), ('D', 'E', 'F'), ('D', 'E', 'G'), ('D', 'F', 'G'), ('E', 'F', 'G')],
0: [('A', 'D', 'F'), ('A', 'D', 'G'), ('A', 'E', 'F'), ('A', 'E', 'G'), ('B', 'E', 'F'), ('B', 'E', 'G'), ('C', 'D', 'F'), ('C', 'D', 'G'), ('C', 'E', 'F'), ('C', 'E', 'G')]}
I am happy to provide more details if needed.
EDIT: I was able to resolve the memory error by commenting the line #print(len(list(combinations(G.nodes, 3)))) as suggested in the answer. However, my program is still slow and takes days to run even with a network of 1000 nodes. I am looking for a more efficient way of doing this in python.
I am not limited to networkx and happy to accept answers using other libraries as well.
As always I am happy to provide more details as needed.
from How to efficiently calculate triad census in undirected graph in python


No comments:
Post a Comment