Hemant Vishwakarma: PyTorch Geometric: Having issues with tensor sizes

This is the first time I'm using PyTorch and PyTorch Geometric. I'm trying to create a simple graph neural network with PyTorch Geometric. I built a custom dataset by following the PyTorch Geometric documentation and extending InMemoryDataset. I then split the dataset into training, validation and test sets, which contain 3496, 437 and 439 graphs respectively. Here is my simple neural network:

import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv


class Net(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # `dataset` is the custom InMemoryDataset described above
        self.conv1 = GCNConv(dataset.num_node_features, 10)
        self.conv2 = GCNConv(10, dataset.num_classes)

    def forward(self, data):
        x, edge_index, batch = data.x, data.edge_index, data.batch
        x = self.conv1(x, edge_index)
        x = F.relu(x)
        x = F.dropout(x, training=self.training)
        x = self.conv2(x, edge_index)

        # One row of log-probabilities per node
        return F.log_softmax(x, dim=1)

I get the following error while training my model, which suggests that there is an issue with my input dimensions. Could my batch sizes be the cause?

RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):
File "E:\Users\abc\Anaconda3\lib\site-packages\torch_scatter\scatter.py", line 22, in scatter_add
        size[dim] = int(index.max()) + 1
    out = torch.zeros(size, dtype=src.dtype, device=src.device)
    return out.scatter_add_(dim, index, src)
           ~~~~~~~~~~~~~~~~ <--- HERE
else:
    return out.scatter_add_(dim, index, src)
RuntimeError: index 13654 is out of bounds for dimension 0 with size 678

The error is raised by this line in the network's forward method:

x = self.conv1(x, edge_index)

EDIT: Added more information about edge_index and described the data I'm using in more detail.

Here are the shapes and value ranges of the tensors I'm passing to the layer:

x: torch.Size([678, 43])
edge_index: torch.Size([2, 668])
torch.max(edge_index): tensor(541690)
torch.min(edge_index): tensor(1920)
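
These numbers hint at a likely cause: x has 678 rows, yet edge_index contains values as large as 541690. GCNConv aggregates messages into one row per node, so every entry of edge_index has to lie in the range [0, number of nodes). The check below is a minimal plain-Python sketch of that rule; out_of_range_edges is a hypothetical helper, not a PyTorch Geometric function, and it assumes edge_index has been converted to nested lists (for example with edge_index.tolist()).

```python
def out_of_range_edges(edge_index, num_nodes):
    """Return the (source, target) pairs that point outside [0, num_nodes)."""
    sources, targets = edge_index
    return [
        (s, t)
        for s, t in zip(sources, targets)
        if not (0 <= s < num_nodes and 0 <= t < num_nodes)
    ]


# Toy graph with 3 nodes; the second edge uses a global row label (1920)
# instead of a position inside the graph.
toy_edge_index = [[1, 1920], [0, 1]]
bad = out_of_range_edges(toy_edge_index, num_nodes=3)
print(bad)  # [(1920, 1)]
```

An empty result means every edge refers to an existing node; any pair that is returned would trigger the same kind of out-of-bounds error as in the traceback.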

I'm using a data list that contains Data(x=node_features, edge_index=edge_index, y=labels) objects. After splitting, the training, validation and test datasets contain 3496, 437 and 439 graphs respectively. Originally I tried to build one single graph from the whole dataset, but I wasn't sure how that would work with DataLoader and mini-batches.

train_loader = DataLoader(train_dataset, batch_size=batch_size)
val_loader = DataLoader(val_dataset, batch_size=batch_size)
test_loader = DataLoader(test_dataset, batch_size=batch_size)
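
As far as I understand the PyTorch Geometric documentation, its DataLoader turns a mini-batch of graphs into one large disconnected graph: the node features are concatenated, each graph's edge_index is shifted by the number of nodes that came before it, and a batch vector records which graph each node belongs to. The sketch below imitates that bookkeeping in plain Python so the arithmetic is visible. collate_edge_indices is a hypothetical name used only for illustration, and the sketch assumes each graph numbers its own nodes from 0.

```python
def collate_edge_indices(graphs):
    """Merge graphs given as (num_nodes, [sources, targets]) with local indices."""
    merged_sources, merged_targets, batch = [], [], []
    offset = 0  # number of nodes contributed by the graphs merged so far
    for graph_id, (num_nodes, (sources, targets)) in enumerate(graphs):
        merged_sources += [s + offset for s in sources]
        merged_targets += [t + offset for t in targets]
        batch += [graph_id] * num_nodes
        offset += num_nodes
    return [merged_sources, merged_targets], batch


# Two small chain graphs with 3 and 2 nodes.
graphs = [(3, [[1, 2], [0, 1]]), (2, [[1], [0]])]
edge_index, batch = collate_edge_indices(graphs)
print(edge_index)  # [[1, 2, 4], [0, 1, 3]]
print(batch)       # [0, 0, 0, 1, 1]
```

The shifted indices stay below the total node count of the batch only if every graph's own indices start at 0, which would explain why the batch size by itself is unlikely to be the problem.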

Here's the code that generates the graphs from the DataFrame. I've tried to create a simple graph for each EntityId group, with one node per row and edges connecting consecutive rows. I've probably overlooked something, and that's why I have this issue. I tried to follow the PyTorch Geometric documentation when creating these graphs (PyTorch Geometric: Creating your own dataset).

def process(self):
    data_list = []

    # Build one graph per EntityId
    grouped = df.groupby('EntityId')
    for entity_id, group in grouped:
        node_features = torch.tensor(group.drop(['Labels'], axis=1).values)
        # group.index holds the DataFrame row labels of this group
        source_nodes = group.index[1:].values
        target_nodes = group.index[:-1].values
        labels = torch.tensor(group.Labels.values)
        edge_index = torch.tensor([source_nodes, target_nodes])

        data = Data(x=node_features, edge_index=edge_index, y=labels)
        data_list.append(data)

    if self.pre_filter is not None:
        data_list = [data for data in data_list if self.pre_filter(data)]

    if self.pre_transform is not None:
        data_list = [self.pre_transform(data) for data in data_list]

    data, slices = self.collate(data_list)
    torch.save((data, slices), self.processed_paths[0])
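
One thing worth noting about process(): group.index contains the row labels of the full DataFrame, so the edges refer to positions in the whole table rather than to positions inside the group. A possible fix, assuming the intent is to link each row of a group to the row before it, is to number the nodes of every group from 0. The sketch below builds such an edge list in plain Python; chain_edge_index is a hypothetical helper introduced only for this illustration.

```python
def chain_edge_index(num_nodes):
    """Edges linking each node to its predecessor, with nodes numbered 0..num_nodes-1."""
    positions = list(range(num_nodes))
    source_nodes = positions[1:]   # nodes 1 .. n-1
    target_nodes = positions[:-1]  # nodes 0 .. n-2
    return [source_nodes, target_nodes]


edges = chain_edge_index(4)
print(edges)  # [[1, 2, 3], [0, 1, 2]]
```

Inside the loop this could be used as edge_index = torch.tensor(chain_edge_index(len(group)), dtype=torch.long), so that the largest index in each graph is always len(group) - 1.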

If someone could help me understand how to build a graph from this kind of data and use it with GCNConv, I would appreciate it.

Hemant Vishwakarma

Wednesday, 2 September 2020
