Wednesday, 17 October 2018

Why does the PyTorch DataLoader behave differently on a NumPy array and a list?

The only difference between the two calls is that one passes the data to the DataLoader as a numpy.ndarray and the other as a Python list, yet the DataLoader produces completely different batches.

You can use the following code to reproduce it:

from torch.utils.data import DataLoader, Dataset
import numpy as np

class my_dataset(Dataset):
    def __init__(self, data, label):
        self.data = data
        self.label = label

    def __getitem__(self, index):
        return self.data[index], self.label[index]

    def __len__(self):
        return len(self.data)

train_data = [[1, 2, 3], [5, 6, 7], [11, 12, 13], [15, 16, 17]]
train_label = [-1, -2, -11, -12]

########################### Look here:

test = DataLoader(dataset=my_dataset(np.array(train_data), train_label), batch_size=2)
for i in test:
    print("numpy data:")
    print(i)
    break


test = DataLoader(dataset=my_dataset(train_data, train_label), batch_size=2)
for i in test:
    print("list data:")
    print(i)
    break

The result is:

numpy data:
[tensor([[1, 2, 3],
        [5, 6, 7]]), tensor([-1, -2])]
list data:
[[tensor([1, 5]), tensor([2, 6]), tensor([3, 7])], tensor([-1, -2])]  
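The difference comes from the DataLoader's default `collate_fn`. When `__getitem__` returns a NumPy row, each sample's data is converted to a single tensor and the batch is stacked into one 2×3 tensor. When it returns a Python list, the collate function treats the list as a sequence of separate fields and collates it element-wise, which effectively transposes the batch. A minimal sketch of that recursion, using NumPy arrays in place of torch tensors (`collate_sketch` is a hypothetical stand-in for PyTorch's `default_collate`, which handles many more types):

```python
import numpy as np

def collate_sketch(batch):
    # Simplified sketch of the DataLoader's default collate logic
    # (assumption: the real default_collate returns torch tensors;
    # NumPy stands in for torch here).
    elem = batch[0]
    if isinstance(elem, np.ndarray):
        # One array per sample -> stack into a single batched array.
        return np.stack(batch)
    if isinstance(elem, (int, float, np.integer)):
        # Scalars -> one 1-D array for the whole batch.
        return np.array(batch)
    if isinstance(elem, (list, tuple)):
        # A sequence is treated as separate fields: zip(*batch)
        # transposes the batch, then each field is collated recursively.
        return [collate_sketch(list(group)) for group in zip(*batch)]
    return batch

# numpy-backed dataset: each sample is (ndarray row, int label),
# so the rows are stacked into one 2x3 array.
print(collate_sketch([(np.array([1, 2, 3]), -1), (np.array([5, 6, 7]), -2)]))

# list-backed dataset: each sample is (list row, int label),
# so the rows are zipped column-wise into three length-2 arrays.
print(collate_sketch([([1, 2, 3], -1), ([5, 6, 7], -2)]))
```

To get the stacked behaviour with list data, convert each row inside `__getitem__`, e.g. `return np.array(self.data[index]), self.label[index]`, or pass a custom `collate_fn` to the DataLoader.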



