I am working on an attention model, and before running the final model I was going through the tensor shapes that flow through the code. There is an operation where I need to reshape a tensor of shape torch.Size([30, 8, 9, 64]), where 30 is the batch size, 8 is the number of attention heads (not relevant to my question), 9 is the number of words in the sentence, and 64 is an intermediate embedding representation of each word. I have to reshape this tensor to torch.Size([30, 9, 512]) before processing it further (the 512 comes from merging the 8 heads with their 64-dimensional outputs, 8 × 64 = 512).
Looking at a reference implementation online, they do x.transpose(1, 2).contiguous().view(30, -1, 512), whereas I was thinking that x.transpose(1, 2).reshape(30, -1, 512) should work just as well.
In the first case the grad_fn is <ViewBackward>, whereas in my case it is <UnsafeViewBackward>. Aren't these two the same operation? Will this result in a training error?
from Different `grad_fn` for similar looking operations in Pytorch (1.0)
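For reference, a minimal sketch that reproduces the comparison described in the question (the shape values are taken from the question itself; the exact grad_fn names printed can vary with the PyTorch version):

```python
import torch

# Shapes from the question: batch_size=30, heads=8, seq_len=9, head_dim=64,
# and 8 * 64 = 512 for the merged representation.
x = torch.randn(30, 8, 9, 64, requires_grad=True)

# Reference approach: transpose, force a contiguous copy, then view.
a = x.transpose(1, 2).contiguous().view(30, -1, 512)

# Alternative: reshape, which copies internally when a plain view is not possible.
b = x.transpose(1, 2).reshape(30, -1, 512)

print(a.shape, a.grad_fn)   # torch.Size([30, 9, 512]), a ViewBackward node
print(b.shape, b.grad_fn)   # torch.Size([30, 9, 512]), an UnsafeViewBackward node
print(torch.equal(a, b))    # True -- both expressions produce the same values
```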