I am working on an attention model, and before running the final model I was going through the tensor shapes that flow through the code. There is an operation where I need to reshape a tensor of shape torch.Size([30, 8, 9, 64]), where 30 is the batch size, 8 is the number of attention heads (not relevant to my question), 9 is the number of words in the sentence, and 64 is an intermediate embedding representation of each word. I have to reshape this tensor to torch.Size([30, 9, 512]) before processing it further (the 512 comes from merging the 8 heads with their 64-dimensional outputs, 8 × 64 = 512).
Looking at a reference implementation online, they do x.transpose(1, 2).contiguous().view(30, -1, 512), whereas I was thinking that x.transpose(1, 2).reshape(30, -1, 512) should work just as well.
In the first case the grad_fn is <ViewBackward>, whereas in my case it is <UnsafeViewBackward>. Aren't these two the same operation? Will this result in a training error?
from Different `grad_fn` for similar looking operations in Pytorch (1.0)
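For reference, a minimal sketch that reproduces the comparison described in the question (the shape values are taken from the question itself; the exact grad_fn names printed can vary with the PyTorch version):

```python
import torch

# Shapes from the question: batch_size=30, heads=8, seq_len=9, head_dim=64,
# and 8 * 64 = 512 for the merged representation.
x = torch.randn(30, 8, 9, 64, requires_grad=True)

# Reference approach: transpose, force a contiguous copy, then view.
a = x.transpose(1, 2).contiguous().view(30, -1, 512)

# Alternative: reshape, which copies internally when a plain view is not possible.
b = x.transpose(1, 2).reshape(30, -1, 512)

print(a.shape, a.grad_fn)   # torch.Size([30, 9, 512]), a ViewBackward node
print(b.shape, b.grad_fn)   # torch.Size([30, 9, 512]), an UnsafeViewBackward node
print(torch.equal(a, b))    # True -- both expressions produce the same values
```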