I am working with one of the transformer models that has been proposed for video classification. My input tensor has the shape [batch=16, channels=3, frames=16, H=224, W=224], and the model applies the patch embedding to the input tensor as follows:
patch_dim = in_channels * patch_size ** 2
self.to_patch_embedding = nn.Sequential(
    Rearrange('b t c (h p1) (w p2) -> b t (h w) (p1 p2 c)', p1=patch_size, p2=patch_size),
    nn.Linear(patch_dim, dim),  # <-- the error is raised here
)
The parameters I am using are as follows:

patch_size = 16
dim = 192
in_channels = 3
Unfortunately, I receive the following error, which points to the line marked in the code above:
Exception has occurred: RuntimeError
mat1 and mat2 shapes cannot be multiplied (9408x4096 and 768x192)
I have thought a lot about the cause of this error, but I cannot work out the reason. How can I solve the problem?
from Dimension error by using Patch Embedding for video processing