In my previous question I found how to use PyTorch's autograd with tensors:
import torch
from torch.autograd import grad
import torch.nn as nn
import torch.optim as optim
class net_x(nn.Module):
    def __init__(self):
        super(net_x, self).__init__()
        self.fc1 = nn.Linear(1, 20)
        self.fc2 = nn.Linear(20, 20)
        self.out = nn.Linear(20, 4)  # a, b, c, d
    def forward(self, x):
        x = torch.tanh(self.fc1(x))
        x = torch.tanh(self.fc2(x))
        x = self.out(x)
        return x
nx = net_x()
#input
t = torch.tensor([1.0, 2.0, 3.2], requires_grad = True) #input vector
t = torch.reshape(t, (3,1)) #reshape for batch
#method
dx = torch.autograd.functional.jacobian(lambda t_: nx(t_), t)
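# the full Jacobian dx has shape (3, 4, 3, 1); since each output row depends only on
# the matching input row, only its block-diagonal (sample i w.r.t. sample i) entries are nonzero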
dx = torch.diagonal(torch.diagonal(dx, 0, -1), 0)[0] #first vector
#dx = torch.diagonal(torch.diagonal(dx, 1, -1), 0)[0] #2nd vector
#dx = torch.diagonal(torch.diagonal(dx, 2, -1), 0)[0] #3rd vector
#dx = torch.diagonal(torch.diagonal(dx, 3, -1), 0)[0] #4th vector
dx
>>>
tensor([-0.0142, -0.0517, -0.0634])
The issue is that grad only knows how to propagate gradients from a scalar tensor (which my network's output is not), so I had to compute the full Jacobian instead.
However, this is not very efficient and rather slow: my matrix is large, calculating the entire Jacobian takes a while, and I don't even use the whole Jacobian matrix.
Is there a way to calculate only the diagonals of the Jacobian (to get the 4 vectors in this example)?
There seems to be an open feature request for this, but it doesn't appear to have gotten much attention.
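One possible workaround (a minimal sketch, not from the original post): because each row of the output depends only on the corresponding row of t, the cross-sample entries of the Jacobian are zero, so summing one output column over the batch and calling torch.autograd.grad on that scalar recovers the per-sample derivatives directly, without ever forming the full Jacobian:
out = nx(t)  # shape (3, 4)
diag_grads = []
for k in range(out.shape[1]):  # one backward pass per output dimension (a, b, c, d)
    # d(out[:, k].sum())/dt has entry i equal to d out[i, k] / d t[i],
    # since out[i, k] does not depend on t[j] for j != i
    g = torch.autograd.grad(out[:, k].sum(), t, retain_graph=True)[0]  # shape (3, 1)
    diag_grads.append(g.squeeze(-1))  # shape (3,), the k-th "vector"
This costs one backward pass per output dimension instead of one per output element, which is usually much cheaper when the batch is large.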
Update 1:
I tried @iacob's suggestion of passing vectorize=True to torch.autograd.functional.jacobian. However, this seems to be slower. To test it I changed my network's output size from 4 to 400, and my input t to:
val = 100
t = torch.rand(val, requires_grad = True) #input vector
t = torch.reshape(t, (val,1)) #reshape for batch
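For reference, the call being timed is the same Jacobian computation as above, with and without the extra flag (a sketch, assuming the same network as before but with 400 outputs):
dx = torch.autograd.functional.jacobian(lambda t_: nx(t_), t, vectorize=True)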
Without vectorize=True:
Wall time: 10.4 s
With vectorize=True:
Wall time: 14.6 s
from Using PyTorch's autograd efficiently with tensors by calculating the Jacobian