
@sbarratt
Created May 9, 2019 19:40
Get the Jacobian of a vector-valued function that takes batch inputs, in PyTorch.
import torch

def get_jacobian(net, x, noutputs):
    # x: a single (unbatched) input vector; noutputs: dimension of net's output.
    x = x.squeeze()
    n = x.size()[0]
    # Replicate the input noutputs times so one backward pass yields every row of the Jacobian.
    x = x.repeat(noutputs, 1)
    x.requires_grad_(True)
    y = net(x)
    # Row i of the identity selects output component i of replicated input i.
    y.backward(torch.eye(noutputs))
    return x.grad.data  # shape (noutputs, n)
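
A minimal usage sketch (the linear net, shapes, and names below are illustrative assumptions, not part of the original gist):

import torch
import torch.nn as nn

# Hypothetical example: a linear map from R^3 to R^2.
net = nn.Linear(3, 2)
x = torch.randn(3)

J = get_jacobian(net, x, noutputs=2)
print(J.shape)                        # torch.Size([2, 3]); row i is d(output_i)/dx
print(torch.allclose(J, net.weight))  # True: the Jacobian of a linear layer is its weight matrix
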
@MasanoriYamada

MasanoriYamada commented Jan 16, 2020

@RylanSchaeffer, my code only supports flattened tensors with a batch dimension. Please show me the input/output tensors and the network in your case (a simple network is best).

@MasanoriYamada

@RylanSchaeffer, how about this: https://gist.github.com/MasanoriYamada/d1d8ca884d200e73cca66a4387c7470a
Disclaimer: the values are not tested for correctness.

@RylanSchaeffer

RylanSchaeffer commented Jan 16, 2020 via email

@RylanSchaeffer

Question for anyone: why do we need to tile the input before passing it through the graph (net, in sbarratt's original code)? Why can't we tile the input and the output after the forward pass?

@RylanSchaeffer

RylanSchaeffer commented Jan 20, 2020

I'm trying to do this currently, but I'm receiving the error: "One of the differentiated Tensors appears to not have been used in the graph. Set allow_unused=True if this is the desired behavior."

Here's what I'm doing. Let x be the input to the graph with shape (batch size, input dimension) and let y be the output of the graph with shape (batch size, output dimension). I then select a subset of N random unit vectors. I stack x with itself and y with itself as follows:

x = torch.cat([x for _ in range(N)], dim=0)

and

y = torch.cat([y for _ in range(N)], dim=0)

x then has shape (N * batch size, input dim) and y has shape (N * batch size, output dim). But then, when I try to use autograd, I receive the aforementioned error.

        jacobian = torch.autograd.grad(
            outputs=y,
            inputs=x,
            grad_outputs=subset_unit_vectors,
            retain_graph=True,
            only_inputs=True)[0]

RuntimeError: One of the differentiated Tensors appears to not have been used in the graph. Set allow_unused=True if this is the desired behavior.

Does anyone know why this is, and is there a way to make this post-forward pass tiling work?

@Jeff1995

@RylanSchaeffer
I was trying the same thing with unsqueeze().expand(), but it leads to the same autograd error. I suppose it's because the newly created x and y nodes are just hanging off the computation graph with no real dependency between them, so autograd can no longer connect the output to the input.
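
For what it's worth, here is a small reproduction of that failure mode (the linear net, N, and the shapes are made-up assumptions for illustration):

import torch
import torch.nn as nn

N, batch, din, dout = 4, 2, 3, 5
net = nn.Linear(din, dout)

x = torch.randn(batch, din, requires_grad=True)
y = net(x)  # y's graph reaches back to x ...

x_tiled = torch.cat([x for _ in range(N)], dim=0)  # ... but not to x_tiled,
y_tiled = torch.cat([y for _ in range(N)], dim=0)  # which is created after the forward pass

try:
    torch.autograd.grad(outputs=y_tiled, inputs=x_tiled,
                        grad_outputs=torch.randn(N * batch, dout),
                        retain_graph=True)
except RuntimeError as e:
    print(e)  # "One of the differentiated Tensors appears to not have been used in the graph..."

# Differentiating with respect to the original x works, because y actually depends on it.
g = torch.autograd.grad(outputs=y_tiled, inputs=x,
                        grad_outputs=torch.randn(N * batch, dout))[0]
print(g.shape)  # torch.Size([2, 3])
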

@RylanSchaeffer

RylanSchaeffer commented Apr 29, 2020 via email

@ChenAo-Phys

I came across this page about a year ago. This is really a nice trick, but it's a pity that it needs to forward pass a large batch, which becomes a huge burden on my GPU memory. Recently I found an interesting way to bypass this problem, and it was satisfying to finally solve an issue I ran into a year ago: https://github.com/ChenAo-Phys/pytorch-Jacobian

@justinblaber

justinblaber commented May 31, 2020

If I'm understanding this correctly, this code will forward pass noutputs times just to compute the Jacobian once (but does it in a vectorized way)... The 1.5.0 autograd Jacobian computation seems to compute the output once but then for-loops over it and calls backward one output at a time (@rjeli's first comment), which will for sure be slow... Both tradeoffs seem suboptimal.

Anyone know if there's an update on this? Or is pytorch really not meant to compute jacobians?

@sbarratt
Author

sbarratt commented May 31, 2020 via email

@RylanSchaeffer

@justinblaber, autodiff computes either Jacobian-vector products or vector-Jacobian products (depending on forward mode / reverse mode). The Jacobian is a matrix - there's no easy way to recover it by itself. Either you perform multiple backward passes, using a different elementary basis vector on each pass, or you blow the batch size up and do one massive backward pass. There's no way around this.
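
A minimal sketch of the first option, one reverse pass per elementary basis vector (the helper name and the linear net are illustrative, not from this thread):

import torch
import torch.nn as nn

def jacobian_rowwise(net, x):
    # Jacobian of net(x) for a single input vector x: one backward pass per output element.
    x = x.detach().requires_grad_(True)
    y = net(x)
    rows = []
    for i in range(y.numel()):
        e_i = torch.zeros_like(y)
        e_i.view(-1)[i] = 1.0  # elementary basis vector selecting output i
        row, = torch.autograd.grad(y, x, grad_outputs=e_i, retain_graph=True)
        rows.append(row.view(-1))
    return torch.stack(rows)  # shape (y.numel(), x.numel())

net = nn.Linear(3, 2)
J = jacobian_rowwise(net, torch.randn(3))
print(torch.allclose(J, net.weight))  # True for a linear layer
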

@a-z-e-r-i-l-a

How about this experimental API for the Jacobian: https://pytorch.org/docs/stable/_modules/torch/autograd/functional.html#jacobian
Is it good?

@justinblaber

> How about this experimental API for the Jacobian: https://pytorch.org/docs/stable/_modules/torch/autograd/functional.html#jacobian
> Is it good?

I took a look and:

for j in range(out.nelement()):
    vj = _autograd_grad((out.reshape(-1)[j],), inputs, retain_graph=True, create_graph=create_graph)

It's just for-looping over the output and computing the gradients one by one (i.e. each row of the Jacobian one by one). This will for sure be slow as hell if you have a lot of outputs. I actually think it's a tad deceiving that they advertise this functionality, because really the functionality just isn't there.
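
For reference, a minimal call to that experimental API (the linear net and shapes are assumed for illustration; this is just how the function is invoked as of PyTorch 1.5):

import torch
import torch.nn as nn
from torch.autograd.functional import jacobian

net = nn.Linear(3, 2)
x = torch.randn(3)

# Internally this loops over the 2 output elements, doing one backward pass each.
J = jacobian(net, x)
print(J.shape)                        # torch.Size([2, 3])
print(torch.allclose(J, net.weight))  # True for a linear layer
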

And to be honest, I wanted the Jacobian earlier to do some Gauss-Newton type optimization, but I've since discovered that the optim.LBFGS optimizer (now built into PyTorch) might work well for my problem. I think it even has some backtracking-type line search built into it. So for now I don't think I even need the Jacobian anymore.
