nn.CrossEntropyLoss() automatically applies a (log-)softmax to your model's raw outputs, so you should pass it the logits directly and not add a softmax layer at the end of your network yourself.
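A minimal sketch of the correct usage (the tensor shapes and values here are only illustrative):

import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()    # combines log-softmax and negative log-likelihood internally
logits = torch.randn(4, 3)           # raw, unnormalized outputs for a batch of 4 over 3 classes
labels = torch.tensor([0, 2, 1, 0])  # ground-truth class indices
loss = criterion(logits, labels)     # pass logits directly; do not softmax them first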
This happens especially in Jupyter Lab, where re-running a cell can create a new model object while the optimizer still holds references to the old one's parameters. It is particularly insidious because it won't raise any errors: your loss will appear stable, and the model will simply not learn anything.
import torch.optim as optim

optimizer = optim.SGD(model1.parameters(), lr=0.01)  # Optimizer holds references to model1's parameters
...
...
preds = model2(inputs)  # Forward pass runs through a different model, model2
loss = criterion(preds, labels)
loss.backward()         # Gradients accumulate on model2's parameters only
optimizer.step()        # Nothing happens: model1's parameters have no gradients
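The fix is to re-create the optimizer whenever the model is re-created, so that it holds references to the parameters of the model you are actually training (reusing the hypothetical model2 and learning rate from above):

optimizer = optim.SGD(model2.parameters(), lr=0.01)  # re-bind the optimizer to the live model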
Ideally, you should move your model to the device before calling train(). This mistake is easy to detect, since PyTorch will raise a warning when you try it. Another mistake, one that raises an error outright, is having your model on one device and your input on another.
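A minimal sketch of the usual device-handling pattern (model, loader, and criterion are assumed to already exist; the names are only illustrative):

import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)            # move the model once, before the training loop
for inputs, labels in loader:
    inputs = inputs.to(device)      # every batch must live on the same device as the model
    labels = labels.to(device)
    preds = model(inputs)           # a device mismatch here raises a RuntimeError
    loss = criterion(preds, labels)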