
Notes on PyTorch

Contents

  • Installing PyTorch
  • Simple automatic differentiation
  • Simple gradient descent
  • Using a DataLoader with a custom dataset

Installing PyTorch

  • Instructions for installing PyTorch can be found here
  • In particular, PyTorch can be downloaded using pip
  • PyTorch provides different builds depending on whether the installation should support both GPU and CPU or CPU only, and for GPU support there are further builds depending on the version of CUDA
  • At the time of writing, the most recent version of CUDA supported by PyTorch is CUDA 11.7
  • The different versions of CUDA are available to download here
  • In particular, I downloaded CUDA 11.7.1 (available for download here)
  • During the installation process, the installer states that Visual Studio is required, but doesn't specify which Visual Studio workloads are needed
  • This post on the Nvidia developer forum suggests that the C++ development workload ("Desktop Development with C++") is "the only one required"; however, this workload alone requires about 8 GB of storage space
  • In the end, I assumed that Visual Studio is only needed for compiling CUDA code with nvcc, whereas PyTorch ships with pre-compiled binaries and only needs access to the CUDA runtime, so neither nvcc nor (by extension) Visual Studio should be required, and therefore I didn't install Visual Studio
  • I then installed PyTorch (version 1.13.0+cu117) with the following commands:
python -m pip install -U pip
python -m pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu117

This installation appears to work fine, and commands such as torch.cuda.is_available() and torch.tensor([2.,3,4], requires_grad=True).cuda() return results suggesting that GPU support is working successfully.
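
For example, a short sanity check along the following lines can be used to confirm GPU support (a minimal sketch; the version string and device name printed will depend on the installation and machine):

import torch

print(torch.__version__)             # EG 1.13.0+cu117
print(torch.cuda.is_available())     # True if a compatible GPU and driver are found
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))
    # Move a small tensor to the GPU and perform a simple calculation on it
    x = torch.tensor([2., 3, 4], requires_grad=True).cuda()
    print((x * x).sum())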

Simple automatic differentiation

Gradients can be calculated by calling the backward method, and should be reset to zero between backward calls (EG by calling tensor.grad.zero_()) to avoid accumulating gradients from previous calculations, for example:

import numpy as np
import torch

w = torch.tensor(np.ones(5), requires_grad=True, dtype=torch.float32)
t = torch.tensor(np.arange(5), requires_grad=False, dtype=torch.float32)

def loss(w, t):
    return torch.sum(torch.square(w - t))

def init_grad(x):
    # x.grad is None until the first backward call; calling backward on a
    # scalar function of x initialises it, and zero_grad then resets it to zeros
    x.sum().backward()
    zero_grad(x)

def zero_grad(x):
    x.grad.zero_()

def print_tensor(x):
    grad = x.grad.numpy() if (x.grad is not None) else None
    print("%s, grad = %s" % (x.detach().numpy(), grad))

print_tensor(w)
# [1. 1. 1. 1. 1.], grad = None
print_tensor(t)
# [0. 1. 2. 3. 4.], grad = None
init_grad(w)
print_tensor(w)
# [1. 1. 1. 1. 1.], grad = [0. 0. 0. 0. 0.]
loss(w, t).backward()
print_tensor(w)
# [1. 1. 1. 1. 1.], grad = [ 2.  0. -2. -4. -6.]

with torch.no_grad():
    w -= 0.1 * w.grad
print_tensor(w)
# [0.8 1.  1.2 1.4 1.6], grad = [ 2.  0. -2. -4. -6.]
loss(w, t).backward()
print_tensor(w)
# [0.8 1.  1.2 1.4 1.6], grad = [  3.6   0.   -3.6  -7.2 -10.8]
zero_grad(w)
print_tensor(w)
# [0.8 1.  1.2 1.4 1.6], grad = [0. 0. 0. 0. 0.]
loss(w, t).backward()
print_tensor(w)
# [0.8 1.  1.2 1.4 1.6], grad = [ 1.6  0.  -1.6 -3.2 -4.8]

Note that w is modified in-place within a torch.no_grad() context. Trying to modify w in-place outside of such a context raises a RuntimeError, for example:

import torch
import numpy as np

w = torch.tensor(np.ones(5), requires_grad=True)
w.sum().backward()
w -= w.grad
# RuntimeError: a leaf Variable that requires grad is being used in an in-place operation.

Simple gradient descent

Repeatedly setting gradients to zero, calculating gradients, and updating variables in the negative gradient direction gives a simple gradient descent loop, for example (plotting below refers to a separate custom plotting module, not part of PyTorch):

import numpy as np
import torch
import plotting

w = torch.tensor(np.ones(5), requires_grad=True)
t = torch.tensor(np.arange(5), requires_grad=False)

def loss(w, t):
    return torch.sum(torch.square(w - t))

def init_grad(x):
    x.sum().backward()
    zero_grad(x)

def zero_grad(x):
    x.grad.zero_()

init_grad(w)
error_list = []
w_list = [w.detach().clone().numpy()]
print("Starting optimisation loop...")
for i in range(100):
    zero_grad(w)
    e = loss(w, t)
    e.backward()
    with torch.no_grad():
        w -= 5e-2 * w.grad
    error_list.append(e.item())
    w_list.append(w.detach().clone().numpy())

    print("\ri = %i, error = %.5f" % (i, e), end="", flush=True)

print()

plotting.plot(
    plotting.Line(error_list, c="r"),
    plot_name="Error vs iteration",
)
cp = plotting.ColourPicker(5)
line_list = [
    plotting.Line([w[j] for w in w_list], c=cp(j))
    for j in range(5)
]
plotting.plot(*line_list, plot_name="Weights vs iteration")
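
The same update rule can also be written with PyTorch's built-in optimiser API. Below is a minimal sketch using torch.optim.SGD (reusing the w, t and loss definitions above, with the same learning rate); optimiser.step() performs the in-place update inside a no_grad context internally, so the explicit torch.no_grad() block and the manual init_grad/zero_grad calls are not needed:

optimiser = torch.optim.SGD([w], lr=5e-2)
for i in range(100):
    optimiser.zero_grad()   # reset the gradients of all optimised tensors (here just w)
    e = loss(w, t)
    e.backward()            # compute gradients of the loss with respect to w
    optimiser.step()        # update w in the negative gradient direction, scaled by lr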

Using a DataLoader with a custom dataset

To use a DataLoader with a custom dataset, define a class for the dataset which is a subclass of torch.utils.data.Dataset (source), and implement the following methods:

| Method name | Purpose | When it is called |
| --- | --- | --- |
| __init__ | Initialise the dataset, including any attributes needed by __len__ and __getitem__ | When the dataset is initialised |
| __len__ | Return the total number of datapoints in the dataset | Called by a DataLoader, with a frequency that depends on whether the DataLoader is initialised with shuffle equal to True or False. If shuffle is False, __len__ is called once at the start of each epoch. If shuffle is True, __len__ is called twice when the DataLoader is initialised, and 3 times per epoch (twice at the start of the epoch, and once before or after the final batch, depending on whether the number of datapoints is a multiple of the batch size) |
| __getitem__ | Return a single datapoint (input and target) with the given index | Called by a DataLoader once for every datapoint in every batch (IE batch_size times per batch) |

Below is a toy example:

import torch
import numpy as np

class MockData(torch.utils.data.Dataset):
    def __init__(self, n=8):
        self._n = n
        self._x = np.arange(n)
        self._y = self._x + 100

    def __len__(self):
        print("Called len(MockData)")
        return self._n

    def __getitem__(self, index):
        print("Called MockData[%i]" % index)
        return self._x[index], self._y[index]

dataset = MockData()
data_loader = torch.utils.data.DataLoader(
    dataset=dataset,
    batch_size=3,
    shuffle=True,
)
for epoch in range(2):
    print("Epoch = %i" % epoch)
    for x, y in data_loader:
        print("Received batch x = %s, y = %s" % (x, y))

Output:

Called len(MockData)
Called len(MockData)
Epoch = 0
Called len(MockData)
Called len(MockData)
Called MockData[6]
Called MockData[1]
Called MockData[0]
Received batch x = tensor([6, 1, 0], dtype=torch.int32), y = tensor([106, 101, 100], dtype=torch.int32)
Called MockData[7]
Called MockData[5]
Called MockData[2]
Received batch x = tensor([7, 5, 2], dtype=torch.int32), y = tensor([107, 105, 102], dtype=torch.int32)
Called len(MockData)
Called MockData[4]
Called MockData[3]
Received batch x = tensor([4, 3], dtype=torch.int32), y = tensor([104, 103], dtype=torch.int32)
Epoch = 1
Called len(MockData)
Called len(MockData)
Called MockData[7]
Called MockData[1]
Called MockData[3]
Received batch x = tensor([7, 1, 3], dtype=torch.int32), y = tensor([107, 101, 103], dtype=torch.int32)
Called MockData[0]
Called MockData[2]
Called MockData[5]
Received batch x = tensor([0, 2, 5], dtype=torch.int32), y = tensor([100, 102, 105], dtype=torch.int32)
Called len(MockData)
Called MockData[6]
Called MockData[4]
Received batch x = tensor([6, 4], dtype=torch.int32), y = tensor([106, 104], dtype=torch.int32)
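
As an aside, for simple in-memory arrays like the ones above, torch.utils.data.TensorDataset already implements __len__ and __getitem__; a minimal sketch which is roughly equivalent to MockData (up to dtypes and the print statements) is:

import torch

x = torch.arange(8)
y = x + 100
dataset = torch.utils.data.TensorDataset(x, y)
data_loader = torch.utils.data.DataLoader(dataset, batch_size=3, shuffle=True)
for x_batch, y_batch in data_loader:
    print("Received batch x = %s, y = %s" % (x_batch, y_batch))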