Faster MNIST.md

AlexPasqua commented May 21, 2021 •

edited

Loading

That's brilliant, thanks! 😃

Anyway, I've noticed that it is possible to completely bypass the DataLoaders and work directly with tensors. This will increase even more the performance.
For example, if the batch size is equal to the whole training set (i.e. batch training), the GPU usage is near 100% (NVIDIA GeForce MX150). Note that this will decrease as the batch size decreases.
Regarding the execution time: with a batch size of 64 and 100 epochs, the whole execution time went from around 238s to around 181s.
(This is just an indication, I haven't performed a complete and rigorous test).

To use directly the tensors:

from torch.utils.data import DataLoader

train_dataset = FastMNIST('data/MNIST', train=True, download=True)
test_dataset = FastMNIST('data/MNIST', train=False, download=True)

train_dataloader = DataLoader(train_dataset, batch_size=64, shuffle=True, num_workers=0)
test_dataloader = DataLoader(test_dataset, batch_size=10000, shuffle=False, num_workers=0)

n_batches = math.ceil(len(train_dataset.data) / batch_size)
for batch_idx in range(n_batches):
    # select a slice from your training set as a minibatch
    train_batch = train_dataset.data[batch_idx * batch_size: batch_idx * batch_size + batch_size]

    # perform computations
    outputs = model(train_batch)
    ...

Furthermore, I think it's worth noticing that the normalization you perform (after the scaling) it's not always good, but this may depend on the task. For example, I'm working with autoencoders and if I don't comment that line I get bad results (in terms of reconstruction error).

Thanks again for your gist, I hope this can help improving even more the performance for small datasets like MNIST! 😃

Author

y0ast commented May 22, 2021 •

edited

Loading

Unfortunately that does not give the correct behavior: you're not randomizing your batches at each epoch which leads to significant reduced performance.

Yes this normalization is 0 mean, 1 std, for a VAE + MNIST you generally model your data as a multivariate bernoulli, which requires it to be between 0 and 1.

AlexPasqua commented May 22, 2021

Unfortunately that does not give the correct behavior: you're not randomizing your batches at each epoch which leads to significant reduced performance.

That's true, the shuffling should then be done manually. I think this should work:

train_dataset.data = train_dataset.data[torch.randperm(train_dataset.data.shape[0])]

(assuming the first dimension of train_dataset.data to be the batch size)

y0ast/Faster MNIST.md

AlexPasqua commented May 21, 2021 •

edited

Loading

Uh oh!

y0ast commented May 22, 2021 •

edited

Loading

Uh oh!

AlexPasqua commented May 22, 2021

Uh oh!

y0ast/Faster MNIST.md

AlexPasqua commented May 21, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

y0ast commented May 22, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

AlexPasqua commented May 22, 2021

Uh oh!

AlexPasqua commented May 21, 2021 •

edited

Loading

y0ast commented May 22, 2021 •

edited

Loading