@zou3519
Created November 7, 2017 15:56
Timings for variance over an outer dimension on CUDA, before and after the numerical-stability changes
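The gist does not say which algorithm the stability changes introduced. As background only, here is a minimal sketch contrasting the one-pass textbook formula with Welford's online update, which is one common numerically stable way to compute variance. It is an illustration, not the PyTorch patch. Both functions divide by n − 1, matching the default unbiased estimate of `torch.var`. The names `naive_var` and `welford_var` are hypothetical.

```python
# Hypothetical helpers illustrating numerical stability of variance;
# a sketch in plain Python floats, not the PyTorch CUDA kernel.


def naive_var(xs):
    """One-pass textbook formula: (sum(x^2) - sum(x)^2 / n) / (n - 1)."""
    n = len(xs)
    s = sum(xs)
    ss = sum(x * x for x in xs)
    # Subtracting two huge, nearly equal sums loses most significant digits.
    return (ss - s * s / n) / (n - 1)


def welford_var(xs):
    """Welford's online update: tracks the running mean and the sum of
    squared deviations (m2) instead of raw sums, avoiding that cancellation."""
    n, mean, m2 = 0, 0.0, 0.0
    for x in xs:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    return m2 / (n - 1)


# Small spread around a large offset: the true sample variance is 30.
data = [1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16]
print(naive_var(data), welford_var(data))
```

The values cluster around 10^9, so the raw sums in the one-pass formula are around 10^18. At that size, rounding error swamps the true sum of squared deviations (90). As a result, `naive_var` lands far from 30 and may even come out negative, while `welford_var` returns 30.0. Welford's update also does one division per element and keeps more state than plain running sums. That is one plausible reason a stable kernel would be slower, but the timings alone do not show the cause.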
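# IPython/Jupyter session: %timeit is an IPython magic, not plain Python.
# CUDA kernels launch asynchronously, so each timed statement calls
# torch.cuda.synchronize() to include the kernel's run time.
# The ten cases below appear in the same order in both result lists.
import torch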
tensor = torch.randn(100, 1).cuda()
%timeit tensor.var(0); torch.cuda.synchronize()
tensor = torch.randn(10000, 1).cuda()
%timeit tensor.var(0); torch.cuda.synchronize()
tensor = torch.randn(1000, 2, 10).cuda()
%timeit tensor.var(1); torch.cuda.synchronize()
tensor = torch.randn(10000, 2, 10).cuda()
%timeit tensor.var(1); torch.cuda.synchronize()
tensor = torch.randn(50000, 2, 10).cuda()
%timeit tensor.var(1); torch.cuda.synchronize()
tensor = torch.randn(2, 2, 2).cuda()
%timeit tensor.var(1); torch.cuda.synchronize()
tensor = torch.randn(100, 100, 100).cuda()
%timeit tensor.var(1); torch.cuda.synchronize()
tensor = torch.randn(1000, 1000, 100).cuda()
%timeit tensor.var(1); torch.cuda.synchronize()
tensor = torch.randn(5, 10000, 2).cuda()
%timeit tensor.var(1); torch.cuda.synchronize()
tensor = torch.randn(5, 100000, 2).cuda()
%timeit tensor.var(1); torch.cuda.synchronize()
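Readers outside IPython can use this minimal standard-library sketch of the measurement the `%timeit` lines perform. It runs `repeat` runs of `number` loops each and reports the mean ± standard deviation of the per-loop time across runs. That is the quantity in the "mean ± std. dev. of 7 runs, N loops each" result lines. The helper name `timeit_like` and its `sync` parameter are hypothetical. Unlike `%timeit`, it does not choose `number` automatically, and it uses the population standard deviation.

```python
import statistics
import time


def timeit_like(stmt, number, repeat=7, sync=None):
    """Hypothetical helper shaped like IPython's %timeit report.

    Calls `stmt` `number` times per run, for `repeat` runs, and returns
    (mean, stdev) of the per-loop time across runs, in seconds.
    `sync` is called after every `stmt`; for a CUDA tensor this would be
    torch.cuda.synchronize, so the asynchronous kernel's time is counted.
    """
    per_loop = []
    for _ in range(repeat):
        start = time.perf_counter()
        for _ in range(number):
            stmt()
            if sync is not None:
                sync()
        per_loop.append((time.perf_counter() - start) / number)
    # Population standard deviation of the per-loop times across runs.
    return statistics.mean(per_loop), statistics.pstdev(per_loop)


# CPU-only demo. With PyTorch and a GPU one might instead time
# lambda: t.var(0) with sync=torch.cuda.synchronize (not run here).
mean, std = timeit_like(lambda: sum(range(1000)), number=100)
print(f"{mean * 1e6:.2f} µs ± {std * 1e9:.0f} ns per loop "
      f"(mean ± std. dev. of 7 runs, 100 loops each)")
```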
Before changes
37.3 µs ± 94.8 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
1.6 ms ± 401 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
31.2 µs ± 120 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
99.6 µs ± 230 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
477 µs ± 245 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
22.5 µs ± 399 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
60.2 µs ± 101 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
3.83 ms ± 604 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)
1.62 ms ± 125 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
18.7 ms ± 1.25 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
After changes
84.9 µs ± 446 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
6.25 ms ± 3.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
32.1 µs ± 162 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
110 µs ± 80.3 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
532 µs ± 201 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
23.1 µs ± 181 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
111 µs ± 336 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
5.48 ms ± 843 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)
6.24 ms ± 7.47 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
65.7 ms ± 16.3 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
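To compare the two lists case by case, divide each "after" mean by the matching "before" mean. The sketch below copies the mean per-loop times from the output above, converted to microseconds, and prints each ratio. It ignores the reported standard deviations, which are small relative to the means here. The `cases` list and `slowdowns` function are illustrative names, not part of the gist.

```python
# Mean per-loop times from the %timeit output above, in microseconds,
# in the same order as the ten benchmark cases: (label, before, after).
cases = [
    ("(100, 1).var(0)", 37.3, 84.9),
    ("(10000, 1).var(0)", 1600.0, 6250.0),
    ("(1000, 2, 10).var(1)", 31.2, 32.1),
    ("(10000, 2, 10).var(1)", 99.6, 110.0),
    ("(50000, 2, 10).var(1)", 477.0, 532.0),
    ("(2, 2, 2).var(1)", 22.5, 23.1),
    ("(100, 100, 100).var(1)", 60.2, 111.0),
    ("(1000, 1000, 100).var(1)", 3830.0, 5480.0),
    ("(5, 10000, 2).var(1)", 1620.0, 6240.0),
    ("(5, 100000, 2).var(1)", 18700.0, 65700.0),
]


def slowdowns(rows):
    """Return (label, after / before) pairs; values above 1 mean slower."""
    return [(label, after / before) for label, before, after in rows]


for label, ratio in slowdowns(cases):
    print(f"{label:26s} {ratio:5.2f}x")
```

The largest slowdowns, about 3.5x to 3.9x, are for the three cases that reduce over 10,000 or 100,000 elements. The cases that reduce over a dimension of size 2 change by roughly 3% to 12%.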