@zou3519
Last active November 6, 2017 20:10
Benchmarking inner dimension variance speed, before and after numerical stability changes
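The description does not say which numerical stability change was made. As general background only, a single-pass sum-of-squares variance can lose all of its precision when the mean is large relative to the spread, while a Welford-style running update avoids that at the cost of more arithmetic per element. The sketch below is plain Python with hypothetical function names (`naive_var`, `welford_var`); it is not the PyTorch kernel and is not assumed to match the change being benchmarked.

```python
# Illustrative only: NOT the PyTorch implementation. Function names are hypothetical.

def naive_var(xs):
    """Unbiased variance via sum of squares; prone to catastrophic cancellation."""
    n = len(xs)
    total = sum(xs)
    total_sq = sum(x * x for x in xs)
    return (total_sq - total * total / n) / (n - 1)


def welford_var(xs):
    """Unbiased variance via Welford's running update; one pass, more work per element."""
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for count, x in enumerate(xs, start=1):
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)
    return m2 / (len(xs) - 1)


# Large offset, small spread: the exact unbiased variance of this data is 1.0.
data = [1e9 + d for d in (-1.0, 0.0, 1.0)]
print("naive:  ", naive_var(data))
print("welford:", welford_var(data))
```

The extra per-element arithmetic in a stable update is one plausible reason for a slowdown of the kind measured below, though the gist itself does not attribute the difference to any specific cause.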
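# Run in IPython/Jupyter (%timeit is an IPython magic); requires a CUDA device.
import torch

# torch.cuda.synchronize() is part of each timed statement so the measurement
# covers kernel completion, not just the asynchronous launch.
tensor = torch.randn(100).cuda()
%timeit tensor.var(0); torch.cuda.synchronize()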
tensor = torch.randn(10000).cuda()
%timeit tensor.var(0); torch.cuda.synchronize()
tensor = torch.randn(1000, 2, 10).cuda()
%timeit tensor.var(2); torch.cuda.synchronize()
tensor = torch.randn(10000, 2, 10).cuda()
%timeit tensor.var(2); torch.cuda.synchronize()
tensor = torch.randn(50000, 2, 10).cuda()
%timeit tensor.var(2); torch.cuda.synchronize()
tensor = torch.randn(2, 2, 2).cuda()
%timeit tensor.var(2); torch.cuda.synchronize()
tensor = torch.randn(100, 100, 100).cuda()
%timeit tensor.var(2); torch.cuda.synchronize()
tensor = torch.randn(1000, 10, 1000).cuda()
%timeit tensor.var(2); torch.cuda.synchronize()
tensor = torch.randn(5, 2, 10000).cuda()
%timeit tensor.var(2); torch.cuda.synchronize()
tensor = torch.randn(5, 2, 100000).cuda()
%timeit tensor.var(2); torch.cuda.synchronize()
Before changes (one timing per benchmark above, in the same order):
25.2 µs ± 184 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
255 µs ± 264 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
27.3 µs ± 257 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
66.9 µs ± 234 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
236 µs ± 517 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
23.1 µs ± 202 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
75.3 µs ± 276 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
510 µs ± 338 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
277 µs ± 337 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
3.47 ms ± 787 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)
After changes (one timing per benchmark above, in the same order):
27.8 µs ± 239 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
441 µs ± 927 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
28.3 µs ± 231 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
72.9 µs ± 213 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
278 µs ± 1.14 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
24.4 µs ± 312 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
87.1 µs ± 128 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
561 µs ± 126 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
463 µs ± 102 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
5.35 ms ± 1.49 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
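To make the two lists easier to compare, the short script below computes the after/before ratio for each benchmark. The timings are copied from the output above and converted to microseconds; the labels and variable names are added here for illustration and are not part of the original gist.

```python
# Mean timings copied from the two lists above, in microseconds.
# Labels pair each timing with the tensor shape and reduced dimension.
labels = [
    "(100,) dim 0",
    "(10000,) dim 0",
    "(1000, 2, 10) dim 2",
    "(10000, 2, 10) dim 2",
    "(50000, 2, 10) dim 2",
    "(2, 2, 2) dim 2",
    "(100, 100, 100) dim 2",
    "(1000, 10, 1000) dim 2",
    "(5, 2, 10000) dim 2",
    "(5, 2, 100000) dim 2",
]
before_us = [25.2, 255, 27.3, 66.9, 236, 23.1, 75.3, 510, 277, 3470]
after_us = [27.8, 441, 28.3, 72.9, 278, 24.4, 87.1, 561, 463, 5350]

# Ratio > 1 means the benchmark is slower after the changes.
ratios = [after / before for before, after in zip(before_us, after_us)]

for label, before, after, ratio in zip(labels, before_us, after_us, ratios):
    print(f"{label:<24} {before:>8.1f} -> {after:>8.1f} us   x{ratio:.2f}")
```

On these numbers every case is slower after the changes, by roughly 4% to 73%. The largest ratios belong to the cases with long reductions: the 10000-element vector, `(5, 2, 10000)`, and `(5, 2, 100000)`.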