Last active
November 6, 2017 20:10
Benchmarking inner dimension variance speed, before and after numerical stability changes
import torch
tensor = torch.randn(100).cuda()
%timeit tensor.var(0); torch.cuda.synchronize()
tensor = torch.randn(10000).cuda()
%timeit tensor.var(0); torch.cuda.synchronize()
tensor = torch.randn(1000, 2, 10).cuda()
%timeit tensor.var(2); torch.cuda.synchronize()
tensor = torch.randn(10000, 2, 10).cuda()
%timeit tensor.var(2); torch.cuda.synchronize()
tensor = torch.randn(50000, 2, 10).cuda()
%timeit tensor.var(2); torch.cuda.synchronize()
tensor = torch.randn(2, 2, 2).cuda()
%timeit tensor.var(2); torch.cuda.synchronize()
tensor = torch.randn(100, 100, 100).cuda()
%timeit tensor.var(2); torch.cuda.synchronize()
tensor = torch.randn(1000, 10, 1000).cuda()
%timeit tensor.var(2); torch.cuda.synchronize()
tensor = torch.randn(5, 2, 10000).cuda()
%timeit tensor.var(2); torch.cuda.synchronize()
tensor = torch.randn(5, 2, 100000).cuda()
%timeit tensor.var(2); torch.cuda.synchronize()
Before changes:
25.2 µs ± 184 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
255 µs ± 264 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
27.3 µs ± 257 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
66.9 µs ± 234 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
236 µs ± 517 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
23.1 µs ± 202 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
75.3 µs ± 276 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
510 µs ± 338 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
277 µs ± 337 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
3.47 ms ± 787 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)

After changes:
27.8 µs ± 239 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
441 µs ± 927 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
28.3 µs ± 231 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
72.9 µs ± 213 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
278 µs ± 1.14 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
24.4 µs ± 312 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
87.1 µs ± 128 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
561 µs ± 126 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
463 µs ± 102 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
5.35 ms ± 1.49 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
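
The two result lists are easier to compare as per-case ratios. The sketch below pairs each timing with a tensor shape and reduced dimension, assuming the results are listed in the same order as the `%timeit` calls (the gist does not state this explicitly). The names `benchmarks` and `slowdowns` are illustrative only, and all timings are converted to microseconds.

```python
# Mean timings copied from the results above, in microseconds.
# ASSUMPTION: the n-th result line corresponds to the n-th %timeit call.
# Each row: (tensor shape, reduced dim, before, after)
benchmarks = [
    ((100,), 0, 25.2, 27.8),
    ((10000,), 0, 255.0, 441.0),
    ((1000, 2, 10), 2, 27.3, 28.3),
    ((10000, 2, 10), 2, 66.9, 72.9),
    ((50000, 2, 10), 2, 236.0, 278.0),
    ((2, 2, 2), 2, 23.1, 24.4),
    ((100, 100, 100), 2, 75.3, 87.1),
    ((1000, 10, 1000), 2, 510.0, 561.0),
    ((5, 2, 10000), 2, 277.0, 463.0),
    ((5, 2, 100000), 2, 3470.0, 5350.0),  # 3.47 ms and 5.35 ms
]


def slowdowns(rows):
    """Return (shape, dim, after / before) for each benchmark row."""
    return [(shape, dim, after / before) for shape, dim, before, after in rows]


ratios = slowdowns(benchmarks)
for shape, dim, ratio in ratios:
    print(f"{str(shape):>18}  var({dim})  {ratio:.2f}x")
```

Under that ordering assumption, every case is slower after the changes, and the three largest ratios (roughly 1.5x to 1.7x) belong to the cases whose reduced dimension has 10,000 or more elements.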