Created
May 26, 2015 04:11
Function profiling
==================
Message: None
Time in 0 calls to Function.__call__: 0.000000e+00s
Total compile time: 5.686516e+00s
Theano Optimizer time: 1.218326e+00s
Theano validate time: 2.887964e-03s
Theano Linker time (includes C, CUDA code generation/compiling): 4.449660e+00s
Function profiling
==================
Message: None
Time in 2500 calls to Function.__call__: 7.448946e+00s
Time in Function.fn.__call__: 7.271567e+00s (97.619%)
Time in thunks: 3.860834e+00s (51.831%)
Total compile time: 9.859906e+00s
Theano Optimizer time: 1.074716e+00s
Theano validate time: 7.506371e-03s
Theano Linker time (includes C, CUDA code generation/compiling): 8.776075e+00s
Class
---
<% time> <sum %> <apply time> <time per call> <type> <#call> <#apply> <Class name>
43.8% 43.8% 1.691s 1.35e-04s C 12500 5 <class 'theano.tensor.blas.Dot22'>
31.5% 75.3% 1.215s 1.62e-04s C 7500 3 <class 'theano.tensor.blas.Gemm'>
7.8% 83.1% 0.301s 3.64e-06s C 82500 33 <class 'theano.tensor.elemwise.Elemwise'>
4.9% 88.0% 0.190s 7.61e-06s C 25000 10 <class 'theano.tensor.elemwise.DimShuffle'>
3.5% 91.5% 0.134s 5.95e-06s C 22500 9 <class 'theano.tensor.elemwise.Sum'>
3.1% 94.6% 0.121s 4.84e-05s Py 2500 1 <class 'theano.tensor.basic.MaxAndArgmax'>
1.0% 95.6% 0.038s 1.52e-05s Py 2500 1 <class 'theano.tensor.subtensor.AdvancedIncSubtensor'>
0.8% 96.4% 0.033s 2.18e-06s C 15000 6 <class 'theano.tensor.subtensor.Subtensor'>
0.8% 97.3% 0.032s 1.30e-05s Py 2500 1 <class 'theano.tensor.subtensor.AdvancedSubtensor'>
0.6% 97.9% 0.024s 1.90e-06s C 12500 5 <class 'theano.compile.ops.Shape_i'>
0.6% 98.5% 0.023s 9.08e-06s Py 2500 1 <class 'theano.tensor.basic.ARange'>
0.5% 99.0% 0.020s 7.83e-06s C 2500 1 <class 'theano.tensor.nnet.nnet.SoftmaxWithBias'>
0.4% 99.4% 0.017s 2.21e-06s C 7500 3 <class 'theano.tensor.opt.MakeVector'>
0.4% 99.8% 0.015s 3.03e-06s C 5000 2 <class 'theano.tensor.basic.Alloc'>
0.2% 100.0% 0.008s 3.19e-06s C 2500 1 <class 'theano.tensor.nnet.nnet.SoftmaxGrad'>
... (remaining 0 Classes account for 0.00%(0.00s) of the runtime)
Ops
---
<% time> <sum %> <apply time> <time per call> <type> <#call> <#apply> <Op name>
43.8% 43.8% 1.691s 1.35e-04s C 12500 5 Dot22
31.5% 75.3% 1.215s 1.62e-04s C 7500 3 Gemm{no_inplace}
3.6% 78.9% 0.140s 1.12e-05s C 12500 5 DimShuffle{1,0}
3.1% 82.0% 0.121s 4.84e-05s Py 2500 1 MaxAndArgmax
1.7% 83.7% 0.066s 5.31e-06s C 12500 5 Sum
1.6% 85.4% 0.062s 8.31e-06s C 7500 3 Sum{0}
1.4% 86.7% 0.053s 7.11e-06s C 7500 3 Elemwise{add,no_inplace}
1.3% 88.1% 0.051s 5.07e-06s C 10000 4 Elemwise{mul,no_inplace}
1.0% 89.1% 0.039s 7.87e-06s C 5000 2 Elemwise{gt,no_inplace}
1.0% 90.1% 0.038s 1.52e-05s Py 2500 1 AdvancedIncSubtensor{inplace=False, set_instead_of_inc=False}
0.8% 90.9% 0.032s 1.30e-05s Py 2500 1 AdvancedSubtensor
0.8% 91.7% 0.031s 3.09e-06s C 10000 4 Elemwise{abs_,no_inplace}
0.8% 92.5% 0.029s 3.90e-06s C 7500 3 DimShuffle{x,0}
0.6% 93.0% 0.023s 9.08e-06s Py 2500 1 ARange
0.6% 93.6% 0.021s 4.26e-06s C 5000 2 DimShuffle{x}
0.5% 94.1% 0.020s 2.04e-06s C 10000 4 Elemwise{Cast{float32}}
0.5% 94.6% 0.020s 2.70e-06s C 7500 3 Elemwise{Composite{[Composite{[Composite{[sub(i0, mul(i1, i2))]}(i0, a
0.5% 95.2% 0.020s 7.83e-06s C 2500 1 SoftmaxWithBias
0.5% 95.7% 0.019s 1.94e-06s C 10000 4 Elemwise{Composite{[true_div(true_div(i0, i1), i2)]}}
0.4% 96.1% 0.017s 6.80e-06s C 2500 1 Elemwise{log,no_inplace}
... (remaining 16 Ops account for 3.91%(0.15s) of the runtime)
Apply
------
<% time> <sum %> <apply time> <time per call> <#call> <id> <Mflops> <Gflops/s> <Apply name>
25.6% 25.6% 0.988s 3.95e-04s 2500 10 Dot22(x, W_dense1)
input 0: dtype=float32, shape=(20, 784), strides=c
input 1: dtype=float32, shape=(784, 256), strides=c
output 0: dtype=float32, shape=(20, 256), strides=c
21.1% 46.7% 0.816s 3.27e-04s 2500 80 Gemm{no_inplace}(W_dense1, Elemwise{neg,no_inplace}.0, x.T, E
input 0: dtype=float32, shape=(784, 256), strides=c
input 1: dtype=float32, shape=(), strides=c
input 2: dtype=float32, shape=(784, 20), strides=(4, 3136)
input 3: dtype=float32, shape=(20, 256), strides=c
input 4: dtype=float32, shape=(), strides=c
output 0: dtype=float32, shape=(784, 256), strides=c
8.9% 55.6% 0.342s 1.37e-04s 2500 75 Gemm{no_inplace}(W_dense2, Elemwise{neg,no_inplace}.0, DimShu
input 0: dtype=float32, shape=(256, 256), strides=c
input 1: dtype=float32, shape=(), strides=c
input 2: dtype=float32, shape=(256, 20), strides=(4, 1024)
input 3: dtype=float32, shape=(20, 256), strides=c
input 4: dtype=float32, shape=(), strides=c
output 0: dtype=float32, shape=(256, 256), strides=c
8.5% 64.1% 0.327s 1.31e-04s 2500 35 Dot22(Elemwise{mul,no_inplace}.0, W_dense2)
input 0: dtype=float32, shape=(20, 256), strides=c
input 1: dtype=float32, shape=(256, 256), strides=c
output 0: dtype=float32, shape=(20, 256), strides=c
8.0% 72.0% 0.307s 1.23e-04s 2500 76 Dot22(Elemwise{mul}.0, W_dense2.T)
input 0: dtype=float32, shape=(20, 256), strides=c
input 1: dtype=float32, shape=(256, 256), strides=(4, 1024)
output 0: dtype=float32, shape=(20, 256), strides=c
3.1% 75.1% 0.121s 4.84e-05s 2500 54 MaxAndArgmax(Elemwise{add,no_inplace}.0, TensorConstant{(1,)
input 0: dtype=float32, shape=(20, 10), strides=c
input 1: dtype=int64, shape=(1,), strides=c
output 0: dtype=float32, shape=(20,), strides=c
output 1: dtype=int64, shape=(20,), strides=c
1.8% 77.0% 0.071s 2.85e-05s 2500 6 DimShuffle{1,0}(W_dense2)
input 0: dtype=float32, shape=(256, 256), strides=c
output 0: dtype=float32, shape=(256, 256), strides=(4, 1024)
1.5% 78.5% 0.057s 2.27e-05s 2500 67 Gemm{no_inplace}(W_dense3, Elemwise{neg,no_inplace}.0, DimShu
input 0: dtype=float32, shape=(256, 10), strides=c
input 1: dtype=float32, shape=(), strides=c
input 2: dtype=float32, shape=(256, 20), strides=(4, 1024)
input 3: dtype=float32, shape=(20, 10), strides=c
input 4: dtype=float32, shape=(), strides=c
output 0: dtype=float32, shape=(256, 10), strides=c
1.0% 79.4% 0.038s 1.52e-05s 2500 45 Dot22(Elemwise{mul,no_inplace}.0, W_dense3)
input 0: dtype=float32, shape=(20, 256), strides=c
input 1: dtype=float32, shape=(256, 10), strides=c
output 0: dtype=float32, shape=(20, 10), strides=c
1.0% 80.4% 0.038s 1.52e-05s 2500 41 AdvancedIncSubtensor{inplace=False, set_instead_of_inc=False
input 0: dtype=float32, shape=(20, 10), strides=c
input 1: dtype=float32, shape=(20,), strides=c
input 2: dtype=int64, shape=(20,), strides=c
input 3: dtype=int32, shape=(20,), strides=c
output 0: dtype=float32, shape=(20, 10), strides=c
0.8% 81.3% 0.032s 1.30e-05s 2500 62 AdvancedSubtensor(Elemwise{log,no_inplace}.0, ARange.0, k)
input 0: dtype=float32, shape=(20, 10), strides=c
input 1: dtype=int64, shape=(20,), strides=c
input 2: dtype=int32, shape=(20,), strides=c
output 0: dtype=float32, shape=(20,), strides=c
0.8% 82.1% 0.031s 1.25e-05s 2500 68 Dot22(SoftmaxGrad.0, W_dense3.T)
input 0: dtype=float32, shape=(20, 10), strides=c
input 1: dtype=float32, shape=(10, 256), strides=(4, 40)
output 0: dtype=float32, shape=(20, 256), strides=c
0.7% 82.8% 0.027s 1.07e-05s 2500 79 Sum{0}(Elemwise{mul}.0)
input 0: dtype=float32, shape=(20, 256), strides=c
output 0: dtype=float32, shape=(256,), strides=c
0.7% 83.5% 0.026s 1.05e-05s 2500 74 Sum{0}(Elemwise{mul}.0)
input 0: dtype=float32, shape=(20, 256), strides=c
output 0: dtype=float32, shape=(256,), strides=c
0.6% 84.1% 0.025s 9.87e-06s 2500 12 DimShuffle{1,0}(x)
input 0: dtype=float32, shape=(20, 784), strides=c
output 0: dtype=float32, shape=(784, 20), strides=(4, 3136)
0.6% 84.7% 0.024s 9.43e-06s 2500 15 Elemwise{add,no_inplace}(Dot22.0, DimShuffle{x,0}.0)
input 0: dtype=float32, shape=(20, 256), strides=c
input 1: dtype=float32, shape=(1, 256), strides=c
output 0: dtype=float32, shape=(20, 256), strides=c
0.6% 85.3% 0.023s 9.08e-06s 2500 20 ARange(TensorConstant{0}, Shape_i{0}.0, TensorConstant{1})
input 0: dtype=int8, shape=(), strides=c
input 1: dtype=int64, shape=(), strides=c
input 2: dtype=int8, shape=(), strides=c
output 0: dtype=int64, shape=(20,), strides=c
0.6% 85.9% 0.022s 8.96e-06s 2500 39 Elemwise{add,no_inplace}(Dot22.0, DimShuffle{x,0}.0)
input 0: dtype=float32, shape=(20, 256), strides=c
input 1: dtype=float32, shape=(1, 256), strides=c
output 0: dtype=float32, shape=(20, 256), strides=c
0.5% 86.4% 0.021s 8.37e-06s 2500 50 Sum(Elemwise{abs_,no_inplace}.0)
input 0: dtype=float32, shape=(20, 256), strides=c
output 0: dtype=float32, shape=(), strides=c
0.5% 86.9% 0.021s 8.27e-06s 2500 40 Sum(Elemwise{abs_,no_inplace}.0)
input 0: dtype=float32, shape=(20, 256), strides=c
output 0: dtype=float32, shape=(), strides=c
... (remaining 62 Apply instances account for 13.05%(0.50s) of the runtime)
Memory Profile
(Sparse variables are ignored)
---
Max if linker=cvm (default): unknown
Max if no gc (allow_gc=False): 1683KB
Max if linker=c|py: 1133KB
Memory saved if gc is enabled (linker=c|py): 550KB
<Sum apply outputs (bytes)> <Apply outputs shape> <created/inplace/view> <Apply node>
802816B [(784, 256)] c Gemm{no_inplace}(W_dense1, Elemwise{neg,no_inplace}.0, x.T, Elemwise{mul}.0, Elemwise{Composite{[Composite{[add(i0, neg(i1))]}(i0, mul(i1, i2))]}}.0)
262144B [(256, 256)] c DimShuffle{1,0}(W_dense2)
262144B [(256, 256)] c Gemm{no_inplace}(W_dense2, Elemwise{neg,no_inplace}.0, DimShuffle{1,0}.0, Elemwise{mul}.0, Elemwise{Composite{[Composite{[add(i0, neg(i1))]}(i0, mul(i1, i2))]}}.0)
62720B [(784, 20)] c DimShuffle{1,0}(x)
20480B [(20, 256)] c Elemwise{add,no_inplace}(Dot22.0, DimShuffle{x,0}.0)
20480B [(20, 256)] c Elemwise{mul}(Dot22.0, Elemwise{gt,no_inplace}.0)
20480B [(20, 256)] c Elemwise{abs_,no_inplace}(Elemwise{mul,no_inplace}.0)
20480B [(20, 256)] c Dot22(SoftmaxGrad.0, W_dense3.T)
20480B [(20, 256)] c Elemwise{abs_,no_inplace}(Elemwise{mul,no_inplace}.0)
20480B [(20, 256)] c Dot22(Elemwise{mul,no_inplace}.0, W_dense2)
20480B [(20, 256)] c Elemwise{add,no_inplace}(Dot22.0, DimShuffle{x,0}.0)
20480B [(256, 20)] c DimShuffle{1,0}(Elemwise{mul,no_inplace}.0)
20480B [(20, 256)] c Dot22(Elemwise{mul}.0, W_dense2.T)
20480B [(256, 20)] c DimShuffle{1,0}(Elemwise{mul,no_inplace}.0)
20480B [(20, 256)] c Elemwise{mul,no_inplace}(Elemwise{add,no_inplace}.0, Elemwise{gt,no_inplace}.0)
20480B [(20, 256)] c Dot22(x, W_dense1)
20480B [(20, 256)] c Elemwise{mul,no_inplace}(Elemwise{add,no_inplace}.0, Elemwise{gt,no_inplace}.0)
20480B [(20, 256)] c Elemwise{mul}(Dot22.0, Elemwise{gt,no_inplace}.0)
10240B [(10, 256)] c DimShuffle{1,0}(W_dense3)
10240B [(256, 10)] c Gemm{no_inplace}(W_dense3, Elemwise{neg,no_inplace}.0, DimShuffle{1,0}.0, SoftmaxGrad.0, Elemwise{Composite{[Composite{[add(i0, neg(i1))]}(i0, mul(i1, i2))]}}.0)
... (remaining 62 Apply account for 26100B/1723124B ((1.51%)) of the Apply with dense outputs sizes)
<created/inplace/view> is taken from the Op's declaration.
Apply nodes marked 'inplace' or 'view' may actually allocate memory, this is not reported here. If you use DebugMode, warnings will be emitted in those cases.
Function profiling
==================
Message: Sum of all printed profiles at exit excluding Scan op profile.
Time in 2500 calls to Function.__call__: 7.448946e+00s
Time in Function.fn.__call__: 7.271567e+00s (97.619%)
Time in thunks: 3.860834e+00s (51.831%)
Total compile time: 1.554642e+01s
Theano Optimizer time: 2.293042e+00s
Theano validate time: 1.039433e-02s
Theano Linker time (includes C, CUDA code generation/compiling): 1.322573e+01s
Class
---
<% time> <sum %> <apply time> <time per call> <type> <#call> <#apply> <Class name>
43.8% 43.8% 1.691s 1.35e-04s C 12500 5 <class 'theano.tensor.blas.Dot22'>
31.5% 75.3% 1.215s 1.62e-04s C 7500 3 <class 'theano.tensor.blas.Gemm'>
7.8% 83.1% 0.301s 3.64e-06s C 82500 33 <class 'theano.tensor.elemwise.Elemwise'>
4.9% 88.0% 0.190s 7.61e-06s C 25000 10 <class 'theano.tensor.elemwise.DimShuffle'>
3.5% 91.5% 0.134s 5.95e-06s C 22500 9 <class 'theano.tensor.elemwise.Sum'>
3.1% 94.6% 0.121s 4.84e-05s Py 2500 1 <class 'theano.tensor.basic.MaxAndArgmax'>
1.0% 95.6% 0.038s 1.52e-05s Py 2500 1 <class 'theano.tensor.subtensor.AdvancedIncSubtensor'>
0.8% 96.4% 0.033s 2.18e-06s C 15000 6 <class 'theano.tensor.subtensor.Subtensor'>
0.8% 97.3% 0.032s 1.30e-05s Py 2500 1 <class 'theano.tensor.subtensor.AdvancedSubtensor'>
0.6% 97.9% 0.024s 1.90e-06s C 12500 5 <class 'theano.compile.ops.Shape_i'>
0.6% 98.5% 0.023s 9.08e-06s Py 2500 1 <class 'theano.tensor.basic.ARange'>
0.5% 99.0% 0.020s 7.83e-06s C 2500 1 <class 'theano.tensor.nnet.nnet.SoftmaxWithBias'>
0.4% 99.4% 0.017s 2.21e-06s C 7500 3 <class 'theano.tensor.opt.MakeVector'>
0.4% 99.8% 0.015s 3.03e-06s C 5000 2 <class 'theano.tensor.basic.Alloc'>
0.2% 100.0% 0.008s 3.19e-06s C 2500 1 <class 'theano.tensor.nnet.nnet.SoftmaxGrad'>
... (remaining 0 Classes account for 0.00%(0.00s) of the runtime)
Ops
---
<% time> <sum %> <apply time> <time per call> <type> <#call> <#apply> <Op name>
43.8% 43.8% 1.691s 1.35e-04s C 12500 5 Dot22
31.5% 75.3% 1.215s 1.62e-04s C 7500 3 Gemm{no_inplace}
3.6% 78.9% 0.140s 1.12e-05s C 12500 5 DimShuffle{1,0}
3.1% 82.0% 0.121s 4.84e-05s Py 2500 1 MaxAndArgmax
1.7% 83.7% 0.066s 5.31e-06s C 12500 5 Sum
1.6% 85.4% 0.062s 8.31e-06s C 7500 3 Sum{0}
1.4% 86.7% 0.053s 7.11e-06s C 7500 3 Elemwise{add,no_inplace}
1.3% 88.1% 0.051s 5.07e-06s C 10000 4 Elemwise{mul}
1.0% 89.1% 0.039s 7.87e-06s C 5000 2 Elemwise{gt,no_inplace}
1.0% 90.1% 0.038s 1.52e-05s Py 2500 1 AdvancedIncSubtensor{inplace=False, set_instead_of_inc=False}
0.8% 90.9% 0.032s 1.30e-05s Py 2500 1 AdvancedSubtensor
0.8% 91.7% 0.031s 3.09e-06s C 10000 4 Elemwise{abs_,no_inplace}
0.8% 92.5% 0.029s 3.90e-06s C 7500 3 DimShuffle{x,0}
0.6% 93.0% 0.023s 9.08e-06s Py 2500 1 ARange
0.6% 93.6% 0.021s 4.26e-06s C 5000 2 DimShuffle{x}
0.5% 94.1% 0.020s 2.04e-06s C 10000 4 Elemwise{Cast{float32}}
0.5% 94.6% 0.020s 2.70e-06s C 7500 3 Elemwise{Composite{[Composite{[Composite{[sub(i0, mul(i1, i2))]}(i0, a
0.5% 95.2% 0.020s 7.83e-06s C 2500 1 SoftmaxWithBias
0.5% 95.7% 0.019s 1.94e-06s C 10000 4 Elemwise{Composite{[true_div(true_div(i0, i1), i2)]}}
0.4% 96.1% 0.017s 6.80e-06s C 2500 1 Elemwise{log,no_inplace}
... (remaining 16 Ops account for 3.91%(0.15s) of the runtime)
Apply
------
<% time> <sum %> <apply time> <time per call> <#call> <id> <Mflops> <Gflops/s> <Apply name>
25.6% 25.6% 0.988s 3.95e-04s 2500 10 Dot22(x, W_dense1)
input 0: dtype=float32, shape=(20, 784), strides=c
input 1: dtype=float32, shape=(784, 256), strides=c
output 0: dtype=float32, shape=(20, 256), strides=c
21.1% 46.7% 0.816s 3.27e-04s 2500 80 Gemm{no_inplace}(W_dense1, Elemwise{neg,no_inplace}.0, x.T, E
input 0: dtype=float32, shape=(784, 256), strides=c
input 1: dtype=float32, shape=(), strides=c
input 2: dtype=float32, shape=(784, 20), strides=(4, 3136)
input 3: dtype=float32, shape=(20, 256), strides=c
input 4: dtype=float32, shape=(), strides=c
output 0: dtype=float32, shape=(784, 256), strides=c
8.9% 55.6% 0.342s 1.37e-04s 2500 75 Gemm{no_inplace}(W_dense2, Elemwise{neg,no_inplace}.0, DimShu
input 0: dtype=float32, shape=(256, 256), strides=c
input 1: dtype=float32, shape=(), strides=c
input 2: dtype=float32, shape=(256, 20), strides=(4, 1024)
input 3: dtype=float32, shape=(20, 256), strides=c
input 4: dtype=float32, shape=(), strides=c
output 0: dtype=float32, shape=(256, 256), strides=c
8.5% 64.1% 0.327s 1.31e-04s 2500 35 Dot22(Elemwise{mul,no_inplace}.0, W_dense2)
input 0: dtype=float32, shape=(20, 256), strides=c
input 1: dtype=float32, shape=(256, 256), strides=c
output 0: dtype=float32, shape=(20, 256), strides=c
8.0% 72.0% 0.307s 1.23e-04s 2500 76 Dot22(Elemwise{mul}.0, W_dense2.T)
input 0: dtype=float32, shape=(20, 256), strides=c
input 1: dtype=float32, shape=(256, 256), strides=(4, 1024)
output 0: dtype=float32, shape=(20, 256), strides=c
3.1% 75.1% 0.121s 4.84e-05s 2500 54 MaxAndArgmax(Elemwise{add,no_inplace}.0, TensorConstant{(1,)
input 0: dtype=float32, shape=(20, 10), strides=c
input 1: dtype=int64, shape=(1,), strides=c
output 0: dtype=float32, shape=(20,), strides=c
output 1: dtype=int64, shape=(20,), strides=c
1.8% 77.0% 0.071s 2.85e-05s 2500 6 DimShuffle{1,0}(W_dense2)
input 0: dtype=float32, shape=(256, 256), strides=c
output 0: dtype=float32, shape=(256, 256), strides=(4, 1024)
1.5% 78.5% 0.057s 2.27e-05s 2500 67 Gemm{no_inplace}(W_dense3, Elemwise{neg,no_inplace}.0, DimShu
input 0: dtype=float32, shape=(256, 10), strides=c
input 1: dtype=float32, shape=(), strides=c
input 2: dtype=float32, shape=(256, 20), strides=(4, 1024)
input 3: dtype=float32, shape=(20, 10), strides=c
input 4: dtype=float32, shape=(), strides=c
output 0: dtype=float32, shape=(256, 10), strides=c
1.0% 79.4% 0.038s 1.52e-05s 2500 45 Dot22(Elemwise{mul,no_inplace}.0, W_dense3)
input 0: dtype=float32, shape=(20, 256), strides=c
input 1: dtype=float32, shape=(256, 10), strides=c
output 0: dtype=float32, shape=(20, 10), strides=c
1.0% 80.4% 0.038s 1.52e-05s 2500 41 AdvancedIncSubtensor{inplace=False, set_instead_of_inc=False
input 0: dtype=float32, shape=(20, 10), strides=c
input 1: dtype=float32, shape=(20,), strides=c
input 2: dtype=int64, shape=(20,), strides=c
input 3: dtype=int32, shape=(20,), strides=c
output 0: dtype=float32, shape=(20, 10), strides=c
0.8% 81.3% 0.032s 1.30e-05s 2500 62 AdvancedSubtensor(Elemwise{log,no_inplace}.0, ARange.0, k)
input 0: dtype=float32, shape=(20, 10), strides=c
input 1: dtype=int64, shape=(20,), strides=c
input 2: dtype=int32, shape=(20,), strides=c
output 0: dtype=float32, shape=(20,), strides=c
0.8% 82.1% 0.031s 1.25e-05s 2500 68 Dot22(SoftmaxGrad.0, W_dense3.T)
input 0: dtype=float32, shape=(20, 10), strides=c
input 1: dtype=float32, shape=(10, 256), strides=(4, 40)
output 0: dtype=float32, shape=(20, 256), strides=c
0.7% 82.8% 0.027s 1.07e-05s 2500 79 Sum{0}(Elemwise{mul}.0)
input 0: dtype=float32, shape=(20, 256), strides=c
output 0: dtype=float32, shape=(256,), strides=c
0.7% 83.5% 0.026s 1.05e-05s 2500 74 Sum{0}(Elemwise{mul}.0)
input 0: dtype=float32, shape=(20, 256), strides=c
output 0: dtype=float32, shape=(256,), strides=c
0.6% 84.1% 0.025s 9.87e-06s 2500 12 DimShuffle{1,0}(x)
input 0: dtype=float32, shape=(20, 784), strides=c
output 0: dtype=float32, shape=(784, 20), strides=(4, 3136)
0.6% 84.7% 0.024s 9.43e-06s 2500 15 Elemwise{add,no_inplace}(Dot22.0, DimShuffle{x,0}.0)
input 0: dtype=float32, shape=(20, 256), strides=c
input 1: dtype=float32, shape=(1, 256), strides=c
output 0: dtype=float32, shape=(20, 256), strides=c
0.6% 85.3% 0.023s 9.08e-06s 2500 20 ARange(TensorConstant{0}, Shape_i{0}.0, TensorConstant{1})
input 0: dtype=int8, shape=(), strides=c
input 1: dtype=int64, shape=(), strides=c
input 2: dtype=int8, shape=(), strides=c
output 0: dtype=int64, shape=(20,), strides=c
0.6% 85.9% 0.022s 8.96e-06s 2500 39 Elemwise{add,no_inplace}(Dot22.0, DimShuffle{x,0}.0)
input 0: dtype=float32, shape=(20, 256), strides=c
input 1: dtype=float32, shape=(1, 256), strides=c
output 0: dtype=float32, shape=(20, 256), strides=c
0.5% 86.4% 0.021s 8.37e-06s 2500 50 Sum(Elemwise{abs_,no_inplace}.0)
input 0: dtype=float32, shape=(20, 256), strides=c
output 0: dtype=float32, shape=(), strides=c
0.5% 86.9% 0.021s 8.27e-06s 2500 40 Sum(Elemwise{abs_,no_inplace}.0)
input 0: dtype=float32, shape=(20, 256), strides=c
output 0: dtype=float32, shape=(), strides=c
... (remaining 62 Apply instances account for 13.05%(0.50s) of the runtime)
Memory Profile
(Sparse variables are ignored)
---
Max if linker=cvm (default): unknown
Max if no gc (allow_gc=False): 1683KB
Max if linker=c|py: 1133KB
Memory saved if gc is enabled (linker=c|py): 550KB
<Sum apply outputs (bytes)> <Apply outputs shape> <created/inplace/view> <Apply node>
802816B [(784, 256)] c Gemm{no_inplace}(W_dense1, Elemwise{neg,no_inplace}.0, x.T, Elemwise{mul}.0, Elemwise{Composite{[Composite{[add(i0, neg(i1))]}(i0, mul(i1, i2))]}}.0)
262144B [(256, 256)] c DimShuffle{1,0}(W_dense2)
262144B [(256, 256)] c Gemm{no_inplace}(W_dense2, Elemwise{neg,no_inplace}.0, DimShuffle{1,0}.0, Elemwise{mul}.0, Elemwise{Composite{[Composite{[add(i0, neg(i1))]}(i0, mul(i1, i2))]}}.0)
62720B [(784, 20)] c DimShuffle{1,0}(x)
20480B [(20, 256)] c Elemwise{add,no_inplace}(Dot22.0, DimShuffle{x,0}.0)
20480B [(20, 256)] c Dot22(SoftmaxGrad.0, W_dense3.T)
20480B [(20, 256)] c Elemwise{abs_,no_inplace}(Elemwise{mul,no_inplace}.0)
20480B [(20, 256)] c Dot22(Elemwise{mul,no_inplace}.0, W_dense2)
20480B [(20, 256)] c Dot22(Elemwise{mul}.0, W_dense2.T)
20480B [(256, 20)] c DimShuffle{1,0}(Elemwise{mul,no_inplace}.0)
20480B [(256, 20)] c DimShuffle{1,0}(Elemwise{mul,no_inplace}.0)
20480B [(20, 256)] c Elemwise{mul,no_inplace}(Elemwise{add,no_inplace}.0, Elemwise{gt,no_inplace}.0)
20480B [(20, 256)] c Dot22(x, W_dense1)
20480B [(20, 256)] c Elemwise{mul,no_inplace}(Elemwise{add,no_inplace}.0, Elemwise{gt,no_inplace}.0)
20480B [(20, 256)] c Elemwise{abs_,no_inplace}(Elemwise{mul,no_inplace}.0)
20480B [(20, 256)] c Elemwise{add,no_inplace}(Dot22.0, DimShuffle{x,0}.0)
20480B [(20, 256)] c Elemwise{mul}(Dot22.0, Elemwise{gt,no_inplace}.0)
20480B [(20, 256)] c Elemwise{mul}(Dot22.0, Elemwise{gt,no_inplace}.0)
10240B [(10, 256)] c DimShuffle{1,0}(W_dense3)
10240B [(256, 10)] c Gemm{no_inplace}(W_dense3, Elemwise{neg,no_inplace}.0, DimShuffle{1,0}.0, SoftmaxGrad.0, Elemwise{Composite{[Composite{[add(i0, neg(i1))]}(i0, mul(i1, i2))]}}.0)
... (remaining 62 Apply account for 26100B/1723124B ((1.51%)) of the Apply with dense outputs sizes)
<created/inplace/view> is taken from the Op's declaration.
Apply nodes marked 'inplace' or 'view' may actually allocate memory, this is not reported here. If you use DebugMode, warnings will be emitted in those cases.