@zomux
Created May 26, 2015 04:11
Function profiling
==================
Message: None
Time in 0 calls to Function.__call__: 0.000000e+00s
Total compile time: 5.686516e+00s
Theano Optimizer time: 1.218326e+00s
Theano validate time: 2.887964e-03s
Theano Linker time (includes C, CUDA code generation/compiling): 4.449660e+00s
Function profiling
==================
Message: None
Time in 2500 calls to Function.__call__: 7.448946e+00s
Time in Function.fn.__call__: 7.271567e+00s (97.619%)
Time in thunks: 3.860834e+00s (51.831%)
Total compile time: 9.859906e+00s
Theano Optimizer time: 1.074716e+00s
Theano validate time: 7.506371e-03s
Theano Linker time (includes C, CUDA code generation/compiling): 8.776075e+00s
Class
---
<% time> <sum %> <apply time> <time per call> <type> <#call> <#apply> <Class name>
43.8% 43.8% 1.691s 1.35e-04s C 12500 5 <class 'theano.tensor.blas.Dot22'>
31.5% 75.3% 1.215s 1.62e-04s C 7500 3 <class 'theano.tensor.blas.Gemm'>
7.8% 83.1% 0.301s 3.64e-06s C 82500 33 <class 'theano.tensor.elemwise.Elemwise'>
4.9% 88.0% 0.190s 7.61e-06s C 25000 10 <class 'theano.tensor.elemwise.DimShuffle'>
3.5% 91.5% 0.134s 5.95e-06s C 22500 9 <class 'theano.tensor.elemwise.Sum'>
3.1% 94.6% 0.121s 4.84e-05s Py 2500 1 <class 'theano.tensor.basic.MaxAndArgmax'>
1.0% 95.6% 0.038s 1.52e-05s Py 2500 1 <class 'theano.tensor.subtensor.AdvancedIncSubtensor'>
0.8% 96.4% 0.033s 2.18e-06s C 15000 6 <class 'theano.tensor.subtensor.Subtensor'>
0.8% 97.3% 0.032s 1.30e-05s Py 2500 1 <class 'theano.tensor.subtensor.AdvancedSubtensor'>
0.6% 97.9% 0.024s 1.90e-06s C 12500 5 <class 'theano.compile.ops.Shape_i'>
0.6% 98.5% 0.023s 9.08e-06s Py 2500 1 <class 'theano.tensor.basic.ARange'>
0.5% 99.0% 0.020s 7.83e-06s C 2500 1 <class 'theano.tensor.nnet.nnet.SoftmaxWithBias'>
0.4% 99.4% 0.017s 2.21e-06s C 7500 3 <class 'theano.tensor.opt.MakeVector'>
0.4% 99.8% 0.015s 3.03e-06s C 5000 2 <class 'theano.tensor.basic.Alloc'>
0.2% 100.0% 0.008s 3.19e-06s C 2500 1 <class 'theano.tensor.nnet.nnet.SoftmaxGrad'>
... (remaining 0 Classes account for 0.00%(0.00s) of the runtime)
Ops
---
<% time> <sum %> <apply time> <time per call> <type> <#call> <#apply> <Op name>
43.8% 43.8% 1.691s 1.35e-04s C 12500 5 Dot22
31.5% 75.3% 1.215s 1.62e-04s C 7500 3 Gemm{no_inplace}
3.6% 78.9% 0.140s 1.12e-05s C 12500 5 DimShuffle{1,0}
3.1% 82.0% 0.121s 4.84e-05s Py 2500 1 MaxAndArgmax
1.7% 83.7% 0.066s 5.31e-06s C 12500 5 Sum
1.6% 85.4% 0.062s 8.31e-06s C 7500 3 Sum{0}
1.4% 86.7% 0.053s 7.11e-06s C 7500 3 Elemwise{add,no_inplace}
1.3% 88.1% 0.051s 5.07e-06s C 10000 4 Elemwise{mul,no_inplace}
1.0% 89.1% 0.039s 7.87e-06s C 5000 2 Elemwise{gt,no_inplace}
1.0% 90.1% 0.038s 1.52e-05s Py 2500 1 AdvancedIncSubtensor{inplace=False, set_instead_of_inc=False}
0.8% 90.9% 0.032s 1.30e-05s Py 2500 1 AdvancedSubtensor
0.8% 91.7% 0.031s 3.09e-06s C 10000 4 Elemwise{abs_,no_inplace}
0.8% 92.5% 0.029s 3.90e-06s C 7500 3 DimShuffle{x,0}
0.6% 93.0% 0.023s 9.08e-06s Py 2500 1 ARange
0.6% 93.6% 0.021s 4.26e-06s C 5000 2 DimShuffle{x}
0.5% 94.1% 0.020s 2.04e-06s C 10000 4 Elemwise{Cast{float32}}
0.5% 94.6% 0.020s 2.70e-06s C 7500 3 Elemwise{Composite{[Composite{[Composite{[sub(i0, mul(i1, i2))]}(i0, a
0.5% 95.2% 0.020s 7.83e-06s C 2500 1 SoftmaxWithBias
0.5% 95.7% 0.019s 1.94e-06s C 10000 4 Elemwise{Composite{[true_div(true_div(i0, i1), i2)]}}
0.4% 96.1% 0.017s 6.80e-06s C 2500 1 Elemwise{log,no_inplace}
... (remaining 16 Ops account for 3.91%(0.15s) of the runtime)
Apply
------
<% time> <sum %> <apply time> <time per call> <#call> <id> <Mflops> <Gflops/s> <Apply name>
25.6% 25.6% 0.988s 3.95e-04s 2500 10 Dot22(x, W_dense1)
input 0: dtype=float32, shape=(20, 784), strides=c
input 1: dtype=float32, shape=(784, 256), strides=c
output 0: dtype=float32, shape=(20, 256), strides=c
21.1% 46.7% 0.816s 3.27e-04s 2500 80 Gemm{no_inplace}(W_dense1, Elemwise{neg,no_inplace}.0, x.T, E
input 0: dtype=float32, shape=(784, 256), strides=c
input 1: dtype=float32, shape=(), strides=c
input 2: dtype=float32, shape=(784, 20), strides=(4, 3136)
input 3: dtype=float32, shape=(20, 256), strides=c
input 4: dtype=float32, shape=(), strides=c
output 0: dtype=float32, shape=(784, 256), strides=c
8.9% 55.6% 0.342s 1.37e-04s 2500 75 Gemm{no_inplace}(W_dense2, Elemwise{neg,no_inplace}.0, DimShu
input 0: dtype=float32, shape=(256, 256), strides=c
input 1: dtype=float32, shape=(), strides=c
input 2: dtype=float32, shape=(256, 20), strides=(4, 1024)
input 3: dtype=float32, shape=(20, 256), strides=c
input 4: dtype=float32, shape=(), strides=c
output 0: dtype=float32, shape=(256, 256), strides=c
8.5% 64.1% 0.327s 1.31e-04s 2500 35 Dot22(Elemwise{mul,no_inplace}.0, W_dense2)
input 0: dtype=float32, shape=(20, 256), strides=c
input 1: dtype=float32, shape=(256, 256), strides=c
output 0: dtype=float32, shape=(20, 256), strides=c
8.0% 72.0% 0.307s 1.23e-04s 2500 76 Dot22(Elemwise{mul}.0, W_dense2.T)
input 0: dtype=float32, shape=(20, 256), strides=c
input 1: dtype=float32, shape=(256, 256), strides=(4, 1024)
output 0: dtype=float32, shape=(20, 256), strides=c
3.1% 75.1% 0.121s 4.84e-05s 2500 54 MaxAndArgmax(Elemwise{add,no_inplace}.0, TensorConstant{(1,)
input 0: dtype=float32, shape=(20, 10), strides=c
input 1: dtype=int64, shape=(1,), strides=c
output 0: dtype=float32, shape=(20,), strides=c
output 1: dtype=int64, shape=(20,), strides=c
1.8% 77.0% 0.071s 2.85e-05s 2500 6 DimShuffle{1,0}(W_dense2)
input 0: dtype=float32, shape=(256, 256), strides=c
output 0: dtype=float32, shape=(256, 256), strides=(4, 1024)
1.5% 78.5% 0.057s 2.27e-05s 2500 67 Gemm{no_inplace}(W_dense3, Elemwise{neg,no_inplace}.0, DimShu
input 0: dtype=float32, shape=(256, 10), strides=c
input 1: dtype=float32, shape=(), strides=c
input 2: dtype=float32, shape=(256, 20), strides=(4, 1024)
input 3: dtype=float32, shape=(20, 10), strides=c
input 4: dtype=float32, shape=(), strides=c
output 0: dtype=float32, shape=(256, 10), strides=c
1.0% 79.4% 0.038s 1.52e-05s 2500 45 Dot22(Elemwise{mul,no_inplace}.0, W_dense3)
input 0: dtype=float32, shape=(20, 256), strides=c
input 1: dtype=float32, shape=(256, 10), strides=c
output 0: dtype=float32, shape=(20, 10), strides=c
1.0% 80.4% 0.038s 1.52e-05s 2500 41 AdvancedIncSubtensor{inplace=False, set_instead_of_inc=False
input 0: dtype=float32, shape=(20, 10), strides=c
input 1: dtype=float32, shape=(20,), strides=c
input 2: dtype=int64, shape=(20,), strides=c
input 3: dtype=int32, shape=(20,), strides=c
output 0: dtype=float32, shape=(20, 10), strides=c
0.8% 81.3% 0.032s 1.30e-05s 2500 62 AdvancedSubtensor(Elemwise{log,no_inplace}.0, ARange.0, k)
input 0: dtype=float32, shape=(20, 10), strides=c
input 1: dtype=int64, shape=(20,), strides=c
input 2: dtype=int32, shape=(20,), strides=c
output 0: dtype=float32, shape=(20,), strides=c
0.8% 82.1% 0.031s 1.25e-05s 2500 68 Dot22(SoftmaxGrad.0, W_dense3.T)
input 0: dtype=float32, shape=(20, 10), strides=c
input 1: dtype=float32, shape=(10, 256), strides=(4, 40)
output 0: dtype=float32, shape=(20, 256), strides=c
0.7% 82.8% 0.027s 1.07e-05s 2500 79 Sum{0}(Elemwise{mul}.0)
input 0: dtype=float32, shape=(20, 256), strides=c
output 0: dtype=float32, shape=(256,), strides=c
0.7% 83.5% 0.026s 1.05e-05s 2500 74 Sum{0}(Elemwise{mul}.0)
input 0: dtype=float32, shape=(20, 256), strides=c
output 0: dtype=float32, shape=(256,), strides=c
0.6% 84.1% 0.025s 9.87e-06s 2500 12 DimShuffle{1,0}(x)
input 0: dtype=float32, shape=(20, 784), strides=c
output 0: dtype=float32, shape=(784, 20), strides=(4, 3136)
0.6% 84.7% 0.024s 9.43e-06s 2500 15 Elemwise{add,no_inplace}(Dot22.0, DimShuffle{x,0}.0)
input 0: dtype=float32, shape=(20, 256), strides=c
input 1: dtype=float32, shape=(1, 256), strides=c
output 0: dtype=float32, shape=(20, 256), strides=c
0.6% 85.3% 0.023s 9.08e-06s 2500 20 ARange(TensorConstant{0}, Shape_i{0}.0, TensorConstant{1})
input 0: dtype=int8, shape=(), strides=c
input 1: dtype=int64, shape=(), strides=c
input 2: dtype=int8, shape=(), strides=c
output 0: dtype=int64, shape=(20,), strides=c
0.6% 85.9% 0.022s 8.96e-06s 2500 39 Elemwise{add,no_inplace}(Dot22.0, DimShuffle{x,0}.0)
input 0: dtype=float32, shape=(20, 256), strides=c
input 1: dtype=float32, shape=(1, 256), strides=c
output 0: dtype=float32, shape=(20, 256), strides=c
0.5% 86.4% 0.021s 8.37e-06s 2500 50 Sum(Elemwise{abs_,no_inplace}.0)
input 0: dtype=float32, shape=(20, 256), strides=c
output 0: dtype=float32, shape=(), strides=c
0.5% 86.9% 0.021s 8.27e-06s 2500 40 Sum(Elemwise{abs_,no_inplace}.0)
input 0: dtype=float32, shape=(20, 256), strides=c
output 0: dtype=float32, shape=(), strides=c
... (remaining 62 Apply instances account for 13.05%(0.50s) of the runtime)
Memory Profile
(Sparse variables are ignored)
---
Max if linker=cvm (default): unknown
Max if no gc (allow_gc=False): 1683KB
Max if linker=c|py: 1133KB
Memory saved if gc is enabled (linker=c|py): 550KB
<Sum apply outputs (bytes)> <Apply outputs shape> <created/inplace/view> <Apply node>
802816B [(784, 256)] c Gemm{no_inplace}(W_dense1, Elemwise{neg,no_inplace}.0, x.T, Elemwise{mul}.0, Elemwise{Composite{[Composite{[add(i0, neg(i1))]}(i0, mul(i1, i2))]}}.0)
262144B [(256, 256)] c DimShuffle{1,0}(W_dense2)
262144B [(256, 256)] c Gemm{no_inplace}(W_dense2, Elemwise{neg,no_inplace}.0, DimShuffle{1,0}.0, Elemwise{mul}.0, Elemwise{Composite{[Composite{[add(i0, neg(i1))]}(i0, mul(i1, i2))]}}.0)
62720B [(784, 20)] c DimShuffle{1,0}(x)
20480B [(20, 256)] c Elemwise{add,no_inplace}(Dot22.0, DimShuffle{x,0}.0)
20480B [(20, 256)] c Elemwise{mul}(Dot22.0, Elemwise{gt,no_inplace}.0)
20480B [(20, 256)] c Elemwise{abs_,no_inplace}(Elemwise{mul,no_inplace}.0)
20480B [(20, 256)] c Dot22(SoftmaxGrad.0, W_dense3.T)
20480B [(20, 256)] c Elemwise{abs_,no_inplace}(Elemwise{mul,no_inplace}.0)
20480B [(20, 256)] c Dot22(Elemwise{mul,no_inplace}.0, W_dense2)
20480B [(20, 256)] c Elemwise{add,no_inplace}(Dot22.0, DimShuffle{x,0}.0)
20480B [(256, 20)] c DimShuffle{1,0}(Elemwise{mul,no_inplace}.0)
20480B [(20, 256)] c Dot22(Elemwise{mul}.0, W_dense2.T)
20480B [(256, 20)] c DimShuffle{1,0}(Elemwise{mul,no_inplace}.0)
20480B [(20, 256)] c Elemwise{mul,no_inplace}(Elemwise{add,no_inplace}.0, Elemwise{gt,no_inplace}.0)
20480B [(20, 256)] c Dot22(x, W_dense1)
20480B [(20, 256)] c Elemwise{mul,no_inplace}(Elemwise{add,no_inplace}.0, Elemwise{gt,no_inplace}.0)
20480B [(20, 256)] c Elemwise{mul}(Dot22.0, Elemwise{gt,no_inplace}.0)
10240B [(10, 256)] c DimShuffle{1,0}(W_dense3)
10240B [(256, 10)] c Gemm{no_inplace}(W_dense3, Elemwise{neg,no_inplace}.0, DimShuffle{1,0}.0, SoftmaxGrad.0, Elemwise{Composite{[Composite{[add(i0, neg(i1))]}(i0, mul(i1, i2))]}}.0)
... (remaining 62 Apply account for 26100B/1723124B ((1.51%)) of the Apply with dense outputs sizes)
<created/inplace/view> is taken from the Op's declaration.
Apply nodes marked 'inplace' or 'view' may actually allocate memory, this is not reported here. If you use DebugMode, warnings will be emitted in those cases.
Function profiling
==================
Message: Sum of all printed profiles at exit excluding Scan op profile.
Time in 2500 calls to Function.__call__: 7.448946e+00s
Time in Function.fn.__call__: 7.271567e+00s (97.619%)
Time in thunks: 3.860834e+00s (51.831%)
Total compile time: 1.554642e+01s
Theano Optimizer time: 2.293042e+00s
Theano validate time: 1.039433e-02s
Theano Linker time (includes C, CUDA code generation/compiling): 1.322573e+01s
Class
---
<% time> <sum %> <apply time> <time per call> <type> <#call> <#apply> <Class name>
43.8% 43.8% 1.691s 1.35e-04s C 12500 5 <class 'theano.tensor.blas.Dot22'>
31.5% 75.3% 1.215s 1.62e-04s C 7500 3 <class 'theano.tensor.blas.Gemm'>
7.8% 83.1% 0.301s 3.64e-06s C 82500 33 <class 'theano.tensor.elemwise.Elemwise'>
4.9% 88.0% 0.190s 7.61e-06s C 25000 10 <class 'theano.tensor.elemwise.DimShuffle'>
3.5% 91.5% 0.134s 5.95e-06s C 22500 9 <class 'theano.tensor.elemwise.Sum'>
3.1% 94.6% 0.121s 4.84e-05s Py 2500 1 <class 'theano.tensor.basic.MaxAndArgmax'>
1.0% 95.6% 0.038s 1.52e-05s Py 2500 1 <class 'theano.tensor.subtensor.AdvancedIncSubtensor'>
0.8% 96.4% 0.033s 2.18e-06s C 15000 6 <class 'theano.tensor.subtensor.Subtensor'>
0.8% 97.3% 0.032s 1.30e-05s Py 2500 1 <class 'theano.tensor.subtensor.AdvancedSubtensor'>
0.6% 97.9% 0.024s 1.90e-06s C 12500 5 <class 'theano.compile.ops.Shape_i'>
0.6% 98.5% 0.023s 9.08e-06s Py 2500 1 <class 'theano.tensor.basic.ARange'>
0.5% 99.0% 0.020s 7.83e-06s C 2500 1 <class 'theano.tensor.nnet.nnet.SoftmaxWithBias'>
0.4% 99.4% 0.017s 2.21e-06s C 7500 3 <class 'theano.tensor.opt.MakeVector'>
0.4% 99.8% 0.015s 3.03e-06s C 5000 2 <class 'theano.tensor.basic.Alloc'>
0.2% 100.0% 0.008s 3.19e-06s C 2500 1 <class 'theano.tensor.nnet.nnet.SoftmaxGrad'>
... (remaining 0 Classes account for 0.00%(0.00s) of the runtime)
Ops
---
<% time> <sum %> <apply time> <time per call> <type> <#call> <#apply> <Op name>
43.8% 43.8% 1.691s 1.35e-04s C 12500 5 Dot22
31.5% 75.3% 1.215s 1.62e-04s C 7500 3 Gemm{no_inplace}
3.6% 78.9% 0.140s 1.12e-05s C 12500 5 DimShuffle{1,0}
3.1% 82.0% 0.121s 4.84e-05s Py 2500 1 MaxAndArgmax
1.7% 83.7% 0.066s 5.31e-06s C 12500 5 Sum
1.6% 85.4% 0.062s 8.31e-06s C 7500 3 Sum{0}
1.4% 86.7% 0.053s 7.11e-06s C 7500 3 Elemwise{add,no_inplace}
1.3% 88.1% 0.051s 5.07e-06s C 10000 4 Elemwise{mul}
1.0% 89.1% 0.039s 7.87e-06s C 5000 2 Elemwise{gt,no_inplace}
1.0% 90.1% 0.038s 1.52e-05s Py 2500 1 AdvancedIncSubtensor{inplace=False, set_instead_of_inc=False}
0.8% 90.9% 0.032s 1.30e-05s Py 2500 1 AdvancedSubtensor
0.8% 91.7% 0.031s 3.09e-06s C 10000 4 Elemwise{abs_,no_inplace}
0.8% 92.5% 0.029s 3.90e-06s C 7500 3 DimShuffle{x,0}
0.6% 93.0% 0.023s 9.08e-06s Py 2500 1 ARange
0.6% 93.6% 0.021s 4.26e-06s C 5000 2 DimShuffle{x}
0.5% 94.1% 0.020s 2.04e-06s C 10000 4 Elemwise{Cast{float32}}
0.5% 94.6% 0.020s 2.70e-06s C 7500 3 Elemwise{Composite{[Composite{[Composite{[sub(i0, mul(i1, i2))]}(i0, a
0.5% 95.2% 0.020s 7.83e-06s C 2500 1 SoftmaxWithBias
0.5% 95.7% 0.019s 1.94e-06s C 10000 4 Elemwise{Composite{[true_div(true_div(i0, i1), i2)]}}
0.4% 96.1% 0.017s 6.80e-06s C 2500 1 Elemwise{log,no_inplace}
... (remaining 16 Ops account for 3.91%(0.15s) of the runtime)
Apply
------
<% time> <sum %> <apply time> <time per call> <#call> <id> <Mflops> <Gflops/s> <Apply name>
25.6% 25.6% 0.988s 3.95e-04s 2500 10 Dot22(x, W_dense1)
input 0: dtype=float32, shape=(20, 784), strides=c
input 1: dtype=float32, shape=(784, 256), strides=c
output 0: dtype=float32, shape=(20, 256), strides=c
21.1% 46.7% 0.816s 3.27e-04s 2500 80 Gemm{no_inplace}(W_dense1, Elemwise{neg,no_inplace}.0, x.T, E
input 0: dtype=float32, shape=(784, 256), strides=c
input 1: dtype=float32, shape=(), strides=c
input 2: dtype=float32, shape=(784, 20), strides=(4, 3136)
input 3: dtype=float32, shape=(20, 256), strides=c
input 4: dtype=float32, shape=(), strides=c
output 0: dtype=float32, shape=(784, 256), strides=c
8.9% 55.6% 0.342s 1.37e-04s 2500 75 Gemm{no_inplace}(W_dense2, Elemwise{neg,no_inplace}.0, DimShu
input 0: dtype=float32, shape=(256, 256), strides=c
input 1: dtype=float32, shape=(), strides=c
input 2: dtype=float32, shape=(256, 20), strides=(4, 1024)
input 3: dtype=float32, shape=(20, 256), strides=c
input 4: dtype=float32, shape=(), strides=c
output 0: dtype=float32, shape=(256, 256), strides=c
8.5% 64.1% 0.327s 1.31e-04s 2500 35 Dot22(Elemwise{mul,no_inplace}.0, W_dense2)
input 0: dtype=float32, shape=(20, 256), strides=c
input 1: dtype=float32, shape=(256, 256), strides=c
output 0: dtype=float32, shape=(20, 256), strides=c
8.0% 72.0% 0.307s 1.23e-04s 2500 76 Dot22(Elemwise{mul}.0, W_dense2.T)
input 0: dtype=float32, shape=(20, 256), strides=c
input 1: dtype=float32, shape=(256, 256), strides=(4, 1024)
output 0: dtype=float32, shape=(20, 256), strides=c
3.1% 75.1% 0.121s 4.84e-05s 2500 54 MaxAndArgmax(Elemwise{add,no_inplace}.0, TensorConstant{(1,)
input 0: dtype=float32, shape=(20, 10), strides=c
input 1: dtype=int64, shape=(1,), strides=c
output 0: dtype=float32, shape=(20,), strides=c
output 1: dtype=int64, shape=(20,), strides=c
1.8% 77.0% 0.071s 2.85e-05s 2500 6 DimShuffle{1,0}(W_dense2)
input 0: dtype=float32, shape=(256, 256), strides=c
output 0: dtype=float32, shape=(256, 256), strides=(4, 1024)
1.5% 78.5% 0.057s 2.27e-05s 2500 67 Gemm{no_inplace}(W_dense3, Elemwise{neg,no_inplace}.0, DimShu
input 0: dtype=float32, shape=(256, 10), strides=c
input 1: dtype=float32, shape=(), strides=c
input 2: dtype=float32, shape=(256, 20), strides=(4, 1024)
input 3: dtype=float32, shape=(20, 10), strides=c
input 4: dtype=float32, shape=(), strides=c
output 0: dtype=float32, shape=(256, 10), strides=c
1.0% 79.4% 0.038s 1.52e-05s 2500 45 Dot22(Elemwise{mul,no_inplace}.0, W_dense3)
input 0: dtype=float32, shape=(20, 256), strides=c
input 1: dtype=float32, shape=(256, 10), strides=c
output 0: dtype=float32, shape=(20, 10), strides=c
1.0% 80.4% 0.038s 1.52e-05s 2500 41 AdvancedIncSubtensor{inplace=False, set_instead_of_inc=False
input 0: dtype=float32, shape=(20, 10), strides=c
input 1: dtype=float32, shape=(20,), strides=c
input 2: dtype=int64, shape=(20,), strides=c
input 3: dtype=int32, shape=(20,), strides=c
output 0: dtype=float32, shape=(20, 10), strides=c
0.8% 81.3% 0.032s 1.30e-05s 2500 62 AdvancedSubtensor(Elemwise{log,no_inplace}.0, ARange.0, k)
input 0: dtype=float32, shape=(20, 10), strides=c
input 1: dtype=int64, shape=(20,), strides=c
input 2: dtype=int32, shape=(20,), strides=c
output 0: dtype=float32, shape=(20,), strides=c
0.8% 82.1% 0.031s 1.25e-05s 2500 68 Dot22(SoftmaxGrad.0, W_dense3.T)
input 0: dtype=float32, shape=(20, 10), strides=c
input 1: dtype=float32, shape=(10, 256), strides=(4, 40)
output 0: dtype=float32, shape=(20, 256), strides=c
0.7% 82.8% 0.027s 1.07e-05s 2500 79 Sum{0}(Elemwise{mul}.0)
input 0: dtype=float32, shape=(20, 256), strides=c
output 0: dtype=float32, shape=(256,), strides=c
0.7% 83.5% 0.026s 1.05e-05s 2500 74 Sum{0}(Elemwise{mul}.0)
input 0: dtype=float32, shape=(20, 256), strides=c
output 0: dtype=float32, shape=(256,), strides=c
0.6% 84.1% 0.025s 9.87e-06s 2500 12 DimShuffle{1,0}(x)
input 0: dtype=float32, shape=(20, 784), strides=c
output 0: dtype=float32, shape=(784, 20), strides=(4, 3136)
0.6% 84.7% 0.024s 9.43e-06s 2500 15 Elemwise{add,no_inplace}(Dot22.0, DimShuffle{x,0}.0)
input 0: dtype=float32, shape=(20, 256), strides=c
input 1: dtype=float32, shape=(1, 256), strides=c
output 0: dtype=float32, shape=(20, 256), strides=c
0.6% 85.3% 0.023s 9.08e-06s 2500 20 ARange(TensorConstant{0}, Shape_i{0}.0, TensorConstant{1})
input 0: dtype=int8, shape=(), strides=c
input 1: dtype=int64, shape=(), strides=c
input 2: dtype=int8, shape=(), strides=c
output 0: dtype=int64, shape=(20,), strides=c
0.6% 85.9% 0.022s 8.96e-06s 2500 39 Elemwise{add,no_inplace}(Dot22.0, DimShuffle{x,0}.0)
input 0: dtype=float32, shape=(20, 256), strides=c
input 1: dtype=float32, shape=(1, 256), strides=c
output 0: dtype=float32, shape=(20, 256), strides=c
0.5% 86.4% 0.021s 8.37e-06s 2500 50 Sum(Elemwise{abs_,no_inplace}.0)
input 0: dtype=float32, shape=(20, 256), strides=c
output 0: dtype=float32, shape=(), strides=c
0.5% 86.9% 0.021s 8.27e-06s 2500 40 Sum(Elemwise{abs_,no_inplace}.0)
input 0: dtype=float32, shape=(20, 256), strides=c
output 0: dtype=float32, shape=(), strides=c
... (remaining 62 Apply instances account for 13.05%(0.50s) of the runtime)
Memory Profile
(Sparse variables are ignored)
---
Max if linker=cvm (default): unknown
Max if no gc (allow_gc=False): 1683KB
Max if linker=c|py: 1133KB
Memory saved if gc is enabled (linker=c|py): 550KB
<Sum apply outputs (bytes)> <Apply outputs shape> <created/inplace/view> <Apply node>
802816B [(784, 256)] c Gemm{no_inplace}(W_dense1, Elemwise{neg,no_inplace}.0, x.T, Elemwise{mul}.0, Elemwise{Composite{[Composite{[add(i0, neg(i1))]}(i0, mul(i1, i2))]}}.0)
262144B [(256, 256)] c DimShuffle{1,0}(W_dense2)
262144B [(256, 256)] c Gemm{no_inplace}(W_dense2, Elemwise{neg,no_inplace}.0, DimShuffle{1,0}.0, Elemwise{mul}.0, Elemwise{Composite{[Composite{[add(i0, neg(i1))]}(i0, mul(i1, i2))]}}.0)
62720B [(784, 20)] c DimShuffle{1,0}(x)
20480B [(20, 256)] c Elemwise{add,no_inplace}(Dot22.0, DimShuffle{x,0}.0)
20480B [(20, 256)] c Dot22(SoftmaxGrad.0, W_dense3.T)
20480B [(20, 256)] c Elemwise{abs_,no_inplace}(Elemwise{mul,no_inplace}.0)
20480B [(20, 256)] c Dot22(Elemwise{mul,no_inplace}.0, W_dense2)
20480B [(20, 256)] c Dot22(Elemwise{mul}.0, W_dense2.T)
20480B [(256, 20)] c DimShuffle{1,0}(Elemwise{mul,no_inplace}.0)
20480B [(256, 20)] c DimShuffle{1,0}(Elemwise{mul,no_inplace}.0)
20480B [(20, 256)] c Elemwise{mul,no_inplace}(Elemwise{add,no_inplace}.0, Elemwise{gt,no_inplace}.0)
20480B [(20, 256)] c Dot22(x, W_dense1)
20480B [(20, 256)] c Elemwise{mul,no_inplace}(Elemwise{add,no_inplace}.0, Elemwise{gt,no_inplace}.0)
20480B [(20, 256)] c Elemwise{abs_,no_inplace}(Elemwise{mul,no_inplace}.0)
20480B [(20, 256)] c Elemwise{add,no_inplace}(Dot22.0, DimShuffle{x,0}.0)
20480B [(20, 256)] c Elemwise{mul}(Dot22.0, Elemwise{gt,no_inplace}.0)
20480B [(20, 256)] c Elemwise{mul}(Dot22.0, Elemwise{gt,no_inplace}.0)
10240B [(10, 256)] c DimShuffle{1,0}(W_dense3)
10240B [(256, 10)] c Gemm{no_inplace}(W_dense3, Elemwise{neg,no_inplace}.0, DimShuffle{1,0}.0, SoftmaxGrad.0, Elemwise{Composite{[Composite{[add(i0, neg(i1))]}(i0, mul(i1, i2))]}}.0)
... (remaining 62 Apply account for 26100B/1723124B ((1.51%)) of the Apply with dense outputs sizes)
<created/inplace/view> is taken from the Op's declaration.
Apply nodes marked 'inplace' or 'view' may actually allocate memory, this is not reported here. If you use DebugMode, warnings will be emitted in those cases.
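
The <Mflops> and <Gflops/s> columns of the Apply table are empty in this output. As a rough sketch, throughput for the plain Dot22 nodes can be estimated from the printed shapes and the printed time per call. This assumes the usual 2*m*k*n flop count for an (m, k) x (k, n) product, which is a convention and not something the profiler reports, and the helper name matmul_gflops is hypothetical. The per-call times include any per-node overhead, so the figures are only approximate.

```python
# Shapes and per-call times copied from the Apply table above.
# Each entry: (apply name, m, k, n, seconds per call)
applies = [
    ("Dot22(x, W_dense1)", 20, 784, 256, 3.95e-04),
    ("Dot22(Elemwise{mul,no_inplace}.0, W_dense2)", 20, 256, 256, 1.31e-04),
    ("Dot22(Elemwise{mul,no_inplace}.0, W_dense3)", 20, 256, 10, 1.52e-05),
]


def matmul_gflops(m, k, n, seconds):
    """Estimate GFLOP/s for an (m, k) x (k, n) product.

    Assumes 2*m*k*n floating-point operations (one multiply and one add
    per inner-product term); this is a convention, not profiler output.
    """
    return 2.0 * m * k * n / seconds / 1e9


rates = {name: matmul_gflops(m, k, n, t) for name, m, k, n, t in applies}

for name, rate in rates.items():
    print(f"{name}: {rate:.1f} GFLOP/s (estimated)")
```

Under these assumptions the two larger products come out at similar rates, while the small (256, 10) product is noticeably lower, which is consistent with fixed per-call overhead mattering more for small matrices.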