Gist zomux/b053662bd8a50ca523ee, last active August 29, 2015 14:21
Function profiling | |
================== | |
Message: /home/hadoop/deepy/deepy/trainers/trainers.py:282 | |
Time in 382 calls to Function.__call__: 3.791379e+01s | |
Time in Function.fn.__call__: 3.779269e+01s (99.681%) | |
Time in thunks: 3.603890e+01s (95.055%) | |
Total compile time: 4.916809e-01s | |
Number of Apply nodes: 119 | |
Theano Optimizer time: 2.211621e-01s | |
Theano validate time: 1.533985e-03s | |
Theano Linker time (includes C, CUDA code generation/compiling): 1.592531e-01s | |
Import time 1.022055e-01s | |
Time in all call to theano.grad() 1.283312e-02s | |
Class | |
--- | |
<% time> <sum %> <apply time> <time per call> <type> <#call> <#apply> <Class name> | |
81.2% 81.2% 29.274s 1.56e-03s C 18718 49 theano.tensor.elemwise.Elemwise | |
15.2% 96.4% 5.482s 4.78e-03s C 1146 3 theano.tensor.blas.Gemm | |
1.9% 98.4% 0.689s 3.61e-04s C 1910 5 theano.tensor.blas.Dot22 | |
0.6% 99.0% 0.223s 9.74e-05s C 2292 6 theano.tensor.elemwise.CAReduce | |
0.3% 99.3% 0.109s 2.38e-05s Py 4584 6 theano.ifelse.IfElse | |
0.2% 99.5% 0.083s 1.98e-05s C 4202 11 theano.tensor.elemwise.DimShuffle | |
0.1% 99.6% 0.050s 1.47e-05s C 3438 9 theano.tensor.elemwise.Sum | |
0.1% 99.8% 0.040s 1.04e-04s C 382 1 theano.tensor.basic.MaxAndArgmax | |
0.1% 99.9% 0.034s 9.02e-05s Py 382 1 theano.tensor.subtensor.AdvancedSubtensor | |
0.1% 99.9% 0.020s 8.57e-06s C 2292 6 theano.tensor.subtensor.Subtensor | |
0.0% 99.9% 0.014s 3.61e-05s Py 382 1 theano.tensor.subtensor.AdvancedIncSubtensor | |
0.0% 100.0% 0.005s 1.44e-05s Py 382 1 theano.tensor.basic.ARange | |
0.0% 100.0% 0.004s 2.26e-06s C 1910 8 theano.compile.ops.Shape_i | |
0.0% 100.0% 0.004s 9.21e-06s C 382 1 theano.tensor.nnet.nnet.SoftmaxWithBias | |
0.0% 100.0% 0.003s 2.56e-06s C 1146 3 theano.tensor.opt.MakeVector | |
0.0% 100.0% 0.002s 4.66e-06s C 382 7 theano.tensor.basic.Alloc | |
0.0% 100.0% 0.002s 4.24e-06s C 382 1 theano.tensor.nnet.nnet.SoftmaxGrad | |
... (remaining 0 Classes account for 0.00%(0.00s) of the runtime) | |
Ops | |
--- | |
<% time> <sum %> <apply time> <time per call> <type> <#call> <#apply> <Op name> | |
23.4% 23.4% 8.444s 3.68e-03s C 2292 6 Elemwise{Composite{((i0 * i1) - (i2 * i3))}} | |
19.6% 43.0% 7.051s 2.05e-03s C 3438 9 Elemwise{add,no_inplace} | |
18.5% 61.5% 6.668s 4.36e-03s C 1528 4 Elemwise{mul} | |
15.2% 76.7% 5.482s 4.78e-03s C 1146 3 Gemm{no_inplace} | |
10.5% 87.2% 3.792s 4.96e-03s C 764 2 Elemwise{gt,no_inplace} | |
9.0% 96.2% 3.243s 4.24e-03s C 764 2 Elemwise{Composite{Abs((i0 * i1))}} | |
1.9% 98.1% 0.689s 3.61e-04s C 1910 5 Dot22 | |
0.6% 98.8% 0.223s 9.74e-05s C 2292 6 Reduce{maximum} | |
0.3% 99.1% 0.109s 2.38e-05s Py 4584 6 if{} | |
0.2% 99.2% 0.064s 3.33e-05s C 1910 5 DimShuffle{1,0} | |
0.1% 99.3% 0.040s 1.04e-04s C 382 1 MaxAndArgmax | |
0.1% 99.4% 0.034s 9.02e-05s Py 382 1 AdvancedSubtensor | |
0.1% 99.5% 0.032s 1.65e-05s C 1910 5 Sum{acc_dtype=float64} | |
0.1% 99.6% 0.020s 8.57e-06s C 2292 6 Subtensor{int64} | |
0.0% 99.6% 0.018s 1.55e-05s C 1146 3 Sum{axis=[0], acc_dtype=float64} | |
0.0% 99.7% 0.015s 1.32e-05s C 1146 3 DimShuffle{x,0} | |
0.0% 99.7% 0.014s 3.61e-05s Py 382 1 AdvancedIncSubtensor{inplace=False, set_instead_of_inc=False} | |
0.0% 99.7% 0.011s 6.97e-06s C 1528 4 Elemwise{Composite{((i0 / i1) / i2)}} | |
0.0% 99.8% 0.011s 2.78e-05s C 382 1 Elemwise{clip,no_inplace} | |
0.0% 99.8% 0.011s 2.76e-05s C 382 1 Elemwise{log,no_inplace} | |
... (remaining 19 Ops account for 0.20%(0.07s) of the runtime) | |
Apply | |
------ | |
<% time> <sum %> <apply time> <time per call> <#call> <id> <Mflops> <Gflops/s> <Apply name> | |
10.0% 10.0% 3.600s 9.42e-03s 382 110 Elemwise{Composite{((i0 * i1) - (i2 * i3))}}(TensorConstant{(1, 1) of 0.9}, W_dense2_vel, DimShuffle{x,x}.0, if{}.0) | |
input 0: dtype=float32, shape=(1, 1), strides=c | |
input 1: dtype=float32, shape=(256, 256), strides=c | |
input 2: dtype=float32, shape=(1, 1), strides=c | |
input 3: dtype=float32, shape=(256, 256), strides=c | |
output 0: dtype=float32, shape=(256, 256), strides=c | |
9.9% 19.8% 3.554s 9.30e-03s 382 117 Elemwise{Composite{((i0 * i1) - (i2 * i3))}}(TensorConstant{(1, 1) of 0.9}, W_dense1_vel, DimShuffle{x,x}.0, if{}.0) | |
input 0: dtype=float32, shape=(1, 1), strides=c | |
input 1: dtype=float32, shape=(784, 256), strides=c | |
input 2: dtype=float32, shape=(1, 1), strides=c | |
input 3: dtype=float32, shape=(784, 256), strides=c | |
output 0: dtype=float32, shape=(784, 256), strides=c | |
7.9% 27.8% 2.862s 7.49e-03s 382 91 Gemm{no_inplace}(W_dense2, TensorConstant{1.0}, DimShuffle{1,0}.0, Elemwise{mul}.0, TensorConstant{0.000199999994948}) | |
input 0: dtype=float32, shape=(256, 256), strides=c | |
input 1: dtype=float32, shape=(), strides=c | |
input 2: dtype=float32, shape=(256, 20), strides=(4, 1024) | |
input 3: dtype=float32, shape=(20, 256), strides=c | |
input 4: dtype=float32, shape=(), strides=c | |
output 0: dtype=float32, shape=(256, 256), strides=c | |
7.2% 35.0% 2.600s 6.81e-03s 382 103 Gemm{no_inplace}(W_dense1, TensorConstant{1.0}, x.T, Elemwise{mul}.0, TensorConstant{0.000199999994948}) | |
input 0: dtype=float32, shape=(784, 256), strides=c | |
input 1: dtype=float32, shape=(), strides=c | |
input 2: dtype=float32, shape=(784, 20), strides=(4, 3136) | |
input 3: dtype=float32, shape=(20, 256), strides=c | |
input 4: dtype=float32, shape=(), strides=c | |
output 0: dtype=float32, shape=(784, 256), strides=c | |
6.9% 41.9% 2.502s 6.55e-03s 382 18 Elemwise{add,no_inplace}(W_dense3, W_dense3_vel) | |
input 0: dtype=float32, shape=(256, 10), strides=c | |
input 1: dtype=float32, shape=(256, 10), strides=c | |
output 0: dtype=float32, shape=(256, 10), strides=c | |
6.6% 48.5% 2.375s 6.22e-03s 382 20 Elemwise{add,no_inplace}(W_dense2, W_dense2_vel) | |
input 0: dtype=float32, shape=(256, 256), strides=c | |
input 1: dtype=float32, shape=(256, 256), strides=c | |
output 0: dtype=float32, shape=(256, 256), strides=c | |
6.2% 54.7% 2.223s 5.82e-03s 382 36 Elemwise{gt,no_inplace}(Elemwise{add,no_inplace}.0, TensorConstant{(1, 1) of 0}) | |
input 0: dtype=float32, shape=(20, 256), strides=c | |
input 1: dtype=int8, shape=(1, 1), strides=c | |
output 0: dtype=int8, shape=(20, 256), strides=c | |
6.0% 60.7% 2.158s 5.65e-03s 382 97 Elemwise{mul}(Dot22.0, Elemwise{gt,no_inplace}.0) | |
input 0: dtype=float32, shape=(20, 256), strides=c | |
input 1: dtype=int8, shape=(20, 256), strides=c | |
output 0: dtype=float32, shape=(20, 256), strides=c | |
5.9% 66.6% 2.118s 5.55e-03s 382 22 Elemwise{add,no_inplace}(W_dense1, W_dense1_vel) | |
input 0: dtype=float32, shape=(784, 256), strides=c | |
input 1: dtype=float32, shape=(784, 256), strides=c | |
output 0: dtype=float32, shape=(784, 256), strides=c | |
5.9% 72.4% 2.109s 5.52e-03s 382 41 Elemwise{mul,no_inplace}(Elemwise{add,no_inplace}.0, Elemwise{gt,no_inplace}.0) | |
input 0: dtype=float32, shape=(20, 256), strides=c | |
input 1: dtype=int8, shape=(20, 256), strides=c | |
output 0: dtype=float32, shape=(20, 256), strides=c | |
4.6% 77.1% 1.674s 4.38e-03s 382 58 Elemwise{Composite{Abs((i0 * i1))}}(Elemwise{add,no_inplace}.0, Elemwise{gt,no_inplace}.0) | |
input 0: dtype=float32, shape=(20, 256), strides=c | |
input 1: dtype=int8, shape=(20, 256), strides=c | |
output 0: dtype=float32, shape=(20, 256), strides=c | |
4.4% 81.4% 1.569s 4.11e-03s 382 42 Elemwise{Composite{Abs((i0 * i1))}}(Elemwise{add,no_inplace}.0, Elemwise{gt,no_inplace}.0) | |
input 0: dtype=float32, shape=(20, 256), strides=c | |
input 1: dtype=int8, shape=(20, 256), strides=c | |
output 0: dtype=float32, shape=(20, 256), strides=c | |
4.4% 85.8% 1.569s 4.11e-03s 382 56 Elemwise{gt,no_inplace}(Elemwise{add,no_inplace}.0, TensorConstant{(1, 1) of 0}) | |
input 0: dtype=float32, shape=(20, 256), strides=c | |
input 1: dtype=int8, shape=(1, 1), strides=c | |
output 0: dtype=int8, shape=(20, 256), strides=c | |
3.6% 89.3% 1.287s 3.37e-03s 382 57 Elemwise{mul,no_inplace}(Elemwise{add,no_inplace}.0, Elemwise{gt,no_inplace}.0) | |
input 0: dtype=float32, shape=(20, 256), strides=c | |
input 1: dtype=int8, shape=(20, 256), strides=c | |
output 0: dtype=float32, shape=(20, 256), strides=c | |
3.6% 92.9% 1.285s 3.36e-03s 382 99 Elemwise{Composite{((i0 * i1) - (i2 * i3))}}(TensorConstant{(1, 1) of 0.9}, W_dense3_vel, DimShuffle{x,x}.0, if{}.0) | |
input 0: dtype=float32, shape=(1, 1), strides=c | |
input 1: dtype=float32, shape=(256, 10), strides=c | |
input 2: dtype=float32, shape=(1, 1), strides=c | |
input 3: dtype=float32, shape=(256, 10), strides=c | |
output 0: dtype=float32, shape=(256, 10), strides=c | |
3.1% 96.0% 1.114s 2.92e-03s 382 86 Elemwise{mul}(Dot22.0, Elemwise{gt,no_inplace}.0) | |
input 0: dtype=float32, shape=(20, 256), strides=c | |
input 1: dtype=int8, shape=(20, 256), strides=c | |
output 0: dtype=float32, shape=(20, 256), strides=c | |
1.3% 97.3% 0.469s 1.23e-03s 382 10 Dot22(x, W_dense1) | |
input 0: dtype=float32, shape=(20, 784), strides=c | |
input 1: dtype=float32, shape=(784, 256), strides=c | |
output 0: dtype=float32, shape=(20, 256), strides=c | |
0.4% 97.7% 0.159s 4.16e-04s 382 108 Reduce{maximum}(Gemm{no_inplace}.0) | |
input 0: dtype=float32, shape=(784, 256), strides=c | |
output 0: dtype=float32, shape=(), strides=c | |
0.3% 98.0% 0.102s 2.67e-04s 382 92 Dot22(Elemwise{mul}.0, W_dense2.T) | |
input 0: dtype=float32, shape=(20, 256), strides=c | |
input 1: dtype=float32, shape=(256, 256), strides=(4, 1024) | |
output 0: dtype=float32, shape=(20, 256), strides=c | |
0.3% 98.3% 0.090s 2.37e-04s 382 50 Dot22(Elemwise{mul,no_inplace}.0, W_dense2) | |
input 0: dtype=float32, shape=(20, 256), strides=c | |
input 1: dtype=float32, shape=(256, 256), strides=c | |
output 0: dtype=float32, shape=(20, 256), strides=c | |
... (remaining 99 Apply instances account for 1.72%(0.62s) of the runtime) | |
Memory Profile | |
(Sparse variables are ignored) | |
(For values in brackets, it's for linker = c|py | |
--- | |
Max if no gc (allow_gc=False): 5891KB (4839KB) | |
CPU: 5891KB (4839KB) | |
GPU: 0KB (0KB) | |
--- | |
Max if linker=cvm(default): 2176KB (2888KB) | |
CPU: 2176KB (2888KB) | |
GPU: 0KB (0KB) | |
--- | |
Memory saved if views are used: 0KB (0KB) | |
Memory saved if inplace ops are used: 0KB (0KB) | |
Memory saved if gc is enabled: 3714KB (1950KB) | |
--- | |
<Sum apply outputs (bytes)> <Apply outputs shape> <created/inplace/view> <Apply node> | |
802816B [(784, 256)] c Elemwise{add,no_inplace}(W_dense1, W_dense1_vel) | |
802816B [(784, 256)] c Gemm{no_inplace}(W_dense1, TensorConstant{1.0}, x.T, Elemwise{mul}.0, TensorConstant{0.000199999994948}) | |
802816B [(784, 256)] c Elemwise{Composite{((i0 * i1) - (i2 * i3))}}(TensorConstant{(1, 1) of 0.9}, W_dense1_vel, DimShuffle{x,x}.0, if{}.0) | |
802816B [(784, 256)] c if{}(Elemwise{isnan,no_inplace}.0, Alloc.0, Gemm{no_inplace}.0) | |
262144B [(256, 256)] c if{}(Elemwise{isnan,no_inplace}.0, Alloc.0, Gemm{no_inplace}.0) | |
262144B [(256, 256)] c Gemm{no_inplace}(W_dense2, TensorConstant{1.0}, DimShuffle{1,0}.0, Elemwise{mul}.0, TensorConstant{0.000199999994948}) | |
262144B [(256, 256)] c Elemwise{Composite{((i0 * i1) - (i2 * i3))}}(TensorConstant{(1, 1) of 0.9}, W_dense2_vel, DimShuffle{x,x}.0, if{}.0) | |
262144B [(256, 256)] c DimShuffle{1,0}(W_dense2) | |
262144B [(256, 256)] c Elemwise{add,no_inplace}(W_dense2, W_dense2_vel) | |
62720B [(784, 20)] c DimShuffle{1,0}(x) | |
20480B [(20, 256)] c Dot22(x, W_dense1) | |
20480B [(20, 256)] c Elemwise{Composite{Abs((i0 * i1))}}(Elemwise{add,no_inplace}.0, Elemwise{gt,no_inplace}.0) | |
20480B [(256, 20)] c DimShuffle{1,0}(Elemwise{mul,no_inplace}.0) | |
20480B [(20, 256)] c Elemwise{Composite{Abs((i0 * i1))}}(Elemwise{add,no_inplace}.0, Elemwise{gt,no_inplace}.0) | |
20480B [(20, 256)] c Elemwise{mul}(Dot22.0, Elemwise{gt,no_inplace}.0) | |
20480B [(20, 256)] c Elemwise{mul,no_inplace}(Elemwise{add,no_inplace}.0, Elemwise{gt,no_inplace}.0) | |
20480B [(20, 256)] c Elemwise{add,no_inplace}(Dot22.0, DimShuffle{x,0}.0) | |
20480B [(256, 20)] c DimShuffle{1,0}(Elemwise{mul,no_inplace}.0) | |
20480B [(20, 256)] c Dot22(Elemwise{mul}.0, W_dense2.T) | |
20480B [(20, 256)] c Elemwise{mul}(Dot22.0, Elemwise{gt,no_inplace}.0) | |
... (remaining 99 Apply account for 165430B/4954934B ((3.34%)) of the Apply with dense outputs sizes) | |
<created/inplace/view> is taken from the Op's declaration. | |
Apply nodes marked 'inplace' or 'view' may actually allocate memory, this is not reported here. If you use DebugMode, warnings will be emitted in those cases. | |
Function profiling | |
================== | |
Message: Sum of all(2) printed profiles at exit excluding Scan op profile. | |
Time in 1882 calls to Function.__call__: 3.893968e+01s | |
Time in Function.fn.__call__: 3.878411e+01s (99.600%) | |
Time in thunks: 3.669480e+01s (94.235%) | |
Total compile time: 1.683521e+00s | |
Number of Apply nodes: 21 | |
Theano Optimizer time: 3.881671e-01s | |
Theano validate time: 1.974344e-03s | |
Theano Linker time (includes C, CUDA code generation/compiling): 2.107832e-01s | |
Import time 1.449776e-01s | |
Time in all call to theano.grad() 1.283312e-02s | |
Class | |
--- | |
<% time> <sum %> <apply time> <time per call> <type> <#call> <#apply> <Class name> | |
80.0% 80.0% 29.357s 1.00e-03s C 29218 56 theano.tensor.elemwise.Elemwise | |
14.9% 94.9% 5.482s 4.78e-03s C 1146 3 theano.tensor.blas.Gemm | |
3.2% 98.2% 1.193s 1.86e-04s C 6410 8 theano.tensor.blas.Dot22 | |
0.6% 98.8% 0.223s 9.74e-05s C 2292 6 theano.tensor.elemwise.CAReduce | |
0.3% 99.1% 0.109s 2.38e-05s Py 4584 6 theano.ifelse.IfElse | |
0.3% 99.4% 0.095s 1.09e-05s C 8702 14 theano.tensor.elemwise.DimShuffle | |
0.2% 99.5% 0.058s 3.11e-05s C 1882 2 theano.tensor.basic.MaxAndArgmax | |
0.2% 99.7% 0.056s 8.72e-06s C 6438 11 theano.tensor.elemwise.Sum | |
0.1% 99.8% 0.047s 2.47e-05s Py 1882 2 theano.tensor.subtensor.AdvancedSubtensor | |
0.1% 99.8% 0.020s 8.57e-06s C 2292 6 theano.tensor.subtensor.Subtensor | |
0.0% 99.9% 0.015s 7.88e-06s Py 1882 2 theano.tensor.basic.ARange | |
0.0% 99.9% 0.014s 3.61e-05s Py 382 1 theano.tensor.subtensor.AdvancedIncSubtensor | |
0.0% 100.0% 0.011s 6.02e-06s C 1882 2 theano.tensor.nnet.nnet.SoftmaxWithBias | |
0.0% 100.0% 0.009s 1.79e-06s C 4910 10 theano.compile.ops.Shape_i | |
0.0% 100.0% 0.003s 2.56e-06s C 1146 3 theano.tensor.opt.MakeVector | |
0.0% 100.0% 0.002s 4.66e-06s C 382 7 theano.tensor.basic.Alloc | |
0.0% 100.0% 0.002s 4.24e-06s C 382 1 theano.tensor.nnet.nnet.SoftmaxGrad | |
... (remaining 0 Classes account for 0.00%(0.00s) of the runtime) | |
Ops | |
--- | |
<% time> <sum %> <apply time> <time per call> <type> <#call> <#apply> <Op name> | |
23.0% 23.0% 8.444s 3.68e-03s C 2292 6 Elemwise{Composite{((i0 * i1) - (i2 * i3))}} | |
19.2% 42.2% 7.055s 1.43e-03s C 4938 10 Elemwise{add,no_inplace} | |
18.2% 60.4% 6.668s 4.36e-03s C 1528 4 Elemwise{mul} | |
14.9% 75.3% 5.482s 4.78e-03s C 1146 3 Gemm{no_inplace} | |
10.3% 85.7% 3.792s 4.96e-03s C 764 2 Elemwise{gt,no_inplace} | |
8.8% 94.5% 3.243s 4.24e-03s C 764 2 Elemwise{Composite{Abs((i0 * i1))}} | |
3.2% 97.8% 1.193s 1.86e-04s C 6410 8 Dot22 | |
0.6% 98.4% 0.223s 9.74e-05s C 2292 6 Reduce{maximum} | |
0.3% 98.7% 0.109s 2.38e-05s Py 4584 6 if{} | |
0.2% 98.8% 0.064s 3.33e-05s C 1910 5 DimShuffle{1,0} | |
0.2% 99.0% 0.062s 2.08e-05s C 3000 2 Elemwise{Composite{((i0 + i1) * GT((i0 + i1), i2))}} | |
0.2% 99.2% 0.058s 3.11e-05s C 1882 2 MaxAndArgmax | |
0.1% 99.3% 0.047s 2.47e-05s Py 1882 2 AdvancedSubtensor | |
0.1% 99.4% 0.035s 1.03e-05s C 3410 6 Sum{acc_dtype=float64} | |
0.1% 99.5% 0.027s 4.71e-06s C 5646 6 DimShuffle{x,0} | |
0.1% 99.5% 0.020s 8.57e-06s C 2292 6 Subtensor{int64} | |
0.0% 99.6% 0.018s 1.55e-05s C 1146 3 Sum{axis=[0], acc_dtype=float64} | |
0.0% 99.6% 0.015s 7.88e-06s Py 1882 2 ARange | |
0.0% 99.7% 0.014s 3.61e-05s Py 382 1 AdvancedIncSubtensor{inplace=False, set_instead_of_inc=False} | |
0.0% 99.7% 0.012s 6.53e-06s C 1882 2 Elemwise{neq,no_inplace} | |
... (remaining 22 Ops account for 0.31%(0.12s) of the runtime) | |
Apply | |
------ | |
<% time> <sum %> <apply time> <time per call> <#call> <id> <Mflops> <Gflops/s> <Apply name> | |
9.8% 9.8% 3.600s 9.42e-03s 382 110 Elemwise{Composite{((i0 * i1) - (i2 * i3))}}(TensorConstant{(1, 1) of 0.9}, W_dense2_vel, DimShuffle{x,x}.0, if{}.0) | |
input 0: dtype=float32, shape=(1, 1), strides=c | |
input 1: dtype=float32, shape=(256, 256), strides=c | |
input 2: dtype=float32, shape=(1, 1), strides=c | |
input 3: dtype=float32, shape=(256, 256), strides=c | |
output 0: dtype=float32, shape=(256, 256), strides=c | |
9.7% 19.5% 3.554s 9.30e-03s 382 117 Elemwise{Composite{((i0 * i1) - (i2 * i3))}}(TensorConstant{(1, 1) of 0.9}, W_dense1_vel, DimShuffle{x,x}.0, if{}.0) | |
input 0: dtype=float32, shape=(1, 1), strides=c | |
input 1: dtype=float32, shape=(784, 256), strides=c | |
input 2: dtype=float32, shape=(1, 1), strides=c | |
input 3: dtype=float32, shape=(784, 256), strides=c | |
output 0: dtype=float32, shape=(784, 256), strides=c | |
7.8% 27.3% 2.862s 7.49e-03s 382 91 Gemm{no_inplace}(W_dense2, TensorConstant{1.0}, DimShuffle{1,0}.0, Elemwise{mul}.0, TensorConstant{0.000199999994948}) | |
input 0: dtype=float32, shape=(256, 256), strides=c | |
input 1: dtype=float32, shape=(), strides=c | |
input 2: dtype=float32, shape=(256, 20), strides=(4, 1024) | |
input 3: dtype=float32, shape=(20, 256), strides=c | |
input 4: dtype=float32, shape=(), strides=c | |
output 0: dtype=float32, shape=(256, 256), strides=c | |
7.1% 34.4% 2.600s 6.81e-03s 382 103 Gemm{no_inplace}(W_dense1, TensorConstant{1.0}, x.T, Elemwise{mul}.0, TensorConstant{0.000199999994948}) | |
input 0: dtype=float32, shape=(784, 256), strides=c | |
input 1: dtype=float32, shape=(), strides=c | |
input 2: dtype=float32, shape=(784, 20), strides=(4, 3136) | |
input 3: dtype=float32, shape=(20, 256), strides=c | |
input 4: dtype=float32, shape=(), strides=c | |
output 0: dtype=float32, shape=(784, 256), strides=c | |
6.8% 41.2% 2.502s 6.55e-03s 382 18 Elemwise{add,no_inplace}(W_dense3, W_dense3_vel) | |
input 0: dtype=float32, shape=(256, 10), strides=c | |
input 1: dtype=float32, shape=(256, 10), strides=c | |
output 0: dtype=float32, shape=(256, 10), strides=c | |
6.5% 47.7% 2.375s 6.22e-03s 382 20 Elemwise{add,no_inplace}(W_dense2, W_dense2_vel) | |
input 0: dtype=float32, shape=(256, 256), strides=c | |
input 1: dtype=float32, shape=(256, 256), strides=c | |
output 0: dtype=float32, shape=(256, 256), strides=c | |
6.1% 53.7% 2.223s 5.82e-03s 382 36 Elemwise{gt,no_inplace}(Elemwise{add,no_inplace}.0, TensorConstant{(1, 1) of 0}) | |
input 0: dtype=float32, shape=(20, 256), strides=c | |
input 1: dtype=int8, shape=(1, 1), strides=c | |
output 0: dtype=int8, shape=(20, 256), strides=c | |
5.9% 59.6% 2.158s 5.65e-03s 382 97 Elemwise{mul}(Dot22.0, Elemwise{gt,no_inplace}.0) | |
input 0: dtype=float32, shape=(20, 256), strides=c | |
input 1: dtype=int8, shape=(20, 256), strides=c | |
output 0: dtype=float32, shape=(20, 256), strides=c | |
5.8% 65.4% 2.118s 5.55e-03s 382 22 Elemwise{add,no_inplace}(W_dense1, W_dense1_vel) | |
input 0: dtype=float32, shape=(784, 256), strides=c | |
input 1: dtype=float32, shape=(784, 256), strides=c | |
output 0: dtype=float32, shape=(784, 256), strides=c | |
5.7% 71.1% 2.109s 5.52e-03s 382 41 Elemwise{mul,no_inplace}(Elemwise{add,no_inplace}.0, Elemwise{gt,no_inplace}.0) | |
input 0: dtype=float32, shape=(20, 256), strides=c | |
input 1: dtype=int8, shape=(20, 256), strides=c | |
output 0: dtype=float32, shape=(20, 256), strides=c | |
4.6% 75.7% 1.674s 4.38e-03s 382 58 Elemwise{Composite{Abs((i0 * i1))}}(Elemwise{add,no_inplace}.0, Elemwise{gt,no_inplace}.0) | |
input 0: dtype=float32, shape=(20, 256), strides=c | |
input 1: dtype=int8, shape=(20, 256), strides=c | |
output 0: dtype=float32, shape=(20, 256), strides=c | |
4.3% 80.0% 1.569s 4.11e-03s 382 42 Elemwise{Composite{Abs((i0 * i1))}}(Elemwise{add,no_inplace}.0, Elemwise{gt,no_inplace}.0) | |
input 0: dtype=float32, shape=(20, 256), strides=c | |
input 1: dtype=int8, shape=(20, 256), strides=c | |
output 0: dtype=float32, shape=(20, 256), strides=c | |
4.3% 84.2% 1.569s 4.11e-03s 382 56 Elemwise{gt,no_inplace}(Elemwise{add,no_inplace}.0, TensorConstant{(1, 1) of 0}) | |
input 0: dtype=float32, shape=(20, 256), strides=c | |
input 1: dtype=int8, shape=(1, 1), strides=c | |
output 0: dtype=int8, shape=(20, 256), strides=c | |
3.5% 87.7% 1.287s 3.37e-03s 382 57 Elemwise{mul,no_inplace}(Elemwise{add,no_inplace}.0, Elemwise{gt,no_inplace}.0) | |
input 0: dtype=float32, shape=(20, 256), strides=c | |
input 1: dtype=int8, shape=(20, 256), strides=c | |
output 0: dtype=float32, shape=(20, 256), strides=c | |
3.5% 91.3% 1.285s 3.36e-03s 382 99 Elemwise{Composite{((i0 * i1) - (i2 * i3))}}(TensorConstant{(1, 1) of 0.9}, W_dense3_vel, DimShuffle{x,x}.0, if{}.0) | |
input 0: dtype=float32, shape=(1, 1), strides=c | |
input 1: dtype=float32, shape=(256, 10), strides=c | |
input 2: dtype=float32, shape=(1, 1), strides=c | |
input 3: dtype=float32, shape=(256, 10), strides=c | |
output 0: dtype=float32, shape=(256, 10), strides=c | |
3.0% 94.3% 1.114s 2.92e-03s 382 86 Elemwise{mul}(Dot22.0, Elemwise{gt,no_inplace}.0) | |
input 0: dtype=float32, shape=(20, 256), strides=c | |
input 1: dtype=int8, shape=(20, 256), strides=c | |
output 0: dtype=float32, shape=(20, 256), strides=c | |
1.3% 95.6% 0.469s 1.23e-03s 382 10 Dot22(x, W_dense1) | |
input 0: dtype=float32, shape=(20, 784), strides=c | |
input 1: dtype=float32, shape=(784, 256), strides=c | |
output 0: dtype=float32, shape=(20, 256), strides=c | |
1.0% 96.6% 0.369s 2.46e-04s 1500 5 Dot22(x, W_dense1) | |
input 0: dtype=float32, shape=(20, 784), strides=c | |
input 1: dtype=float32, shape=(784, 256), strides=c | |
output 0: dtype=float32, shape=(20, 256), strides=c | |
0.4% 97.0% 0.159s 4.16e-04s 382 108 Reduce{maximum}(Gemm{no_inplace}.0) | |
input 0: dtype=float32, shape=(784, 256), strides=c | |
output 0: dtype=float32, shape=(), strides=c | |
0.3% 97.3% 0.120s 8.02e-05s 1500 8 Dot22(Elemwise{Composite{((i0 + i1) * GT((i0 + i1), i2))}}.0, W_dense2) | |
input 0: dtype=float32, shape=(20, 256), strides=c | |
input 1: dtype=float32, shape=(256, 256), strides=c | |
output 0: dtype=float32, shape=(20, 256), strides=c | |
... (remaining 120 Apply instances account for 2.67%(0.98s) of the runtime) | |
Memory Profile (the max between all functions in that profile) | |
(Sparse variables are ignored) | |
(For values in brackets, it's for linker = c|py | |
--- | |
Max if no gc (allow_gc=False): 5891KB (4839KB) | |
CPU: 5891KB (4839KB) | |
GPU: 0KB (0KB) | |
--- | |
Max if linker=cvm(default): 2176KB (2888KB) | |
CPU: 2176KB (2888KB) | |
GPU: 0KB (0KB) | |
--- | |
Memory saved if views are used: 0KB (0KB) | |
Memory saved if inplace ops are used: 0KB (0KB) | |
Memory saved if gc is enabled: 3714KB (1950KB) | |
--- | |
This list is based on all functions in the profile | |
<Sum apply outputs (bytes)> <Apply outputs shape> <created/inplace/view> <Apply node> | |
802816B [(784, 256)] c Elemwise{add,no_inplace}(W_dense1, W_dense1_vel) | |
802816B [(784, 256)] c Gemm{no_inplace}(W_dense1, TensorConstant{1.0}, x.T, Elemwise{mul}.0, TensorConstant{0.000199999994948}) | |
802816B [(784, 256)] c Elemwise{Composite{((i0 * i1) - (i2 * i3))}}(TensorConstant{(1, 1) of 0.9}, W_dense1_vel, DimShuffle{x,x}.0, if{}.0) | |
802816B [(784, 256)] c if{}(Elemwise{isnan,no_inplace}.0, Alloc.0, Gemm{no_inplace}.0) | |
262144B [(256, 256)] c if{}(Elemwise{isnan,no_inplace}.0, Alloc.0, Gemm{no_inplace}.0) | |
262144B [(256, 256)] c Gemm{no_inplace}(W_dense2, TensorConstant{1.0}, DimShuffle{1,0}.0, Elemwise{mul}.0, TensorConstant{0.000199999994948}) | |
262144B [(256, 256)] c Elemwise{Composite{((i0 * i1) - (i2 * i3))}}(TensorConstant{(1, 1) of 0.9}, W_dense2_vel, DimShuffle{x,x}.0, if{}.0) | |
262144B [(256, 256)] c DimShuffle{1,0}(W_dense2) | |
262144B [(256, 256)] c Elemwise{add,no_inplace}(W_dense2, W_dense2_vel) | |
62720B [(784, 20)] c DimShuffle{1,0}(x) | |
20480B [(20, 256)] c Elemwise{Composite{((i0 + i1) * GT((i0 + i1), i2))}}(Dot22.0, DimShuffle{x,0}.0, TensorConstant{(1, 1) of 0}) | |
20480B [(20, 256)] c Elemwise{Composite{Abs((i0 * i1))}}(Elemwise{add,no_inplace}.0, Elemwise{gt,no_inplace}.0) | |
20480B [(20, 256)] c Elemwise{Composite{Abs((i0 * i1))}}(Elemwise{add,no_inplace}.0, Elemwise{gt,no_inplace}.0) | |
20480B [(20, 256)] c Elemwise{mul}(Dot22.0, Elemwise{gt,no_inplace}.0) | |
20480B [(20, 256)] c Dot22(x, W_dense1) | |
20480B [(20, 256)] c Elemwise{mul,no_inplace}(Elemwise{add,no_inplace}.0, Elemwise{gt,no_inplace}.0) | |
20480B [(20, 256)] c Elemwise{add,no_inplace}(Dot22.0, DimShuffle{x,0}.0) | |
20480B [(256, 20)] c DimShuffle{1,0}(Elemwise{mul,no_inplace}.0) | |
20480B [(20, 256)] c Dot22(Elemwise{mul}.0, W_dense2.T) | |
20480B [(20, 256)] c Elemwise{mul}(Dot22.0, Elemwise{gt,no_inplace}.0) | |
... (remaining 120 Apply account for 253178B/5042682B ((5.02%)) of the Apply with dense outputs sizes) | |
<created/inplace/view> is taken from the Op's declaration. | |
Apply nodes marked 'inplace' or 'view' may actually allocate memory, this is not reported here. If you use DebugMode, warnings will be emitted in those cases. |
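
The derived columns in the tables above can be cross-checked by hand. The following is a minimal sketch of that arithmetic, not part of the profiler output. It assumes the byte column is simply the element count times the 4-byte float32 item size, and that `<time per call>` and `<% time>` are plain ratios of the printed figures. The helper names `output_bytes`, `time_per_call`, and `percent_of_thunks` are hypothetical and exist only for this illustration.

```python
import math

FLOAT32_BYTES = 4  # the profile lists dtype=float32 for these outputs


def output_bytes(shape, itemsize=FLOAT32_BYTES):
    """Bytes needed for one dense output of the given shape (assumed formula)."""
    return math.prod(shape) * itemsize


def time_per_call(apply_time_s, n_calls):
    """Average seconds per call for one Apply node."""
    return apply_time_s / n_calls


def percent_of_thunks(apply_time_s, thunk_time_s):
    """Share of the total 'Time in thunks' spent in one Apply node, in percent."""
    return 100.0 * apply_time_s / thunk_time_s


# Shapes taken from the memory profile rows above.
for shape in [(784, 256), (256, 256), (784, 20), (20, 256)]:
    print(shape, output_bytes(shape), "B")

# Apply node 110 (velocity update for W_dense2): 3.600s over 382 calls,
# compared against the 36.69480s thunk time of the summed profile.
print(time_per_call(3.600, 382))
print(percent_of_thunks(3.600, 36.69480))
```

Under these assumptions the computed sizes match the `<Sum apply outputs (bytes)>` column (for example 802816B for a `(784, 256)` float32 output), and the ratio for Apply node 110 comes out close to the 9.42e-03s per call and 9.8% reported in the summed profile.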