The advantages of using Rectified Linear Units (ReLU) in neural networks are:
- ReLU is the hard max function max(0, x); using it as the activation function induces sparsity in the hidden units, since all negative pre-activations are mapped to exactly zero (see the first sketch after this list).
- ReLU does not suffer from the vanishing gradient problem the way sigmoid and tanh do, because its gradient is 1 for any positive input (see the second sketch after this list). It has also been shown that deep networks can be trained efficiently with ReLU even without pre-training.
- ReLU can be used in Restricted Boltzmann machines to model real- or integer-valued inputs.
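
A minimal sketch of the sparsity point, assuming pre-activations drawn from a standard normal distribution as a stand-in for a hidden layer's inputs (the random data and NumPy usage are my own illustration, not from any specific model):

```python
import numpy as np

# Hypothetical pre-activations for a batch of 1000 examples and 256 hidden units.
rng = np.random.default_rng(0)
pre_activations = rng.standard_normal((1000, 256))

relu_out = np.maximum(0.0, pre_activations)            # ReLU: max(0, x)
sigmoid_out = 1.0 / (1.0 + np.exp(-pre_activations))   # sigmoid for comparison

# ReLU zeroes out roughly half of the units here, giving a genuinely sparse
# representation; sigmoid outputs can be small but are almost never exactly zero.
print("fraction of exact zeros (ReLU):   ", np.mean(relu_out == 0.0))
print("fraction of exact zeros (sigmoid):", np.mean(sigmoid_out == 0.0))
```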
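
And a sketch of the gradient comparison, again just an illustration with a few hand-picked input values:

```python
import numpy as np

def relu_grad(x):
    # Derivative of max(0, x): 1 for positive inputs, 0 otherwise.
    return (x > 0).astype(float)

def sigmoid_grad(x):
    s = 1.0 / (1.0 + np.exp(-x))
    # Derivative of the sigmoid: s * (1 - s), which is at most 0.25.
    return s * (1.0 - s)

x = np.array([-6.0, -2.0, 0.5, 2.0, 6.0])
print("ReLU gradients:   ", relu_grad(x))     # stays 1 for any positive input
print("sigmoid gradients:", sigmoid_grad(x))  # shrinks toward 0 for large |x|

# Backpropagation multiplies these local gradients across layers: a product of
# sigmoid gradients (each <= 0.25) shrinks toward zero as depth grows, while
# ReLU's gradient of 1 on the active path passes the error signal through intact.
```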