- Feature Learning
- Learning Feature Representations with K-means by Adam Coates and Andrew Y. Ng (see the encoding sketch after this section)
- The devil is in the details: an evaluation of recent feature encoding methods by Chatfield et al.
- Emergence of Object-Selective Features in Unsupervised Feature Learning by Adam Coates, Andrej Karpathy and Andrew Y. Ng
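
A minimal sketch of the K-means feature-learning recipe from the Coates & Ng paper above, assuming scikit-learn is available. The function name and toy data are illustrative only, and the paper's normalization and whitening preprocessing is omitted for brevity:

```python
import numpy as np
from sklearn.cluster import KMeans

def kmeans_triangle_features(patches, n_centroids=16, seed=0):
    """Fit a K-means dictionary to flattened patches and encode each
    patch with the 'triangle' activation analyzed by Coates & Ng:
    f_k(x) = max(0, mean_distance(x) - distance(x, centroid_k))."""
    km = KMeans(n_clusters=n_centroids, n_init=10, random_state=seed)
    km.fit(patches)
    # (num_patches, n_centroids) matrix of Euclidean distances.
    dists = np.linalg.norm(
        patches[:, None, :] - km.cluster_centers_[None, :, :], axis=2)
    mu = dists.mean(axis=1, keepdims=True)
    # Centroids farther away than average contribute zero.
    return np.maximum(0.0, mu - dists)

# Toy usage: 500 random 6x6 grayscale patches, flattened.
patches = np.random.rand(500, 36)
print(kmeans_triangle_features(patches).shape)  # (500, 16)
```
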
- Deep Learning
- Dropout: A Simple Way to Prevent Neural Networks from Overfitting by Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever and Ruslan Salakhutdinov (a dropout sketch follows this section)
- Understanding the difficulty of training deep feedforward neural networks by Xavier Glorot and Yoshua Bengio
- On the difficulty of training Recurrent Neural Networks by Razvan Pascanu, Tomas Mikolov and Yoshua Bengio
- Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift by Sergey Ioffe and Christian Szegedy
- Deep Learning in Neural Networks: An Overview by Jürgen Schmidhuber
- Qualitatively characterizing neural network optimization problems by Ian J. Goodfellow, Oriol Vinyals and Andrew M. Saxe
- On Recurrent and Deep Neural Networks, PhD thesis of Razvan Pascanu
- Scaling Learning Algorithms towards AI by Yann LeCun and Yoshua Bengio
- Efficient Backprop by LeCun, Bottou et al
- Towards Biologically Plausible Deep Learning by Yoshua Bengio, Dong-Hyun Lee, Jorg Bornschein, Zhouhan Lin
- Training Recurrent Neural Networks, PhD thesis of Ilya Sutskever
- A Probabilistic Theory of Deep Learning by Ankit B. Patel, Tan Nguyen, Richard G. Baraniuk
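
Many of the ideas above reduce to a few lines of array code. As one example, a minimal sketch of dropout in the spirit of Srivastava et al.; the function and argument names are mine, and this is the common "inverted" variant rather than the paper's test-time weight scaling:

```python
import numpy as np

def dropout(activations, p_drop=0.5, train=True, rng=np.random):
    """Inverted dropout: zero each unit with probability p_drop during
    training and rescale the survivors by 1/(1 - p_drop), so the
    network needs no change at test time (the paper instead keeps all
    units and scales the weights at test time)."""
    if not train or p_drop == 0.0:
        return activations
    keep = 1.0 - p_drop
    mask = (rng.rand(*activations.shape) < keep) / keep
    return activations * mask

h = np.ones((4, 3))            # a toy layer of activations
print(dropout(h, p_drop=0.5))  # roughly half zeros, the rest scaled to 2.0
```
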
- Scalable Machine Learning
- Bring the Noise: Embracing Randomness is the Key to Scaling Up Machine Learning Algorithms by Brian Dalessandro
- Large Scale Machine Learning with Stochastic Gradient Descent by Léon Bottou
- The Tradeoffs of Large Scale Learning by Léon Bottou and Olivier Bousquet
- Hash Kernels for Structured Data by Qinfeng Shi et al.
- Feature Hashing for Large Scale Multitask Learning by Weinberger et al. (see the hashing sketch after this list)
- Large-Scale Learning with Less RAM via Randomization by Daniel Golovin et al. at Google
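
The hashing trick from the Shi and Weinberger papers above is small enough to sketch inline. A minimal version, assuming token-count features; Python's built-in `hash()` is only a stand-in for a stable hash:

```python
import numpy as np

def hashed_features(tokens, n_bins=16):
    """Map raw tokens straight to a fixed-size vector: one hash picks
    the bucket, and a second, independent sign hash (as in Weinberger
    et al.) makes collisions cancel in expectation. Note that Python's
    hash() is salted per process; a real system should use a stable
    hash such as MurmurHash."""
    x = np.zeros(n_bins)
    for tok in tokens:
        idx = hash(tok) % n_bins
        sign = 1.0 if hash("sign:" + tok) % 2 == 0 else -1.0
        x[idx] += sign
    return x

print(hashed_features(["the", "cat", "sat", "the"]))
```
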
- Gradient based Training
- Practical Recommendations for Gradient-Based Training of Deep Architectures by Yoshua Bengio
- Stochastic Gradient Descent Tricks by Léon Bottou (a minimal SGD loop is sketched below)
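
A minimal SGD loop illustrating two of the tricks Bottou describes, per-epoch shuffling and a decaying step size; the least-squares problem, constants, and names are illustrative rather than taken from the paper:

```python
import numpy as np

def sgd_least_squares(X, y, lr0=0.1, epochs=20, seed=0):
    """Plain SGD on the squared loss 0.5*(w.x - y)^2, reshuffling the
    examples every epoch and decaying the step size on a 1/t-style
    schedule (the decay constant here is arbitrary)."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(len(X)):    # fresh order each epoch
            t += 1
            lr = lr0 / (1.0 + 0.01 * t)
            grad = (X[i] @ w - y[i]) * X[i]  # gradient of the squared loss
            w -= lr * grad
    return w

# Recover a known linear model from noisy data.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.01 * rng.normal(size=200)
print(sgd_least_squares(X, y))  # close to [ 1.  -2.   0.5]
```
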
- Non Linear Units
- Rectified Linear Units Improve Restricted Boltzmann Machines by Vinod Nair and Geoffrey Hinton (see the ReLU sketch after this section)
- Mathematical Intuition for Performance of Rectified Linear Unit in Deep Neural Networks by Alexandre Dalyec
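
For quick reference alongside the Nair & Hinton paper, a sketch of the rectified linear unit and the subgradient used in backpropagation:

```python
import numpy as np

def relu(x):
    """Rectified linear unit: max(0, x) elementwise."""
    return np.maximum(0.0, x)

def relu_grad(x):
    """Subgradient used in backprop: 1 where x > 0, else 0. The kink
    at exactly 0 is assigned 0 by convention."""
    return (x > 0).astype(float)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(x))       # [0.  0.  0.  0.5 2. ]
print(relu_grad(x))  # [0. 0. 0. 1. 1.]
```
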
- Interesting blog posts
- Hacker's Guide to Neural Networks by Andrej Karpathy
- Breaking Linear Classifiers on ImageNet by Andrej Karpathy
- Classifying plankton with Deep Neural Networks by Sander Dieleman
- Deep stuff about deep learning?
- Understanding Convolution in Deep Learning
- A Brief Overview of Deep Learning by Ilya Sutskever
- Recurrent Neural Networks for Collaborative Filtering
- Interesting courses
- CS231n: Convolutional Neural Networks for Visual Recognition at Stanford by Andrej Karpathy
- CS224d: Deep Learning for Natural Language Processing at Stanford by Richard Socher
- STA 4273H (Winter 2015): Large Scale Machine Learning at Toronto by Russ Salakhutdinov
- AM 207 Monte Carlo Methods, Stochastic Optimization at Harvard by Verena Kaynig-Fittkau and Pavlos Protopapas
- Deep Learning for NLP (without Magic), a tutorial given at ACL 2012 and NAACL 2013 by Richard Socher, Chris Manning and Yoshua Bengio
- Video course on Deep Learning by Hugo Larochelle