- Feature Learning
  - Learning Feature Representations with K-means by Adam Coates and Andrew Y. Ng
  - The devil is in the details: an evaluation of recent feature encoding methods by Chatfield et al.
  - Emergence of Object-Selective Features in Unsupervised Feature Learning by Coates and Ng
  - Scaling Learning Algorithms towards AI by Yoshua Bengio and Yann LeCun
  - A Theory of Feature Learning by Brendan van Rooyen and Robert C. Williamson
- Deep Learning
  - Dropout: A Simple Way to Prevent Neural Networks from Overfitting by Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever and Ruslan Salakhutdinov
  - Understanding the difficulty of training deep feedforward neural networks by Xavier Glorot and Yoshua Bengio
  - On the difficulty of training Recurrent Neural Networks by Razvan Pascanu, Tomas Mikolov and Yoshua Bengio
  - Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift by Sergey Ioffe and Christian Szegedy
  - Deep Learning in Neural Networks: An Overview by Jürgen Schmidhuber
  - Qualitatively characterizing neural network optimization problems by Ian J. Goodfellow, Oriol Vinyals and Andrew M. Saxe
  - On Recurrent and Deep Neural Networks, PhD thesis of Razvan Pascanu
  - Efficient BackProp by Yann LeCun, Léon Bottou, Genevieve Orr and Klaus-Robert Müller
  - Towards Biologically Plausible Deep Learning by Yoshua Bengio, Dong-Hyun Lee, Jorg Bornschein and Zhouhan Lin
  - Training Recurrent Neural Networks, PhD thesis of Ilya Sutskever
  - A Probabilistic Theory of Deep Learning by Ankit B. Patel, Tan Nguyen and Richard G. Baraniuk
  - ImageNet Classification with Deep Convolutional Neural Networks by Krizhevsky, Sutskever and Hinton
  - Text Understanding from Scratch by Xiang Zhang and Yann LeCun
  - Learning Deep Architectures for AI by Yoshua Bengio
  - Deep Learning, a review article that appeared in Nature, by Yann LeCun, Yoshua Bengio and Geoffrey Hinton
- Scalable Machine Learning
  - Bring the Noise: Embracing Randomness is the Key to Scaling Up Machine Learning Algorithms by Brian Dalessandro
  - Large Scale Machine Learning with Stochastic Gradient Descent by Léon Bottou
  - The Tradeoffs of Large Scale Learning by Léon Bottou and Olivier Bousquet
  - Hash Kernels for Structured Data by Qinfeng Shi et al.
  - Feature Hashing for Large Scale Multitask Learning by Weinberger et al.
  - Large-Scale Learning with Less RAM via Randomization by a group of authors from Google
  - Collaborative Email-Spam Filtering with the Hashing-Trick by Joshua Attenberg et al.
- Gradient-based Training
  - Practical Recommendations for Gradient-Based Training of Deep Architectures by Yoshua Bengio
  - Stochastic Gradient Descent Tricks by Léon Bottou
- Non-linear Units
  - Rectified Linear Units Improve Restricted Boltzmann Machines by Nair and Hinton
  - Mathematical Intuition for Performance of Rectified Linear Unit in Deep Neural Networks by Alexandre Dalyec
- Interesting blog posts and presentations
  - Hacker's Guide to Neural Networks by Andrej Karpathy
  - Breaking Linear Classifiers on ImageNet by Andrej Karpathy
  - Classifying plankton with Deep Neural Networks
  - Deep stuff about deep learning?
  - Understanding Convolution in Deep Learning
  - A Brief Overview of Deep Learning by Ilya Sutskever
  - Recurrent Neural Networks for Collaborative Filtering
  - Deep Belief Networks vs Convolutional Neural Networks
  - Deep Learning vs Probabilistic Graphical Models vs Logic
  - Extracting Structured Data From Recipes Using Conditional Random Fields
  - Ten Lessons Learned from Building (real-life impactful) Machine Learning Systems
  - Scalable Machine Learning by Mikio L. Braun
  - The Unreasonable Effectiveness of Recurrent Neural Networks by Andrej Karpathy
  - Initialization of deep networks
  - Weak Learning, Boosting, and the AdaBoost algorithm
  - Probably Approximately Correct — a Formal Theory of Learning
  - Making sense of principal component analysis, eigenvectors & eigenvalues
- Interesting courses and tutorials
  - CS231n: Convolutional Neural Networks for Visual Recognition at Stanford by Andrej Karpathy
  - CS224d: Deep Learning for Natural Language Processing at Stanford by Richard Socher
  - STA 4273H (Winter 2015): Large Scale Machine Learning at Toronto by Russ Salakhutdinov
  - AM 207: Monte Carlo Methods and Stochastic Optimization at Harvard by Verena Kaynig-Fittkau and Pavlos Protopapas
  - ACL 2012 + NAACL 2013 Tutorial: Deep Learning for NLP (without Magic) at NAACL 2013 by Richard Socher, Chris Manning and Yoshua Bengio
  - Video course on Deep Learning by Hugo Larochelle
  - ECCV-2010 Tutorial: Feature Learning for Image Classification by Kai Yu and Andrew Ng
  - KDD 2014 Tutorial: The Recommender Problem Revisited by Xavier Amatriain
  - Machine Learning 2014-15 at Oxford University by Nando de Freitas
  - Course Notes - Stanford Machine Learning by Andrew Ng
  - A Tutorial on Principal Components Analysis by Lindsay I. Smith
- General
  - Distilling the Knowledge in a Neural Network by Geoffrey Hinton, Oriol Vinyals and Jeff Dean
  - A Random Forest Guided Tour by Biau and Scornet
- MCMC
  - Markov Chain Monte Carlo Without all the Bullshit
  - How would you explain Markov Chain Monte Carlo (MCMC) to a layperson?
  - The Data Skeptic Podcast (on iTunes)
  - An Introduction to MCMC for Machine Learning by Christophe Andrieu, Nando de Freitas, Arnaud Doucet and Michael I. Jordan
  - The Markov Chain Monte Carlo Revolution by Persi Diaconis
- Conditional Random Fields
  - Log-linear Models and Conditional Random Fields by Charles Elkan (video)
  - Log-linear models and conditional random fields, notes by Charles Elkan