An idiosyncratic guide to teaching yourself practical machine learning, without links:
- Find a binary classification dataset; maybe you have one internally.
- Implement a simple decision tree algorithm, like CART.
- Write some code to validate your model; produce an ROC curve and understand the tradeoff it embodies.
- Compare the ROC for your training set with the ROC for a holdout set, and understand what it means that they differ.
- Experiment with some hyperparameters: how does the comparison above change as you adjust the depth of the tree or other stopping criteria?
- Combine your decision tree algorithm with bagging (plus a random subset of features at each split) to produce a random forest. How does its ROC compare?
- Do the same hyperparameter tuning here. (How many trees?) Reflect on overfitting and on the bias/variance tradeoff.
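The CART step above can be sketched in a few dozen lines. This is a minimal illustration, assuming numpy and binary 0/1 labels: Gini impurity, exhaustive threshold search, and a recursive build with a depth cap. (The function names here are my own; a real implementation would also handle categorical features and pruning.)

```python
import numpy as np

def gini(y):
    # Gini impurity of a binary 0/1 label vector: 2 * p * (1 - p).
    if len(y) == 0:
        return 0.0
    p = np.mean(y)
    return 2 * p * (1 - p)

def best_split(X, y):
    # Exhaustively search every (feature, threshold) pair for the split
    # minimizing the weighted Gini impurity of the two children.
    best = None  # (score, feature_index, threshold)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            left = X[:, j] <= t
            if left.all():  # degenerate split, nothing on the right
                continue
            right = ~left
            score = (left.sum() * gini(y[left])
                     + right.sum() * gini(y[right])) / len(y)
            if best is None or score < best[0]:
                best = (score, j, t)
    return best

def build_tree(X, y, depth=0, max_depth=3):
    # Stop at a leaf if the node is pure, the depth cap is hit,
    # or no split is possible; the leaf stores P(class 1).
    if depth >= max_depth or gini(y) == 0.0:
        return float(np.mean(y))
    split = best_split(X, y)
    if split is None:
        return float(np.mean(y))
    _, j, t = split
    mask = X[:, j] <= t
    return (j, t,
            build_tree(X[mask], y[mask], depth + 1, max_depth),
            build_tree(X[~mask], y[~mask], depth + 1, max_depth))

def predict_one(tree, x):
    # Walk internal nodes (tuples) until reaching a leaf probability.
    while isinstance(tree, tuple):
        j, t, left, right = tree
        tree = left if x[j] <= t else right
    return tree
```

The nested-tuple tree is deliberately crude; the point is to see the greedy split search and the stopping criteria explicitly before handing both over to a library.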
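For the validation steps, here is one way the train-versus-holdout ROC comparison might look, assuming scikit-learn and its bundled breast-cancer dataset as a stand-in for your own data. An unpruned tree memorizes the training set, so the gap between the two AUCs is the overfitting you are meant to notice:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import roc_curve, roc_auc_score

# A public binary-classification dataset, standing in for your own.
X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Fully grown tree: no depth limit, so it fits the training set exactly.
clf = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)

# ROC needs a score, not a hard label: use P(positive class).
p_tr = clf.predict_proba(X_tr)[:, 1]
p_te = clf.predict_proba(X_te)[:, 1]

auc_tr = roc_auc_score(y_tr, p_tr)
auc_te = roc_auc_score(y_te, p_te)

# Each (fpr, tpr) point is one threshold's tradeoff between false
# alarms and detections; plot tpr against fpr to see the full curve.
fpr, tpr, thresholds = roc_curve(y_te, p_te)
```

Training AUC lands at (or essentially at) 1.0 while the holdout AUC does not; that gap, not either number alone, is the quantity to track through the rest of the exercises.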
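The depth-sweep experiment can be a short loop. A sketch, again assuming scikit-learn and the same public dataset; `max_depth` is the stopping criterion varied here, but `min_samples_leaf` and friends behave analogously:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import roc_auc_score

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

results = {}  # depth -> (train AUC, holdout AUC)
for depth in [1, 2, 4, 8, None]:  # None = grow the tree fully
    clf = DecisionTreeClassifier(max_depth=depth, random_state=0)
    clf.fit(X_tr, y_tr)
    results[depth] = (
        roc_auc_score(y_tr, clf.predict_proba(X_tr)[:, 1]),
        roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]),
    )

# Shallow trees: train and holdout AUC sit close together (high bias).
# Deep trees: train AUC climbs toward 1.0 while holdout AUC stalls
# or drops (high variance) -- the gap widens with capacity.
```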
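Bagging itself is simple enough to write by hand once you have a tree learner. A sketch using scikit-learn's tree as the base learner (so the loop stays short): bootstrap the rows, restrict each split to a random feature subset via `max_features="sqrt"`, and average the predicted probabilities:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import roc_auc_score

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

rng = np.random.default_rng(0)
n_trees = 50
probs = np.zeros(len(y_te))
for i in range(n_trees):
    # Bootstrap sample: draw n training rows with replacement.
    idx = rng.integers(0, len(X_tr), len(X_tr))
    # Random feature subset at each split makes this a random
    # forest rather than plain bagged trees.
    tree = DecisionTreeClassifier(max_features="sqrt", random_state=i)
    tree.fit(X_tr[idx], y_tr[idx])
    probs += tree.predict_proba(X_te)[:, 1]
probs /= n_trees  # averaged votes give a smoother score for the ROC

forest_auc = roc_auc_score(y_te, probs)
single_auc = roc_auc_score(
    y_te,
    DecisionTreeClassifier(random_state=0)
    .fit(X_tr, y_tr).predict_proba(X_te)[:, 1],
)
```

The averaging is the whole trick: each bootstrapped tree is individually high-variance, but their errors are partly decorrelated (the feature subsetting helps with that), so the ensemble's holdout ROC typically dominates the single tree's.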
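For the "how many trees?" question, a sweep over `n_estimators` with scikit-learn's `RandomForestClassifier` (a convenience stand-in for the hand-rolled ensemble) makes the shape of the answer visible:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

aucs = {}  # number of trees -> holdout AUC
for n in [1, 5, 25, 100]:
    rf = RandomForestClassifier(n_estimators=n, random_state=0)
    rf.fit(X_tr, y_tr)
    aucs[n] = roc_auc_score(y_te, rf.predict_proba(X_te)[:, 1])

# Unlike tree depth, adding trees only averages away variance and
# does not add capacity to overfit: holdout AUC typically climbs
# quickly, then plateaus -- more trees just cost more compute.
```

Contrast this with the depth sweep: depth trades bias against variance, while the number of trees attacks variance alone, which is why it has a plateau rather than a peak.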