Classification week 4: overfitting in decision trees
Learning simpler decision trees
Early stopping: (1) limit the depth of the tree; (2) use classification error to stop splitting when error no longer decreases; (3) stop when a node has very few data points (see the sketch below)
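A minimal sketch of these three early-stopping knobs using scikit-learn (my illustration, not the course's own implementation; the parameter values are arbitrary placeholders):

```python
# Sketch: early-stopping conditions for a decision tree (illustrative values).
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)

tree = DecisionTreeClassifier(
    max_depth=5,                  # (1) limit tree depth
    min_impurity_decrease=0.01,   # (2) stop if a split barely reduces error
    min_samples_leaf=10,          # (3) stop if a node has very few data points
)
tree.fit(X, y)
print(tree.get_depth(), tree.get_n_leaves())
```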
Pruning: simplify the tree after the learning algorithm terminates
cost function:
step 1: consider a split
step 2: compute the total cost C(T) of the tree with that split
C(T) = Error(T) + lambda * L(T), where L(T) = number of leaves and lambda trades off fit against complexity
prune (replace the split with a leaf) if the total cost of the pruned tree is lower than the total cost of the original tree
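A worked sketch of that cost comparison (my illustration, not course code; the error rates, leaf counts, and lambda are made-up numbers):

```python
# Sketch: pruning decision via total cost C(T) = Error(T) + lambda * L(T).
# All numbers below are hypothetical placeholders.

def total_cost(error, n_leaves, lam):
    """C(T) = classification error + lambda * number of leaves."""
    return error + lam * n_leaves

lam = 0.01  # hypothetical complexity penalty

# Tree with the candidate split vs. the same node collapsed to a leaf.
cost_with_split = total_cost(error=0.200, n_leaves=6, lam=lam)  # 0.200 + 0.06 = 0.260
cost_as_leaf    = total_cost(error=0.205, n_leaves=5, lam=lam)  # 0.205 + 0.05 = 0.255

# Prune if the simpler tree's total cost is lower.
if cost_as_leaf <= cost_with_split:
    print("prune: replace the split with a leaf")
else:
    print("keep the split")
```

With these numbers the pruned tree wins: its slightly higher error is more than paid for by the lambda penalty on the extra leaf.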
In the real world, data is really messy: missing values are common
3 strategies for missing data: (1) skip rows or features with missing values, (2) impute them, (3) adapt the learning algorithm to be robust to missing values, e.g. handle missing data in feature split selection (imputation sketch below)
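A minimal sketch of strategy (2), imputing missing values before training, using scikit-learn (my illustration; the tiny array is made up):

```python
# Sketch: fill in missing values (NaNs) with the per-column mean.
import numpy as np
from sklearn.impute import SimpleImputer

X = np.array([[1.0, np.nan],
              [2.0, 3.0],
              [np.nan, 4.0]])

imputer = SimpleImputer(strategy="mean")  # replace each NaN with its column mean
X_filled = imputer.fit_transform(X)
print(X_filled)
```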