
@ducnh1022
Last active April 4, 2016 17:10
Classification week 4: overfitting in decision trees
Learning simpler decision trees
Early stopping -> stop growing the tree before it is fully built: (1) limit the tree depth, (2) stop if splitting does not reduce classification error, (3) stop if a node contains very few data points
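The three early-stopping conditions above can be sketched as one check run before each split; the threshold values (`max_depth`, `min_error_gain`, `min_points`) are hypothetical defaults, not values from the course:

```python
def should_stop(depth, node_error, best_split_error, n_points,
                max_depth=10, min_error_gain=0.0, min_points=10):
    """Decide whether to stop growing the tree at this node.

    Thresholds are hypothetical tuning parameters.
    """
    if depth >= max_depth:                                # (1) depth limit
        return True
    if node_error - best_split_error <= min_error_gain:   # (2) split does not reduce error
        return True
    if n_points <= min_points:                            # (3) very few data points
        return True
    return False
```

A node at depth 3 with 100 points and a split that cuts error from 0.3 to 0.1 keeps growing; hitting any one of the three conditions stops it.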
Pruning: simplify tree after learning algorithm terminates
cost function
step 1: consider a split
step 2: compute total cost C(T) of the tree with and without the split
C(T) = Error(T) + lambda * L(T), where L(T) is the number of leaves
prune: replace the split with a leaf if the total cost of the pruned tree is lower than the total cost of the tree with the split
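The pruning rule above, as a minimal sketch: compute C(T) for both versions and keep the cheaper one. The value of `lam` is a hypothetical tuning parameter balancing error against tree size:

```python
def tree_cost(error, num_leaves, lam=0.3):
    """Total cost C(T) = Error(T) + lambda * L(T). lam is a hypothetical value."""
    return error + lam * num_leaves

def prune_split(error_with_split, leaves_with_split,
                error_pruned, leaves_pruned, lam=0.3):
    """Return True if replacing the split with a leaf lowers total cost."""
    return (tree_cost(error_pruned, leaves_pruned, lam)
            <= tree_cost(error_with_split, leaves_with_split, lam))
```

For example, a split that drops error only from 0.26 to 0.25 but adds a leaf is pruned, while a split that drops error from 0.40 to 0.10 is kept.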
In the real world, data is really messy; missing values are common
3 strategies: skip data points with missing values, impute the missing values, or adapt the learning algorithm to be robust to missing values
adapting via feature split selection: give each split a default branch for missing values, chosen to minimize classification error
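A minimal sketch of that default-branch idea: for the data points missing the split feature, count how many would be misclassified if routed down each branch, and route them all to the branch with the fewest errors. The dictionary shape (`branch -> predicted class`) is an assumption for illustration:

```python
def default_branch_for_missing(missing_labels, branch_predictions):
    """Choose the branch to which missing-value points are routed.

    missing_labels: true labels of points whose split feature is missing.
    branch_predictions: hypothetical map of branch name -> class predicted
    by the subtree under that branch.
    """
    errors = {
        branch: sum(1 for y in missing_labels if y != pred)
        for branch, pred in branch_predictions.items()
    }
    # Route missing values down the branch with the lowest error
    return min(errors, key=errors.get)
```

E.g. if most points with a missing "income" feature are labeled 1, they get routed to the branch whose subtree predicts 1.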