Review homeworks (3)
See photos. Gradient descent: go in the negative (minus) direction of the derivative to find a local minimum.
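A minimal sketch of that idea (not from the notes; the function f(x) = (x - 3)^2 and the learning rate are assumptions for illustration): repeatedly stepping opposite the derivative walks toward the minimum.

```python
# 1-D gradient descent on f(x) = (x - 3)^2, whose derivative is f'(x) = 2(x - 3).
# Stepping in the minus direction of the derivative moves toward the minimum at x = 3.

def gradient_descent(df, x, lr=0.1, steps=100):
    for _ in range(steps):
        x -= lr * df(x)  # step against the derivative
    return x

x_min = gradient_descent(lambda x: 2 * (x - 3), x=0.0)
print(round(x_min, 3))  # converges near 3.0
```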
(linear classifier with a hard threshold)
Use a threshold function h(x): h(x) = 0 if x < some threshold, h(x) = 1 if x > that threshold.
Use the perceptron rule and a linear threshold function to represent functions of boolean variables.
Note: x0 is always 1
Example: x1 AND x2 AND x3
Threshold function: w0 + x1 + x2 + x3 > 0; setting w0 = -2.5 produces true only when x1, x2, x3 are all 1.
For boolean-valued functions, the weights w1..wn are all set to 1, so one only needs to set the weight w0 (which multiplies x0, always 1) to place the threshold properly.
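A sketch of that AND unit as code (the function name is mine; the weights follow the notes: w1..w3 = 1, w0 = -2.5 on the always-1 input x0):

```python
from itertools import product

# Linear threshold unit for x1 AND x2 AND x3: fires only when the weighted
# sum w0 + x1 + x2 + x3 exceeds 0, i.e. only when all three inputs are 1.
def threshold_unit(inputs, w0=-2.5):
    return 1 if w0 + sum(inputs) > 0 else 0

for x in product([0, 1], repeat=3):
    print(x, threshold_unit(x))  # only (1, 1, 1) outputs 1
```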
For the majority function (true if the majority of the d variables are true), w0 = -(d/2).
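The majority rule from the notes can be sketched the same way (helper name is mine): weights w1..wd = 1 and bias weight w0 = -(d/2).

```python
# Majority of d boolean inputs: true when more than half the inputs are 1,
# using unit weights and the bias w0 = -(d/2) from the notes.
def majority(inputs):
    d = len(inputs)
    return 1 if -(d / 2) + sum(inputs) > 0 else 0

print(majority([1, 1, 0]))  # 2 of 3 true -> 1
print(majority([1, 0, 0]))  # 1 of 3 true -> 0
```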
If the training data D is linearly separable (we can draw a line/hyperplane to partition the data), then learning with a perceptron using the perceptron update rule is guaranteed to converge in finitely many steps to a perfect set of weights for D.
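A sketch of the perceptron update rule w <- w + (y - h(x)) * x (learning rate 1 and the AND dataset are my assumptions); since AND is linearly separable, training reaches a perfect set of weights in a few epochs, as the notes claim.

```python
# Linear threshold prediction over inputs x, where x[0] = 1 is the bias input x0.
def predict(w, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0

# Perceptron update rule: on each example, nudge the weights by (y - h(x)) * x.
def train(data, n_weights, epochs=25):
    w = [0.0] * n_weights
    for _ in range(epochs):
        for x, y in data:
            err = y - predict(w, x)
            w = [wi + err * xi for wi, xi in zip(w, x)]
    return w

# x1 AND x2, which is linearly separable.
data = [((1, 0, 0), 0), ((1, 0, 1), 0), ((1, 1, 0), 0), ((1, 1, 1), 1)]
w = train(data, 3)
print([predict(w, x) for x, _ in data])  # [0, 0, 0, 1]
```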
A single perceptron cannot represent non-linearly separable functions (e.g. XOR).
Sometimes not very powerful on its own (but perceptrons can be combined in chains/layers to produce more expressive results).
Be able to create an XOR neural network
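One standard construction (an assumption of mine, not the only one): XOR = OR AND NOT(AND), built from two threshold units feeding a third.

```python
# Two-layer network of linear threshold units computing XOR.
def step(z):
    return 1 if z > 0 else 0

def xor(x1, x2):
    h_or  = step(-0.5 + x1 + x2)      # hidden unit: x1 OR x2
    h_and = step(-1.5 + x1 + x2)      # hidden unit: x1 AND x2
    return step(-0.5 + h_or - h_and)  # output: OR AND NOT(AND)

print([xor(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]])  # [0, 1, 1, 0]
```

Note that no single-layer unit can do this, which is why the hidden layer is needed.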
Don't worry about constructing a tree. Look at the #2 question
2.1 2^n. Size of the state space: each xi is a boolean variable.
2.2 2^(2^n). Each function can be seen as a boolean vector (truth table) of length 2^n, and the number of such vectors is 2^(2^n) (the number of subsets of a set of size 2^n). When n = 3, this is 256.
2.3 3^n elements.
2.4 R has larger entropy.
2.5 Overfitting can be a result of noisy data or irrelevant/nonrepresentative data.
2.6 Pruning can reduce overfitting but requires a validation set. Pruning does not guarantee the optimal tree: it reduces error until error starts to increase, and the point where it stops is not guaranteed to be optimal.
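A quick sanity check of the 2^(2^n) count (helper name is mine): enumerating every truth table of length 2^n for n = 3 yields 256 distinct boolean functions, matching the answer above.

```python
from itertools import product

# Each boolean function of n inputs is a truth table of length 2^n,
# so enumerating all 0/1 tuples of that length counts the functions.
def count_boolean_functions(n):
    tables = set(product([0, 1], repeat=2 ** n))
    return len(tables)

print(count_boolean_functions(3))  # 256 = 2^(2^3)
```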
Advantages of decision trees: Learning is efficient, robust against noisy data