Review homeworks (3)
See photos. Gradient descent: go in the negative (minus) direction of the derivative to find a local minimum.
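A minimal sketch of that idea (not from the notes; the function f(x) = (x - 3)^2 and the learning rate are assumptions for illustration): repeatedly stepping opposite the derivative walks toward the minimum.

```python
# 1-D gradient descent on f(x) = (x - 3)^2, whose derivative is f'(x) = 2(x - 3).
# Stepping in the minus direction of the derivative moves toward the minimum at x = 3.

def gradient_descent(df, x, lr=0.1, steps=100):
    for _ in range(steps):
        x -= lr * df(x)  # step against the derivative
    return x

x_min = gradient_descent(lambda x: 2 * (x - 3), x=0.0)
print(round(x_min, 3))  # converges near 3.0
```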
(linear classifier with a hard threshold)
Use a threshold function h(x): h(x) = 0 if x < some threshold, h(x) = 1 if x > that threshold.
Use the perceptron rule and a linear threshold function to represent functions of boolean variables.
Note: x0 is always 1
Example: x1 AND x2 AND x3
Threshold function: w0 + x1 + x2 + x3 > 0; setting w0 = -2.5 produces true only when x1, x2, x3 are all 1.
For boolean-valued functions, the weights w1..wn are all set to 1, so one only needs to set the weight w0 (which multiplies x0, always 1) to place the threshold properly.
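A sketch of that AND unit as code (the function name is mine; the weights follow the notes: w1..w3 = 1, w0 = -2.5 on the always-1 input x0):

```python
from itertools import product

# Linear threshold unit for x1 AND x2 AND x3: fires only when the weighted
# sum w0 + x1 + x2 + x3 exceeds 0, i.e. only when all three inputs are 1.
def threshold_unit(inputs, w0=-2.5):
    return 1 if w0 + sum(inputs) > 0 else 0

for x in product([0, 1], repeat=3):
    print(x, threshold_unit(x))  # only (1, 1, 1) outputs 1
```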
For the majority function (true if the majority of the d variables are true), w0 = -(d/2).
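The majority rule from the notes can be sketched the same way (helper name is mine): weights w1..wd = 1 and bias weight w0 = -(d/2).

```python
# Majority of d boolean inputs: true when more than half the inputs are 1,
# using unit weights and the bias w0 = -(d/2) from the notes.
def majority(inputs):
    d = len(inputs)
    return 1 if -(d / 2) + sum(inputs) > 0 else 0

print(majority([1, 1, 0]))  # 2 of 3 true -> 1
print(majority([1, 0, 0]))  # 1 of 3 true -> 0
```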
If the training data D is linearly separable (we can draw a line/hyperplane to partition the data), then learning with a perceptron using the perceptron update rule is guaranteed to converge in finitely many steps to a perfect set of weights for D.
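A sketch of the perceptron update rule w <- w + (y - h(x)) * x (learning rate 1 and the AND dataset are my assumptions); since AND is linearly separable, training reaches a perfect set of weights in a few epochs, as the notes claim.

```python
# Linear threshold prediction over inputs x, where x[0] = 1 is the bias input x0.
def predict(w, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0

# Perceptron update rule: on each example, nudge the weights by (y - h(x)) * x.
def train(data, n_weights, epochs=25):
    w = [0.0] * n_weights
    for _ in range(epochs):
        for x, y in data:
            err = y - predict(w, x)
            w = [wi + err * xi for wi, xi in zip(w, x)]
    return w

# x1 AND x2, which is linearly separable.
data = [((1, 0, 0), 0), ((1, 0, 1), 0), ((1, 1, 0), 0), ((1, 1, 1), 1)]
w = train(data, 3)
print([predict(w, x) for x, _ in data])  # [0, 0, 0, 1]
```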
A single perceptron cannot represent non-linearly separable functions (e.g. XOR).
Sometimes not very powerful on its own (but perceptrons can be combined in chains/layers to produce more expressive results).
Be able to create an XOR neural network
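One standard construction (an assumption of mine, not the only one): XOR = OR AND NOT(AND), built from two threshold units feeding a third.

```python
# Two-layer network of linear threshold units computing XOR.
def step(z):
    return 1 if z > 0 else 0

def xor(x1, x2):
    h_or  = step(-0.5 + x1 + x2)      # hidden unit: x1 OR x2
    h_and = step(-1.5 + x1 + x2)      # hidden unit: x1 AND x2
    return step(-0.5 + h_or - h_and)  # output: OR AND NOT(AND)

print([xor(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]])  # [0, 1, 1, 0]
```

Note that no single-layer unit can do this, which is why the hidden layer is needed.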
Don't worry about constructing a tree. Look at the #2 question
2.1 2^n. Size of the state space: each xi is a boolean variable.
2.2 2^(2^n). Each function can be seen as a boolean vector (truth table) of length 2^n, and the number of such vectors is 2^(2^n) (the number of subsets of a set of size 2^n). When n = 3, this is 256.
2.3 3^n elements.
2.4 R has larger entropy.
2.5 Overfitting can be a result of noisy data or irrelevant/nonrepresentative data.
2.6 Pruning can reduce overfitting but requires a validation set. Pruning does not guarantee the optimal tree: it reduces error until error starts to increase, and the point where it stops is not guaranteed to be optimal.
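A quick sanity check of the 2^(2^n) count (helper name is mine): enumerating every truth table of length 2^n for n = 3 yields 256 distinct boolean functions, matching the answer above.

```python
from itertools import product

# Each boolean function of n inputs is a truth table of length 2^n,
# so enumerating all 0/1 tuples of that length counts the functions.
def count_boolean_functions(n):
    tables = set(product([0, 1], repeat=2 ** n))
    return len(tables)

print(count_boolean_functions(3))  # 256 = 2^(2^3)
```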
Advantages of decision trees: Learning is efficient, robust against noisy data