Machine Learning Week 1 Quiz 2 (Linear Regression with One Variable) Stanford Coursera

GitHub repo for the course: Stanford Machine Learning (Coursera)

Question 1

Consider the problem of predicting how well a student does in her second year of college/university, given how well she did in her first year.

Specifically, let x be equal to the number of "A" grades (including A-, A and A+ grades) that a student receives in their first year of college (freshman year). We would like to predict the value of y, which we define as the number of "A" grades they get in their second year (sophomore year).

Here each row is one training example. Recall that in linear regression, our hypothesis is h_θ(x)=θ₀+θ₁x, and we use m to denote the number of training examples.

x	y
5	4
3	4
0	1
4	3

For the training set given above (note that this training set may also be referenced in other questions in this quiz), what is the value of m? In the box below, please enter your answer (which should be a number between 0 and 10).

Answer:
4

Question 2

Consider the following training set of m=4 training examples:

x	y
1	0.5
2	1
4	2
0	0

Consider the linear regression model h_θ(x)=θ₀+θ₁x. What are the values of θ₀ and θ₁ that you would expect to obtain upon running gradient descent on this model? (Linear regression will be able to fit this data perfectly.)

θ₀=0.5,θ₁=0
θ₀=0.5,θ₁=0.5
θ₀=1,θ₁=0.5
θ₀=0,θ₁=0.5
θ₀=1,θ₁=1

Answer:
θ₀=0,θ₁=0.5

As J(θ₀,θ₁)=0, y = h_θ(x) = θ₀ + θ₁x. Using any two values in the table, solve for θ₀, θ₁.

If you don't know how to do this, please see the following video: Solving system of linear equations
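As a quick check of the answer above, here is a minimal Python sketch that solves for θ₀ and θ₁ from two rows of the table and then confirms that the resulting line passes through all four points. The helper name `line_through` is made up for this illustration; it is not part of the course material.

```python
# Training set from Question 2 as (x, y) pairs.
data = [(1, 0.5), (2, 1), (4, 2), (0, 0)]

def line_through(p, q):
    """Return (theta0, theta1) of the straight line through points p and q."""
    (x1, y1), (x2, y2) = p, q
    theta1 = (y2 - y1) / (x2 - x1)   # slope
    theta0 = y1 - theta1 * x1        # intercept
    return theta0, theta1

# Any two rows with different x values work; here we use the first two.
theta0, theta1 = line_through(data[0], data[1])

# The line fits perfectly if h(x) equals y on every row.
fits_all = all(theta0 + theta1 * x == y for x, y in data)
print(theta0, theta1, fits_all)  # prints: 0.0 0.5 True
```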

Question 3

Suppose we set θ₀=−1,θ₁=0.5. What is h_θ(4)?

Answer:

Setting x = 4, we have h_θ(x)=θ₀+θ₁x = -1 + (0.5)(4) = 1
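The same arithmetic as a small Python sketch. The function name `h` simply mirrors the h_θ(x) notation and is an assumption of this example.

```python
def h(theta0, theta1, x):
    """Linear hypothesis h_theta(x) = theta0 + theta1 * x."""
    return theta0 + theta1 * x

print(h(-1, 0.5, 4))  # prints: 1.0
```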

Question 4

Let f be some function so that f(θ₀,θ₁) outputs a number. For this problem, f is some arbitrary/unknown smooth function (not necessarily the cost function of linear regression, so f may have local optima).

Suppose we use gradient descent to try to minimize f(θ₀,θ₁) as a function of θ₀ and θ₁. Which of the following statements are true? (Check all that apply.)

Even if the learning rate α is very large, every iteration of gradient descent will decrease the value of f(θ₀,θ₁).
If the learning rate is too small, then gradient descent may take a very long time to converge.
If θ₀ and θ₁ are initialized at a local minimum, then one iteration will not change their values.
If θ₀ and θ₁ are initialized so that θ₀=θ₁, then by symmetry (because we do simultaneous updates to the two parameters), after one iteration of gradient descent, we will still have θ₀=θ₁.

Answers:

True or False	Statement	Explanation
True	If the learning rate is too small, then gradient descent may take a very long time to converge.	If the learning rate is small, gradient descent ends up taking an extremely small step on each iteration, and therefore can take a long time to converge.
True	If θ₀ and θ₁ are initialized at a local minimum, then one iteration will not change their values.	At a local minimum, the derivative (gradient) is zero, so gradient descent will not change the parameters.
False	Even if the learning rate α is very large, every iteration of gradient descent will decrease the value of f(θ₀,θ₁).	If the learning rate is too large, one step of gradient descent can vastly "overshoot" and actually increase the value of f(θ₀,θ₁).
False	If θ₀ and θ₁ are initialized so that θ₀=θ₁, then by symmetry (because we do simultaneous updates to the two parameters), after one iteration of gradient descent, we will still have θ₀=θ₁.	The updates to θ₀ and θ₁ are different (even though we're doing simultaneous updates), so there's no particular reason to expect them to be the same after one iteration of gradient descent.
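The answers above can be illustrated with a single gradient descent step. This is only a sketch: the function f(θ₀,θ₁) = θ₀² + 4θ₁² and the learning rates 0.01 and 1.0 are hypothetical choices made for this example, not something taken from the quiz.

```python
def f(t0, t1):
    # Hypothetical smooth function, chosen for illustration only.
    return t0 ** 2 + 4 * t1 ** 2

def grad_f(t0, t1):
    # Partial derivatives of f with respect to t0 and t1.
    return 2 * t0, 8 * t1

def step(t0, t1, alpha):
    """One gradient descent iteration with a simultaneous update."""
    g0, g1 = grad_f(t0, t1)
    return t0 - alpha * g0, t1 - alpha * g1

start = (1.0, 1.0)                    # initialized with theta0 == theta1
small = step(*start, alpha=0.01)      # small learning rate
large = step(*start, alpha=1.0)       # learning rate that is too large
at_min = step(0.0, 0.0, alpha=1.0)    # start exactly at the minimum

print(f(*start), f(*small), f(*large))  # f decreases for small alpha, increases for large alpha
print(small[0] == small[1])             # prints: False (the two updates differ)
print(at_min)                           # prints: (0.0, 0.0) (zero gradient, no change)
```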

Other Options:

True or False	Statement	Explanation
True	If the first few iterations of gradient descent cause f(θ₀,θ₁) to increase rather than decrease, then the most likely cause is that we have set the learning rate to too large a value.	If α were small enough, then gradient descent should always successfully take a small downhill step and decrease f(θ₀,θ₁) at least a little bit. If gradient descent instead increases the objective value, that means α is too large (or you have a bug in your code!).
False	No matter how θ₀ and θ₁ are initialized, so long as learning rate is sufficiently small, we can safely expect gradient descent to converge to the same solution.	This is not true: depending on the initial condition, gradient descent may end up at different local optima.
False	Setting the learning rate to be very small is not harmful, and can only speed up the convergence of gradient descent.	If the learning rate is small, gradient descent ends up taking an extremely small step on each iteration, so this would actually slow down (rather than speed up) the convergence of the algorithm.
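To see how initialization can change the result, here is a hedged sketch using a made-up function with two local minima, f(θ₀,θ₁) = (θ₀² − 1)² + θ₁², whose minima sit at (1, 0) and (−1, 0). The function, the learning rate and the iteration count are all assumptions of this example.

```python
def grad_f(t0, t1):
    # Gradient of the hypothetical f(t0, t1) = (t0**2 - 1)**2 + t1**2,
    # which has two local minima, at (1, 0) and (-1, 0).
    return 4 * t0 * (t0 ** 2 - 1), 2 * t1

def gradient_descent(t0, t1, alpha=0.05, iterations=200):
    """Run plain gradient descent with simultaneous updates."""
    for _ in range(iterations):
        g0, g1 = grad_f(t0, t1)
        t0, t1 = t0 - alpha * g0, t1 - alpha * g1
    return t0, t1

# Same small learning rate, two different starting points.
from_right = gradient_descent(0.5, 0.3)
from_left = gradient_descent(-0.5, 0.3)
print(from_right, from_left)
```

With these settings the first run ends up near (1, 0) and the second near (−1, 0), so a small learning rate alone does not guarantee the same solution.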

Question 5

Suppose that for some linear regression problem (say, predicting housing prices as in the lecture), we have some training set, and for our training set we managed to find some θ₀, θ₁ such that J(θ₀,θ₁)=0.

Which of the statements below must then be true? (Check all that apply.)

For this to be true, we must have y⁽ⁱ⁾=0 for every value of i=1,2,…,m.
Gradient descent is likely to get stuck at a local minimum and fail to find the global minimum.
For this to be true, we must have θ₀=0 and θ₁=0 so that h_θ(x)=0
Our training set can be fit perfectly by a straight line, i.e., all of our training examples lie perfectly on some straight line.

Answers:

True or False	Statement	Explanation
False	For this to be true, we must have y⁽ⁱ⁾=0 for every value of i=1,2,…,m.	So long as all of our training examples lie on a straight line, we will be able to find θ₀ and θ₁ so that J(θ₀,θ₁)=0. It is not necessary that y⁽ⁱ⁾=0 for all our examples.
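False	Gradient descent is likely to get stuck at a local minimum and fail to find the global minimum.	The cost function J(θ₀,θ₁) for linear regression is convex, so it has no local minima other than the global minimum.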
False	For this to be true, we must have θ₀=0 and θ₁=0 so that h_θ(x)=0	If J(θ₀,θ₁)=0 that means the line defined by the equation "y = θ₀ + θ₁x" perfectly fits all of our data. There's no particular reason to expect that the values of θ₀ and θ₁ that achieve this are both 0 (unless y⁽ⁱ⁾=0 for all of our training examples).
True	Our training set can be fit perfectly by a straight line, i.e., all of our training examples lie perfectly on some straight line.	None

Other Options:

True or False	Statement	Explanation
False	We can perfectly predict the value of y even for new examples that we have not yet seen. (e.g., we can perfectly predict prices of even new houses that we have not yet seen.)	None
False	This is not possible: By the definition of J(θ₀,θ₁), it is not possible for there to exist θ₀ and θ₁ so that J(θ₀,θ₁)=0	None
True	For these values of θ₀ and θ₁ that satisfy J(θ₀,θ₁)=0, we have that h_θ(x⁽ⁱ⁾)=y⁽ⁱ⁾ for every training example (x⁽ⁱ⁾,y⁽ⁱ⁾)	None
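A small sketch of why J(θ₀,θ₁)=0 forces h_θ(x⁽ⁱ⁾)=y⁽ⁱ⁾ on every training example, assuming the usual squared-error cost from the course, J(θ₀,θ₁) = (1/2m) Σ (h_θ(x⁽ⁱ⁾) − y⁽ⁱ⁾)². It reuses the training set from Question 2; the function name `cost` is made up for this illustration.

```python
def cost(theta0, theta1, data):
    """Squared-error cost J = (1 / (2m)) * sum of (h(x) - y)**2."""
    m = len(data)
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in data) / (2 * m)

data = [(1, 0.5), (2, 1), (4, 2), (0, 0)]  # training set from Question 2

# With theta0 = 0 and theta1 = 0.5, every residual h(x) - y is zero ...
residuals = [0 + 0.5 * x - y for x, y in data]
print(cost(0, 0.5, data), residuals)  # prints: 0.0 [0.0, 0.0, 0.0, 0.0]

# ... while theta0 = theta1 = 0 does not fit this data.
print(cost(0, 0, data))  # prints: 0.65625
```

Each term in the sum is a square, so the total can only be zero when every individual term is zero. Here that happens even though the y values are not all 0 and θ₁ is not 0.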

Can anyone please explain the last statement under "Other Options" in Q5, and how it's true? "For these values of θ₀ and θ₁ that satisfy J(θ₀,θ₁)=0, we have that h_θ(x⁽ⁱ⁾)=y⁽ⁱ⁾ for every training example (x⁽ⁱ⁾,y⁽ⁱ⁾)"

h_θ(x⁽ⁱ⁾)=y⁽ⁱ⁾ can't hold for all the training examples, right? Some of the points will not fall on the line h_θ(x).

I didn't get your question exactly, but I'm gonna discuss it from two points of view:
First, I think we can say it is because the question is 'supposing' that J(θ₀,θ₁)=0, which can only happen when h_θ(x⁽ⁱ⁾)=y⁽ⁱ⁾ for every example. In a real-world situation this will probably never happen, but here we have a hypothetical situation. So if we 'imagine' that we have found the perfect θ₀ and θ₁ so that our predictions are 'exactly' the same as the actual values, then it means that our line perfectly fits all the points.

Second, if you thought that the prediction and actual value are gonna be the same 'shared' value for all data points, then this is not true. For example, the question doesn't mean that h_θ(x⁽ⁱ⁾)=y⁽ⁱ⁾=4 for all data points. It just means that whatever unique value h_θ(x⁽ⁱ⁾) has is the same as y⁽ⁱ⁾ 'for that unique data point'. So for one data point we can have 5=5 and for another 4=4.

I hope I captured what you meant.

mGalarnyk/machineLearningWeek1Quiz2.md

NassimF commented Nov 20, 2021

RCL23 commented Jan 28, 2022

H1manshus0ni commented Feb 5, 2022

phongvu009 commented Apr 11, 2022 • edited

rrangraj commented Mar 2, 2024

Fatimatrook commented Oct 7, 2024
