| title | tags |
|---|---|
| About Statistical Learning | Machine Learning, ISL |
Refs: Chapter 2 - An Introduction to Statistical Learning (ISL)
By convention:
- $X$: predictors, features, input variables, independent variables, or sometimes just "variables".
- $Y$: label, response, or dependent variable.
We assume that there is some relationship between $Y$ and $X = (X_1, X_2, \ldots, X_p)$, which can be written in the general form:

$$Y = f(X) + \epsilon$$

- $Y$: outcome.
- $X$: predictors.
- $f$: unknown function of $X$, representing the systematic information that $X$ provides about $Y$.
- $\epsilon$: error term, which has a mean of zero and is independent of the predictors $X$.
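As a minimal sketch of this setup (the particular $f$ and the noise level below are arbitrary choices for illustration, not from ISL):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100

def f(x):
    # hypothetical "true" f (an arbitrary choice for illustration)
    return 2.0 + 0.5 * x

X = rng.uniform(0, 10, size=n)            # predictor values
epsilon = rng.normal(0.0, 1.0, size=n)    # error term: mean zero, independent of X
Y = f(X) + epsilon                        # observed response
```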
Conclusion: in essence, Statistical Learning refers to a set of approaches for estimating $f$.

There are two main reasons to estimate $f$: prediction and inference.
- Prediction: we care only about the accuracy of the predictions that $\hat{f}$ returns. There are two types of error in prediction:
  - Reducible error
  - Irreducible error
- Inference: we focus on the relationship between $X$ and $Y$, that is, the way $Y$ is affected when we change the predictors $X_1, X_2, \ldots, X_p$.
Since the error term averages to zero, we can predict $Y$ using

$$\hat{Y} = \hat{f}(X)$$

- $\hat{f}$: represents the estimate of $f$, often treated as a black box, meaning one is not concerned with the exact form of $\hat{f}$ as long as it yields accurate predictions of $Y$.
- $\hat{Y}$: represents the resulting prediction of $Y$.
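A small sketch of treating $\hat{f}$ as a black box; a k-nearest-neighbours regressor stands in for $\hat{f}$ here, an arbitrary choice rather than anything ISL prescribes:

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))            # predictors: n samples, 1 feature
Y = 2.0 + 0.5 * X[:, 0] + rng.normal(size=100)   # Y = f(X) + epsilon

f_hat = KNeighborsRegressor(n_neighbors=5).fit(X, Y)  # "black box" estimate of f
Y_hat = f_hat.predict(X)                              # resulting predictions of Y
```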
The accuracy of $\hat{Y}$ as a prediction for $Y$ depends on two quantities: the reducible error and the irreducible error. The reducible error can be lowered by choosing a better estimate $\hat{f}$, but even a perfect estimate of $f$ would leave some error in the prediction. This is because the quantity $Y$ also depends on $\epsilon$, which, by definition, cannot be predicted using $X$.
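ISL makes this decomposition precise (equation 2.3, p.19): averaging over the error term, the expected squared prediction error splits into a reducible part and an irreducible part:

$$
E(Y - \hat{Y})^2 = E\left[f(X) + \epsilon - \hat{f}(X)\right]^2 = \underbrace{\left[f(X) - \hat{f}(X)\right]^2}_{\text{reducible}} + \underbrace{\operatorname{Var}(\epsilon)}_{\text{irreducible}}
$$

$\operatorname{Var}(\epsilon)$ therefore places a floor on the accuracy attainable by any estimate of $f$.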
In some other cases, we are instead interested in understanding the way that $Y$ is affected as the predictors $X_1, X_2, \ldots, X_p$ change, rather than simply predicting it: the inference setting.
FAQs of Inference (p.19):
- Which predictors ($X$) are associated with the response ($Y$)? Often only a small fraction of the available predictors are substantially associated with $Y$. Identifying the few important predictors among a large set of possible predictors is extremely useful. This relates to a technique called Feature Selection (see the sketch after this list).
- What is the relationship between the response $Y$ and each predictor? It could be a positive relationship, in the sense that increasing the predictor is associated with an increase in the value of $Y$, or the opposite. Depending on the complexity of $f$, the relationship between the response and a given predictor may also depend on the values of the other predictors.
- Can the relationship between $Y$ and $X$ be summarized by a linear equation, or is it more complicated? Historically, most methods for estimating $f$ have taken a linear form, which is reasonable in some situations. But often the true relationship is more complicated, in which case a linear model may not provide a sufficiently accurate representation of the relationship between the input and the output.
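Here is a toy sketch of the feature-selection idea from the first question above; the synthetic data and the choice of the lasso (with an arbitrary penalty `alpha=0.1`) are illustrative assumptions, not part of ISL's discussion:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p = 200, 20
X = rng.normal(size=(n, p))                              # 20 candidate predictors
Y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(size=n)   # only two truly matter

model = Lasso(alpha=0.1).fit(X, Y)
selected = np.flatnonzero(model.coef_)   # predictors the lasso kept (nonzero coef)
print(selected)                          # with luck, mostly the two informative ones
```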
Depending on whether our ultimate goal is prediction, inference, or a combination of the two, different methods for estimating $f$ may be appropriate.
Why would one want to use a more restrictive model (a simpler model) rather than a more flexible one (a more complex model)?
If we are interested in inference, then restrictive models are much more interpretable. For instance, a linear model is a good choice, since it makes it quite easy to understand the relationship between $Y$ and $X_1, X_2, \ldots, X_p$.
Consider a case in which we seek to develop an algorithm to predict the price of a stock, and our sole requirement is that it predict as accurately as possible; interpretability is not a concern. In this setting, we might expect it to be best to use the most flexible model that fits the problem. Surprisingly, this is not always the case: we will often obtain more accurate predictions using a less flexible method. This phenomenon, which may seem counterintuitive at first glance, relates to the problem of overfitting, which often occurs with highly flexible methods.
ISL - p.20
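A minimal sketch of the overfitting phenomenon, under assumed synthetic data (a sine-shaped truth plus noise) and polynomial fits of two arbitrary degrees: the highly flexible fit drives training error down but typically does worse on fresh test data.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(1)

def make_data(n):
    # synthetic data: a nonlinear truth plus noise (arbitrary choices)
    x = rng.uniform(0, 1, size=n)
    return x, np.sin(2 * np.pi * x) + rng.normal(0, 0.3, size=n)

x_train, y_train = make_data(30)
x_test, y_test = make_data(200)

for degree in (3, 15):  # restrictive vs. highly flexible
    fit = Polynomial.fit(x_train, y_train, deg=degree)
    train_mse = np.mean((fit(x_train) - y_train) ** 2)
    test_mse = np.mean((fit(x_test) - y_test) ** 2)
    print(f"degree {degree}: train MSE {train_mse:.3f}, test MSE {test_mse:.3f}")
# the degree-15 fit has lower training error but typically higher test error
```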