-
-
Save benjaminmgross/d71f161d48378d34b6970fa6d7378837 to your computer and use it in GitHub Desktop.
def press_statistic(y_true, y_pred, xs): | |
""" | |
Calculation of the `Press Statistics <https://www.otexts.org/1580>`_ | |
""" | |
res = y_pred - y_true | |
hat = xs.dot(np.linalg.pinv(xs)) | |
den = (1 - np.diagonal(hat)) | |
sqr = np.square(res/den) | |
return sqr.sum() | |
def predicted_r2(y_true, y_pred, xs): | |
""" | |
Calculation of the `Predicted R-squared <https://rpubs.com/RatherBit/102428>`_ | |
""" | |
press = press_statistic(y_true=y_true, | |
y_pred=y_pred, | |
xs=xs | |
) | |
sst = np.square( y_true - y_true.mean() ).sum() | |
return 1 - press / sst | |
def r2(y_true, y_pred): | |
""" | |
Calculation of the unadjusted r-squared, goodness of fit metric | |
""" | |
sse = np.square( y_pred - y_true ).sum() | |
sst = np.square( y_true - y_true.mean() ).sum() | |
return 1 - sse/sst |
-
den
is just an abbreviation for "denominator" (it's calculated in the function, so it's not an input) -
xs
are the independent variable -- oftentimes referred to in a standard form, like$f(x) = y$ -
hat
is just another intermediate calculation in the process -- but a description of the often-referred to "Hat Matrix"
Please see this link for more details on the PRESS Statistic in it linear algebra form.
Hello, I'm a bit new to this.
What is the value of y_true and y_pred?
Hello, I'm a bit new to this.
What is the value of y_true and y_pred?
The value of y_true depends on the dataset you are using to test your model. The value of y_pred depends on your model prediction.
den
is just an abbreviation for "denominator" (it's calculated in the function, so it's not an input)xs
are the independent variable -- oftentimes referred to in a standard form, like$f(x) = y$ hat
is just another intermediate calculation in the process -- but a description of the often-referred to "Hat Matrix"Please see this link for more details on the PRESS Statistic in it linear algebra form.
thank you
I can see function r2 is used nowhere , my guess is we are comparing r2 with predicted_r2 (computed using PRESS), right ? please correct me if I'm getting this wrong.
Could you please give us an example to calculate predicted_r2? Many thanks
If xs
is the independent variable, it should be a 1-dimensional array, which causes an error in the line hat = xs.dot(np.linalg.pinv(xs))
that reads LinAlgError: 1-dimensional array given. Array must be at least two-dimensional
.
Am I missing something here?
Not quite sure about press_statistic
, can you link how the calculation in press_statistic
actually ends up with PRESS?
The link you shared in docstring is now invalid.
Thanks.
Okay, I compared your code:
def press_statistic(y_true, y_pred, xs):
"""
Calculation of the `Press Statistics <https://www.otexts.org/1580>`_
"""
res = y_pred - y_true
hat = xs.dot(np.linalg.pinv(xs))
den = (1 - np.diagonal(hat))
sqr = np.square(res/den)
return sqr.sum()
with the original R implementation:
PRESS <- function(linear.model) {
#' calculate the predictive residuals
pr <- residuals(linear.model)/(1-lm.influence(linear.model)$hat)
#' calculate the PRESS
PRESS <- sum(pr^2)
return(PRESS)
}
And can confirm the results are the same:
Thanks.
However, you should make it clear that your code works only for models with an intercept. If you force the intercept to 0 (no intercept), your code will not work due to incorrect SST calculation:
You calculated the centered total sum of squares (SST):
sst = np.square( y_true - y_true.mean() ).sum()
But the original R implementation calculates the uncentered total sum of squares:
+ #' Use anova() to get the sum of squares for the linear model
+ lm.anova <- anova(linear.model)
+ #' Calculate the total sum of squares
+ sst <- sum(lm.anova$'Sum Sq')
If you use the sample data in my comment above:
- Your sst (centered) = 640.9
- Original R implementation sst (uncentered) = 5437
Hi, can you explain what value is xs, hat, and den? thank you