@benjaminmgross
Last active July 25, 2023 23:45
Predicted R-Squared (r2, r^2) Calculation in `python`
import numpy as np


def press_statistic(y_true, y_pred, xs):
    """
    Calculation of the `PRESS statistic <https://www.otexts.org/1580>`_
    """
    res = y_pred - y_true
    # Hat (projection) matrix H = X X^+; its diagonal holds the leverages h_ii
    hat = xs.dot(np.linalg.pinv(xs))
    den = 1 - np.diagonal(hat)
    # Leave-one-out residuals are e_i / (1 - h_ii)
    sqr = np.square(res / den)
    return sqr.sum()


def predicted_r2(y_true, y_pred, xs):
    """
    Calculation of the `Predicted R-squared <https://rpubs.com/RatherBit/102428>`_
    """
    press = press_statistic(y_true=y_true, y_pred=y_pred, xs=xs)
    # Centered total sum of squares (assumes the model has an intercept)
    sst = np.square(y_true - y_true.mean()).sum()
    return 1 - press / sst


def r2(y_true, y_pred):
    """
    Calculation of the unadjusted r-squared, goodness of fit metric
    """
    sse = np.square(y_pred - y_true).sum()
    sst = np.square(y_true - y_true.mean()).sum()
    return 1 - sse / sst
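
A minimal usage sketch, not part of the original gist: it assumes an ordinary least squares fit via `np.linalg.lstsq`, a design matrix `xs` that already contains a column of ones for the intercept, and synthetic data. The helper `loo_press` is a hypothetical name introduced here to check the hat-matrix shortcut against explicit leave-one-out refits.

```python
import numpy as np


def press_statistic(y_true, y_pred, xs):
    # Same computation as in the gist: residuals scaled by (1 - leverage)
    res = y_pred - y_true
    hat = xs.dot(np.linalg.pinv(xs))
    den = 1 - np.diagonal(hat)
    return np.square(res / den).sum()


def loo_press(y, xs):
    """Brute-force PRESS (hypothetical helper): refit without row i, predict row i."""
    total = 0.0
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        beta, *_ = np.linalg.lstsq(xs[keep], y[keep], rcond=None)
        total += (y[i] - xs[i] @ beta) ** 2
    return total


# Synthetic data, purely illustrative
rng = np.random.default_rng(0)
n = 30
x = rng.normal(size=n)
y = 2.0 + 3.0 * x + rng.normal(scale=0.5, size=n)

# Assumption: the intercept is modelled by an explicit column of ones
xs = np.column_stack([np.ones(n), x])

beta, *_ = np.linalg.lstsq(xs, y, rcond=None)
y_pred = xs @ beta

press_fast = press_statistic(y, y_pred, xs)
press_slow = loo_press(y, xs)
print(press_fast, press_slow)
```

For an OLS fit the two values should agree up to floating-point error, which is why the hat-matrix form avoids refitting the model `n` times.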

ELHoussineT commented Apr 19, 2022

However, you should make it clear that your code is only correct for models with an intercept. If you force the intercept to 0 (no intercept), your code gives the wrong result because the SST calculation is incorrect:

You calculated the centered total sum of squares (SST):

sst  = np.square( y_true - y_true.mean() ).sum()

But for a no-intercept model, the original R implementation yields the uncentered total sum of squares, because it sums the `Sum Sq` column of the ANOVA table:

    #' Use anova() to get the sum of squares for the linear model
    lm.anova <- anova(linear.model)
    #' Calculate the total sum of squares
    sst <- sum(lm.anova$'Sum Sq')

If you use the sample data in my comment above:

  • Your sst (centered) = 640.9
  • Original R implementation sst (uncentered) = 5437
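
One way to handle both cases is sketched below. This is an illustration rather than the gist author's code: the `fit_intercept` flag and the `total_sum_of_squares` helper are hypothetical names, and the sketch assumes the uncentered SST is the intended denominator whenever the model has no intercept.

```python
import numpy as np


def total_sum_of_squares(y_true, fit_intercept=True):
    """Centered SST with an intercept, uncentered SST without (hypothetical helper)."""
    y_true = np.asarray(y_true, dtype=float)
    if fit_intercept:
        return np.square(y_true - y_true.mean()).sum()
    return np.square(y_true).sum()


def predicted_r2(y_true, y_pred, xs, fit_intercept=True):
    """Predicted R-squared; `fit_intercept` is an assumed extra argument."""
    y_true = np.asarray(y_true, dtype=float)
    res = np.asarray(y_pred, dtype=float) - y_true
    hat = xs.dot(np.linalg.pinv(xs))
    press = np.square(res / (1 - np.diagonal(hat))).sum()
    return 1 - press / total_sum_of_squares(y_true, fit_intercept)


y = np.array([1.0, 2.0, 3.0, 4.0])
centered = total_sum_of_squares(y, fit_intercept=True)
uncentered = total_sum_of_squares(y, fit_intercept=False)

# No-intercept example: y is exactly 2 * x, so the fit through the origin is perfect
x = np.array([[1.0], [2.0], [3.0], [4.0]])
y_line = 2.0 * x[:, 0]
beta, *_ = np.linalg.lstsq(x, y_line, rcond=None)
score = predicted_r2(y_line, x @ beta, x, fit_intercept=False)
```

The two totals differ by `n * mean(y)**2`, so the gap grows as the mean of `y_true` moves away from zero.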
