
@thomvolker
Last active April 3, 2024 15:19

An iterative simple linear regression procedure

Least-squares regression estimates each coefficient conditional on the other variables in the model. One way to show this is with the following script, which estimates the coefficients iteratively by repeatedly regressing the current residuals on one variable at a time (a procedure also known as backfitting, or coordinate descent on the least-squares objective). We initialize a residual vector $r_\text{old}$ that equals the observed outcome variable, and a vector of regression coefficients $b^{(0)}$ that equals a zero vector of length $P$. Then, starting with the first variable $X_1$, we regress $r_\text{old}$ on $X_1$ to obtain a slope $\hat\beta_1$ (with standardized variables, this slope equals the covariance $\sigma_{X_1, r_\text{old}}$), we update the coefficient as $b_1^{(t)} = b_1^{(t-1)} + \hat\beta_1$, and we update the residual as $r_\text{new} = r_\text{old} - X_1 \hat\beta_1$. We apply the same procedure to the next variable, until we have covered all $P$ variables. After one full pass, we return to the first variable and re-estimate its effect on the residuals obtained after fitting all variables in the first round, and we continue cycling until the residuals no longer change.
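That coefficients are conditional on the other variables in the model can also be seen directly with two correlated predictors: the slope of $X_1$ in a model that omits $X_2$ absorbs part of the effect of $X_2$. A minimal sketch (toy data, separate from the simulation below):

```r
# Toy illustration: marginal versus partial slope with correlated predictors
set.seed(42)
n  <- 10000
x1 <- rnorm(n)
x2 <- 0.5 * x1 + rnorm(n)     # x2 is correlated with x1
y  <- x1 + x2 + rnorm(n)      # both true coefficients equal 1

coef(lm(y ~ x1))["x1"]        # marginal slope: about 1.5 in expectation,
                              # since it absorbs 0.5 times the effect of x2
coef(lm(y ~ x1 + x2))["x1"]   # partial slope: close to the true value 1
```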

N <- 1000
P <- 10

# P predictors with unit variance and pairwise correlation 0.5
X <- matrix(rnorm(N*P), N, P) %*% chol(0.5 * diag(P) + 0.5)

# only the first three variables have non-zero effects
b <- c(1,2,3,rep(0, P-3))/10
y <- X %*% b + rnorm(N, 0, 1)

fit <- lm(y ~ X)
summary(fit)
#> 
#> Call:
#> lm(formula = y ~ X)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -3.1325 -0.6954  0.0445  0.6796  4.5645 
#> 
#> Coefficients:
#>              Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)  0.005691   0.032647   0.174 0.861655    
#> X1           0.100144   0.042236   2.371 0.017928 *  
#> X2           0.156788   0.044697   3.508 0.000472 ***
#> X3           0.289822   0.045353   6.390 2.54e-10 ***
#> X4           0.008735   0.043311   0.202 0.840212    
#> X5           0.003040   0.044345   0.069 0.945352    
#> X6           0.004814   0.042725   0.113 0.910318    
#> X7           0.004472   0.043159   0.104 0.917498    
#> X8          -0.001189   0.044082  -0.027 0.978492    
#> X9          -0.074491   0.042962  -1.734 0.083253 .  
#> X10          0.022242   0.043078   0.516 0.605750    
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 1.027 on 989 degrees of freedom
#> Multiple R-squared:  0.158,  Adjusted R-squared:  0.1494 
#> F-statistic: 18.55 on 10 and 989 DF,  p-value: < 2.2e-16

# standardize, so that each univariate slope equals a covariance
Xs <- scale(X)
Ys <- scale(y)

rnew <- rold <- Ys   # initialize the residuals at the outcome
bj <- rep(0, P)      # initialize all coefficients at zero

conv <- FALSE
iter <- 0

while (!conv) {
  iter <- iter + 1
  cat("\rIteration: ",iter)
  for (j in 1:P) {
    Xsj <- Xs[,j]
    fit <- lm(rold ~ Xsj-1)               # regress current residuals on X_j
    bj[j] <- bj[j] + c(fit$coefficients)  # accumulate the estimated slope
    rnew <- fit$residuals                 # residuals after removing X_j's contribution
    
    # stop when the residuals no longer change (or after 1000 sweeps)
    if (sum(abs(rnew - rold)) < 1e-6 || iter == 1000) conv <- TRUE
    else rold <- rnew
  }
}
#> Iteration:  57

bj
#>  [1]  0.092130998  0.138583288  0.259188894  0.007690780  0.002684614
#>  [6]  0.004403154  0.004081366 -0.001063103 -0.067899575  0.020123165
lm(Ys ~ Xs - 1)
#> 
#> Call:
#> lm(formula = Ys ~ Xs - 1)
#> 
#> Coefficients:
#>       Xs1        Xs2        Xs3        Xs4        Xs5        Xs6        Xs7  
#>  0.092131   0.138583   0.259189   0.007691   0.002685   0.004403   0.004081  
#>       Xs8        Xs9       Xs10  
#> -0.001063  -0.067900   0.020123
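The same agreement can be checked in a compact, self-contained form: running the whole backfitting loop on a small two-predictor toy example (new data, hypothetical names) and comparing the result against `lm()` directly.

```r
# Backfitting on a tiny two-predictor toy example
set.seed(1)
n  <- 200
x1 <- as.numeric(scale(rnorm(n)))
x2 <- as.numeric(scale(0.5 * x1 + rnorm(n)))   # correlated with x1
yt <- as.numeric(scale(x1 + x2 + rnorm(n)))
Xt <- cbind(x1, x2)

bt <- c(0, 0)
r  <- yt
for (s in 1:500) {                 # plenty of sweeps for convergence here
  for (j in 1:2) {
    d     <- sum(Xt[, j] * r) / sum(Xt[, j]^2)   # OLS slope of r on X_j
    bt[j] <- bt[j] + d                           # accumulate the coefficient
    r     <- r - Xt[, j] * d                     # update the residuals
  }
}

all.equal(bt, unname(coef(lm(yt ~ Xt - 1))))     # TRUE: matches lm()
```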

Created on 2024-04-03 with reprex v2.0.2
