Notes on the paper "Regularization and variable selection via the elastic net" (Zou and Hastie, 2005).

Regularization and variable selection via the elastic net

Introduction to elastic net

  • A regularization and variable selection method.
  • Produces sparse representations.
  • Exhibits a grouping effect: strongly correlated predictors tend to be selected or dropped together.
  • Particularly useful when the number of predictors (p) is much larger than the number of observations (n).
  • The LARS-EN algorithm computes the entire elastic net regularization path.
  • Link to paper.

Lasso

  • Least squares method with an L1 penalty on the regression coefficients.
  • Performs continuous shrinkage and automatic variable selection simultaneously (see the sketch below).
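
A minimal sketch of what this looks like in practice, assuming scikit-learn and made-up data (`alpha` is sklearn's name for the penalty weight, not the paper's notation): the L1 penalty drives most coefficients exactly to zero.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Synthetic data: only the first two of ten predictors matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))
y = X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.1, size=50)

# sklearn's Lasso minimizes (1/(2n))*||y - Xb||^2 + alpha*|b|_1;
# the L1 penalty both shrinks coefficients and sets many exactly to zero.
model = Lasso(alpha=0.1).fit(X, y)
print(model.coef_)  # sparse: most entries are exactly 0.0
```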

Limitations

  • If p >> n, the lasso selects at most n variables before it saturates.
  • Given a group of variables with high pairwise correlation, the lasso tends to select only one variable from the group and does not care which one (see the toy demo after this list).
  • If n > p and the predictors are highly correlated, ridge regression empirically outperforms the lasso in prediction.
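
A toy demo of the second point (synthetic data; the exact numbers are illustrative, not from the paper):

```python
import numpy as np
from sklearn.linear_model import Lasso, ElasticNet

rng = np.random.default_rng(0)
z = rng.normal(size=100)
# x1 and x2 are near-duplicate columns; x3 is irrelevant noise.
X = np.column_stack([z, z + 0.01 * rng.normal(size=100), rng.normal(size=100)])
y = 3 * z + rng.normal(scale=0.5, size=100)

print(Lasso(alpha=0.1).fit(X, y).coef_)                     # typically zeroes one of x1/x2
print(ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y).coef_)  # spreads weight across x1 and x2
```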

Naive elastic net

  • Least squares method.
  • The penalty on the regression coefficients is a convex combination of the lasso and ridge penalties.
  • penalty = (1 − α)·|β|₁ + α·|β|₂², where β is the vector of regression coefficients.
  • α = 0 => lasso penalty
  • α = 1 => ridge penalty
  • The naive elastic net can be solved by transforming it into a lasso problem on augmented data (see the sketch below).
  • Can be viewed as ridge-type shrinkage followed by lasso-type thresholding.
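
A minimal sketch of that transformation, assuming scikit-learn's `Lasso` as the inner solver (function and variable names are mine, not the paper's):

```python
import numpy as np
from sklearn.linear_model import Lasso

def naive_elastic_net(X, y, lam1, lam2):
    """Solve min_b ||y - Xb||^2 + lam2*||b||^2 + lam1*|b|_1 by reducing it
    to a lasso problem on augmented data."""
    n, p = X.shape
    # Stack X on top of sqrt(lam2)*I and rescale; append p zeros to y.
    X_aug = np.vstack([X, np.sqrt(lam2) * np.eye(p)]) / np.sqrt(1 + lam2)
    y_aug = np.concatenate([y, np.zeros(p)])
    gamma = lam1 / np.sqrt(1 + lam2)
    # sklearn's Lasso minimizes (1/(2m))*||y - Xb||^2 + alpha*|b|_1 over
    # m samples, so alpha = gamma / (2m) matches the unscaled criterion.
    m = n + p
    lasso = Lasso(alpha=gamma / (2 * m), fit_intercept=False, max_iter=100000)
    beta_star = lasso.fit(X_aug, y_aug).coef_
    return beta_star / np.sqrt(1 + lam2)  # undo the change of variables
```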

Limitations

  • The two-stage procedure incurs a double amount of shrinkage, introducing extra bias without reducing variance.

Bridge Regression

  • A generalization of lasso and ridge regression, using an L_q penalty.
  • Does not produce sparse solutions for any q > 1.

Elastic net

  • Rescales the naive elastic net coefficients by a factor of (1 + λ2), where λ2 is the ridge penalty weight, to undo the extra shrinkage (see the sketch below).
  • Retains the good properties of the naive elastic net.
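
Continuing the sketch above, the corrected elastic net is just the rescaled naive solution:

```python
def elastic_net(X, y, lam1, lam2):
    # Rescale by (1 + lam2) to undo the double shrinkage of the naive
    # two-stage procedure (reuses naive_elastic_net from the sketch above).
    return (1 + lam2) * naive_elastic_net(X, y, lam1, lam2)
```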

Justification for scaling

  • With the rescaling, the elastic net estimator becomes minimax optimal.
  • The scaling reverses the extra shrinkage introduced by the ridge component of the penalty.

LARS-EN

  • Based on LARS (the algorithm used to solve the lasso).
  • Since the elastic net can be transformed into a lasso problem on augmented data, pieces of the LARS algorithm can be reused (see the sketch below).
  • Exploits the sparseness of the augmented design matrix to save on computation.
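
A rough sketch of the idea, using scikit-learn's `lars_path` on the augmented data rather than the paper's own implementation:

```python
import numpy as np
from sklearn.linear_model import lars_path

def lars_en_path(X, y, lam2):
    """For a fixed lam2, trace the whole elastic net path in lam1 by running
    LARS (lasso mode) once on the augmented data -- the idea behind LARS-EN."""
    n, p = X.shape
    X_aug = np.vstack([X, np.sqrt(lam2) * np.eye(p)]) / np.sqrt(1 + lam2)
    y_aug = np.concatenate([y, np.zeros(p)])
    alphas, _, coefs = lars_path(X_aug, y_aug, method="lasso")
    # Undo the change of variables; each column of coefs is a point on the path.
    return alphas, coefs / np.sqrt(1 + lam2)
```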

Conclusion

The elastic net often outperforms the lasso while producing a similarly sparse representation.
