Notes on the paper "Regularization and variable selection via the elastic net" (Zou and Hastie, 2005).

Regularization and variable selection via the elastic net

Introduction to elastic net

  • A regularization and variable selection method.
  • Produces sparse representations.
  • Exhibits a grouping effect: strongly correlated predictors tend to be selected or dropped together.
  • Particularly useful when the number of predictors (p) is much larger than the number of observations (n).
  • The LARS-EN algorithm computes the entire elastic net regularization path.
  • Link to paper.

Lasso

  • Least squares method with an L1 penalty on the regression coefficients.
  • Performs continuous shrinkage and automatic variable selection simultaneously (see the sketch below).
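
A minimal sketch of what this looks like in practice, assuming scikit-learn and made-up data (`alpha` is sklearn's name for the penalty weight, not the paper's notation): the L1 penalty drives most coefficients exactly to zero.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Synthetic data: only the first two of ten predictors matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))
y = X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.1, size=50)

# sklearn's Lasso minimizes (1/(2n))*||y - Xb||^2 + alpha*|b|_1;
# the L1 penalty both shrinks coefficients and sets many exactly to zero.
model = Lasso(alpha=0.1).fit(X, y)
print(model.coef_)  # sparse: most entries are exactly 0.0
```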

Limitations

  • If p >> n, the lasso selects at most n variables before it saturates.
  • Given a group of variables with high pairwise correlation, the lasso tends to select only one variable from the group and does not care which one (see the toy demo after this list).
  • If n > p and the predictors are highly correlated, ridge regression empirically outperforms the lasso in prediction.
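
A toy demo of the second point (synthetic data; the exact numbers are illustrative, not from the paper):

```python
import numpy as np
from sklearn.linear_model import Lasso, ElasticNet

rng = np.random.default_rng(0)
z = rng.normal(size=100)
# x1 and x2 are near-duplicate columns; x3 is irrelevant noise.
X = np.column_stack([z, z + 0.01 * rng.normal(size=100), rng.normal(size=100)])
y = 3 * z + rng.normal(scale=0.5, size=100)

print(Lasso(alpha=0.1).fit(X, y).coef_)                     # typically zeroes one of x1/x2
print(ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y).coef_)  # spreads weight across x1 and x2
```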

Naive elastic net

  • Least squares method.
  • The penalty on the regression coefficients is a convex combination of the lasso and ridge penalties.
  • penalty = (1 − α)·|β|₁ + α·|β|₂², where β is the vector of regression coefficients.
  • α = 0 => lasso penalty
  • α = 1 => ridge penalty
  • The naive elastic net can be solved by transforming it into a lasso problem on augmented data (see the sketch below).
  • Can be viewed as ridge-type shrinkage followed by lasso-type thresholding.
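
A minimal sketch of that transformation, assuming scikit-learn's `Lasso` as the inner solver (function and variable names are mine, not the paper's):

```python
import numpy as np
from sklearn.linear_model import Lasso

def naive_elastic_net(X, y, lam1, lam2):
    """Solve min_b ||y - Xb||^2 + lam2*||b||^2 + lam1*|b|_1 by reducing it
    to a lasso problem on augmented data."""
    n, p = X.shape
    # Stack X on top of sqrt(lam2)*I and rescale; append p zeros to y.
    X_aug = np.vstack([X, np.sqrt(lam2) * np.eye(p)]) / np.sqrt(1 + lam2)
    y_aug = np.concatenate([y, np.zeros(p)])
    gamma = lam1 / np.sqrt(1 + lam2)
    # sklearn's Lasso minimizes (1/(2m))*||y - Xb||^2 + alpha*|b|_1 over
    # m samples, so alpha = gamma / (2m) matches the unscaled criterion.
    m = n + p
    lasso = Lasso(alpha=gamma / (2 * m), fit_intercept=False, max_iter=100000)
    beta_star = lasso.fit(X_aug, y_aug).coef_
    return beta_star / np.sqrt(1 + lam2)  # undo the change of variables
```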

Limitations

  • The two-stage procedure incurs a double amount of shrinkage, introducing extra bias without reducing variance.

Bridge Regression

  • A generalization of lasso and ridge regression, using an L_q penalty.
  • Does not produce sparse solutions for any q > 1.

Elastic net

  • Rescales the naive elastic net coefficients by a factor of (1 + λ2), where λ2 is the ridge penalty weight, to undo the extra shrinkage (see the sketch below).
  • Retains the good properties of the naive elastic net.
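
Continuing the sketch above, the corrected elastic net is just the rescaled naive solution:

```python
def elastic_net(X, y, lam1, lam2):
    # Rescale by (1 + lam2) to undo the double shrinkage of the naive
    # two-stage procedure (reuses naive_elastic_net from the sketch above).
    return (1 + lam2) * naive_elastic_net(X, y, lam1, lam2)
```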

Justification for scaling

  • With the rescaling, the elastic net estimator becomes minimax optimal.
  • The scaling reverses the extra shrinkage introduced by the ridge component of the penalty.

LARS-EN

  • Based on LARS (the algorithm used to solve the lasso).
  • Since the elastic net can be transformed into a lasso problem on augmented data, pieces of the LARS algorithm can be reused (see the sketch below).
  • Exploits the sparseness of the augmented design matrix to save on computation.
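
A rough sketch of the idea, using scikit-learn's `lars_path` on the augmented data rather than the paper's own implementation:

```python
import numpy as np
from sklearn.linear_model import lars_path

def lars_en_path(X, y, lam2):
    """For a fixed lam2, trace the whole elastic net path in lam1 by running
    LARS (lasso mode) once on the augmented data -- the idea behind LARS-EN."""
    n, p = X.shape
    X_aug = np.vstack([X, np.sqrt(lam2) * np.eye(p)]) / np.sqrt(1 + lam2)
    y_aug = np.concatenate([y, np.zeros(p)])
    alphas, _, coefs = lars_path(X_aug, y_aug, method="lasso")
    # Undo the change of variables; each column of coefs is a point on the path.
    return alphas, coefs / np.sqrt(1 + lam2)
```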

Conclusion

The elastic net often outperforms the lasso while producing a similarly sparse representation.
