Note: I have copied the contents directly from different web sources, and I give full credit to the authors. I maintain this document for my own understanding and study purposes. P.S.: If I have forgotten to mention any author's name, kindly let me know in the comments.
This is a difficult question that may require deep knowledge of the problem domain.
But it is possible to automatically select the features in your data that are most useful or
most relevant for the problem you are working on. This process is called feature selection.
Feature selection is also called variable selection or attribute selection.
It is the automatic selection of the attributes in your data (such as columns in tabular data)
that are most relevant to the predictive modeling problem you are working on.
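To make the idea of selecting columns concrete, here is a minimal sketch, not a recommended method. It assumes synthetic data (two informative columns, three noise columns) and uses absolute Pearson correlation with the target as an illustrative relevance score. The names `scores`, `selected`, and `k` are hypothetical and chosen only for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 200

# Synthetic, illustrative data: columns 0-1 drive the target, columns 2-4 are pure noise.
signal = rng.normal(size=(n_samples, 2))
noise = rng.normal(size=(n_samples, 3))
X = np.hstack([signal, noise])
y = 3.0 * signal[:, 0] - 2.0 * signal[:, 1] + rng.normal(scale=0.1, size=n_samples)

# Score every column by its absolute Pearson correlation with the target.
# (Assumption: correlation is just one simple relevance measure among many.)
scores = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])])

# Keep the k highest-scoring columns, preserving their original order.
k = 2
selected = np.sort(np.argsort(scores)[-k:])
X_selected = X[:, selected]

print("scores:", np.round(scores, 3))
print("selected column indices:", selected)
print("reduced shape:", X_selected.shape)
```

The reduced matrix `X_selected` keeps only the chosen columns and is what you would pass to a model in place of `X`.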
"feature selection… is the process of selecting a subset of relevant features for use in model construction."
Source: Feature Selection, Wikipedia entry.
Source, Credits & Thanks: An Introduction to Feature Selection by Dr. Jason Brownlee
Kindly refer to An Introduction to Feature Selection by Dr. Jason Brownlee to learn more about feature selection.
The data features that you use to train your machine learning models
have a huge influence on the performance you can achieve.
Feature selection is a process where you automatically select those features in your data that contribute most to the prediction variable or output in which you are interested.
Bad features:
Irrelevant or partially relevant features can negatively impact model performance.
Having irrelevant features in your data can decrease the accuracy of many models, especially linear algorithms like linear and logistic regression.
Benefit | Reason |
---|---|
Reduces Overfitting | Less redundant data means less opportunity to make decisions based on noise. |
Improves Accuracy | Less misleading data means modeling accuracy improves. |
Reduces Training Time | Less data means that algorithms train faster. |
You can learn more about feature selection with scikit-learn in the article Feature selection.
In this post you will discover automatic feature selection techniques that you can use to
prepare your machine learning data in Python with scikit-learn.
This section lists 4 feature selection recipes for machine learning in Python:
- Univariate Selection.
- Recursive Feature Elimination.
- Principal Component Analysis.
- Feature Importance.
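As a quick preview, the four recipes listed above could be sketched with scikit-learn roughly as follows. This is a minimal sketch under assumptions: the bundled iris dataset, the choice of 2 features or components, the chi-squared score function, and the specific estimators are all picked here for illustration and are not prescribed by the original article.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import RFE, SelectKBest, chi2
from sklearn.linear_model import LogisticRegression

# Assumption: iris is used only as a small demo dataset
# (150 samples, 4 non-negative features, so chi2 is applicable).
X, y = load_iris(return_X_y=True)

# 1. Univariate selection: keep the 2 features with the highest chi-squared scores.
X_kbest = SelectKBest(score_func=chi2, k=2).fit_transform(X, y)

# 2. Recursive feature elimination: repeatedly fit a model and drop the
#    weakest feature until 2 remain.
rfe = RFE(estimator=LogisticRegression(max_iter=1000), n_features_to_select=2)
rfe.fit(X, y)

# 3. Principal component analysis: project onto 2 components. The outputs are
#    new combined features, not a subset of the original columns.
X_pca = PCA(n_components=2).fit_transform(X)

# 4. Feature importance: impurity-based importances from an ensemble of trees.
forest = ExtraTreesClassifier(n_estimators=100, random_state=0).fit(X, y)
importances = forest.feature_importances_

print("SelectKBest shape:", X_kbest.shape)
print("RFE kept features:", rfe.support_)
print("PCA shape:", X_pca.shape)
print("Tree importances:", importances)
```

Each approach answers a slightly different question, so the features they favour need not agree. Comparing their outputs on your own data is a reasonable first step.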