@vijayanandrp
Last active April 27, 2017 07:05
Machine_Learning via http://scikit-learn.org

Note: I have copied the contents directly from different web sources and give full credit to the authors. I maintain this document for my own understanding and study purposes. P.S.: If I have forgotten to mention any author's name, kindly let me know in the comments.

1. Which features should you use to create a predictive model?

This is a difficult question that may require deep knowledge of the problem domain. However, it is possible to automatically select the features in your data that are most useful or most relevant to the problem you are working on. This process is called feature selection.


2. What is Feature Selection?

Feature selection is also called variable selection or attribute selection. It is the automatic selection of the attributes in your data (such as columns in tabular data) that are most relevant to the predictive modeling problem you are working on.

feature selection… is the process of selecting a subset of relevant features for use in model construction.

Feature Selection, Wikipedia entry.
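As a concrete illustration of "selecting a subset of relevant features", here is a minimal sketch using scikit-learn's SelectKBest on the iris dataset. The choice of the chi-squared score function and k=2 is an assumption for illustration; any univariate scorer and k would work the same way.

```python
# Minimal feature-selection sketch: keep the 2 best of 4 iris features,
# scored independently with the chi-squared test.
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2

X, y = load_iris(return_X_y=True)             # 150 samples, 4 features
selector = SelectKBest(score_func=chi2, k=2)  # keep the 2 highest-scoring features
X_new = selector.fit_transform(X, y)

print(X.shape)      # (150, 4)
print(X_new.shape)  # (150, 2)
```

The result is the same table with only the most relevant columns retained, which is exactly the "subset of relevant features" the Wikipedia definition describes.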


Source, Credits & Thanks: An Introduction to Feature Selection by Dr. Jason Brownlee. Kindly refer to that article for more background on feature selection.

Feature Selection

The data features that you use to train your machine learning models have a huge influence on the performance you can achieve.

Feature selection is a process where you automatically select those features in your data that contribute most to the prediction variable or output in which you are interested.


Bad features:

Irrelevant or partially relevant features can negatively impact model performance.

Having irrelevant features in your data can decrease the accuracy of many models, especially linear algorithms like linear and logistic regression.
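The effect can be sketched with a quick experiment (a hypothetical setup, not from the original post): train logistic regression on informative features alone, then on the same data padded with random noise columns, and compare cross-validated accuracy.

```python
# Sketch: irrelevant features and a linear model.
# Compare cross-validated accuracy of logistic regression on 5 informative
# features versus the same data padded with 50 random noise features.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.RandomState(0)
X, y = make_classification(n_samples=200, n_features=5,
                           n_informative=5, n_redundant=0,
                           random_state=0)
noise = rng.normal(size=(200, 50))   # 50 irrelevant features
X_noisy = np.hstack([X, noise])

model = LogisticRegression(max_iter=1000)
clean_acc = cross_val_score(model, X, y, cv=5).mean()
noisy_acc = cross_val_score(model, X_noisy, y, cv=5).mean()
print(clean_acc, noisy_acc)  # the noisy score is typically lower
```

With many irrelevant columns the model has more chances to fit noise, which typically shows up as a drop in held-out accuracy.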

Three benefits of performing feature selection before modeling your data are:

Benefit and reason:

  1. Reduces overfitting: less redundant data means less opportunity to make decisions based on noise.
  2. Improves accuracy: less misleading data means modeling accuracy improves.
  3. Reduces training time: less data means that algorithms train faster.

You can learn more about feature selection with scikit-learn in the article Feature selection.

In this post, you will discover automatic feature selection techniques that you can use to prepare your machine learning data in Python with scikit-learn.

Feature Selection for Machine Learning

This section lists 4 feature selection recipes for machine learning in Python:

  1. Univariate Selection.
  2. Recursive Feature Elimination.
  3. Principal Component Analysis.
  4. Feature Importance.
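The four recipes above can be sketched on one synthetic dataset. The class names are scikit-learn's own; the dataset and the parameter choices (3 selected features, 3 components, 50 trees) are illustrative assumptions, not prescriptions.

```python
# Sketches of the four feature-selection recipes on one synthetic dataset.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif, RFE
from sklearn.decomposition import PCA
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=100, n_features=8,
                           n_informative=3, random_state=1)

# 1. Univariate Selection: score each feature independently (ANOVA F-test)
#    and keep the top 3.
X_uni = SelectKBest(score_func=f_classif, k=3).fit_transform(X, y)

# 2. Recursive Feature Elimination: repeatedly fit a model and drop the
#    weakest feature until 3 remain.
rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=3).fit(X, y)
X_rfe = rfe.transform(X)

# 3. Principal Component Analysis: project onto 3 orthogonal components
#    (feature extraction, i.e. new columns, rather than a subset of originals).
X_pca = PCA(n_components=3).fit_transform(X)

# 4. Feature Importance: rank features by how much they reduce impurity
#    across the splits of a tree ensemble.
importances = ExtraTreesClassifier(n_estimators=50,
                                   random_state=1).fit(X, y).feature_importances_

print(X_uni.shape, X_rfe.shape, X_pca.shape)  # (100, 3) three times
print(importances.shape)                      # (8,)
```

Note that PCA differs from the other three: it constructs new composite features instead of selecting a subset of the original columns, which matters if you need the retained features to stay interpretable.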

