
thistleknot / random_feature_selection.py

Last active January 5, 2023 14:27

quick FEATURE SELECTION

	"""
	Based on the 'quick Feature Selection' method by Damien Benveniste, PhD

	original post: https://lnkd.in/gCDSEJcF

	quick FEATURE SELECTION
	Train a Supervised Learning algorithm that provides a Feature Importance measure.
	This method also works for highly non-linear data, unlike LASSO (for example), which tends to capture only linear relationships in the data. The random feature is a "Random Bar" because it sets the minimum bar a feature needs to beat to be part of the potentially useful feature set. That does not mean every feature above the bar should be kept: removing some of them may still improve your model.
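
	A minimal sketch of the Random Bar idea, not the original author's code. It assumes scikit-learn's RandomForestClassifier as the learner that supplies the feature importance measure and a synthetic data set from make_classification. Both are illustrative choices, and the helper name random_bar_selection is hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier


def random_bar_selection(X, y, seed=0):
    # Return the indices of the columns of X whose importance beats a random column.
    rng = np.random.default_rng(seed)
    # One random value per row, so V can be appended as an extra feature column.
    V = rng.normal(size=(X.shape[0], 1))
    X_aug = np.hstack([X, V])

    # Assumption: any learner exposing a feature importance measure could be used here.
    model = RandomForestClassifier(n_estimators=200, random_state=seed)
    model.fit(X_aug, y)

    importances = model.feature_importances_
    bar = importances[-1]  # importance assigned to the random column
    keep = np.flatnonzero(importances[:-1] > bar)
    return keep, importances[:-1], bar


# Synthetic data: with shuffle=False the 3 informative columns come first,
# and the remaining 7 columns are pure noise.
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=3, n_redundant=0,
    shuffle=False, random_state=0,
)
keep, importances, bar = random_bar_selection(X, y)
print("random bar:", round(float(bar), 4))
print("features above the bar:", keep.tolist())
```

	Because the bar is itself random, the selected set can change with the seed. Repeating the procedure with several seeds and keeping the features that clear the bar most of the time is one possible refinement.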

	This is a technique I like to use for quick FEATURE SELECTION in Machine Learning applications. I tend to call it the "Random Bar" method! Let's assume you have a feature set X and a target Y. Let's create a random vector V with one value per row of X (for example np.random.normal(size=(100, 1)) when X has 100 rows) and append that vector as a new feature to X: