@akarve
Last active April 4, 2018 08:09
Choosing the right machine learning algorithm
Adapted from the [scikit-learn cheat sheet](http://scikit-learn.org/stable/tutorial/machine_learning_map/index.html).
* More than 50 samples?
  * No
    * Get more data
  * Yes
    * Predicting a category?
      * No
        * Predicting a quantity?
          * No
            * Just looking?
              * No
                * Predicting structure?
                  * Tough luck
              * Yes => DIMENSIONALITY REDUCTION
                * Randomized PCA
                * Not working?
                  * Less than 10K samples?
                    * No
                      * Kernel approximation
                    * Yes
                      * Isomap
                      * Spectral Embedding
                      * Not working?
                        * LLE
          * Yes => REGRESSION
            * Less than 100K samples?
              * No
                * SGD Regressor
              * Yes
                * Few features should be important?
                  * No
                    * RidgeRegression
                    * SVR(kernel='linear')
                    * Not working?
                      * SVR(kernel='rbf')
                      * Ensemble regressors
                  * Yes
                    * Lasso
                    * ElasticNet
      * Yes
        * Labeled data?
          * No => CLUSTERING
            * Number of categories known?
              * No
                * Less than 10K samples?
                  * No
                    * Tough luck
                  * Yes
                    * MeanShift
                    * VBGMM
              * Yes
                * Less than 10K samples?
                  * No
                    * MiniBatch KMeans
                  * Yes
                    * KMeans
                    * Not working?
                      * Spectral Clustering
                      * GMM
          * Yes => CLASSIFICATION
            * Less than 100K samples?
              * No
                * SGD Classifier
                * Not working?
                  * Kernel approximation
              * Yes
                * Linear SVC
                * Not working?
                  * Text data?
                    * No
                      * KNeighbors Classifier
                      * Not working?
                        * SVC
                        * Ensemble classifiers
                    * Yes
                      * Naive Bayes
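The decision tree above can be read as a function of a handful of yes/no answers plus the sample count. Below is a minimal sketch of that reading; the `Problem` dataclass and `suggest` function are hypothetical names, not part of scikit-learn. It returns the leaf's estimators in the order the cheat sheet suggests trying them, with each "Not working?" fallback appended after the first choice. "More than 50" is treated as strictly greater than 50, and "Less than 10K/100K" as strictly less than.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Problem:
    """Answers to the cheat sheet's questions (hypothetical helper)."""
    n_samples: int
    predicting_category: bool = False
    labeled: bool = False
    categories_known: bool = False
    text_data: bool = False
    predicting_quantity: bool = False
    few_features_important: bool = False
    just_looking: bool = False


def suggest(p: Problem) -> List[str]:
    """Walk the tree; later list entries are the "Not working?" fallbacks."""
    if p.n_samples <= 50:
        return ["Get more data"]
    if p.predicting_category:
        if p.labeled:  # CLASSIFICATION
            if p.n_samples >= 100_000:
                return ["SGD Classifier", "Kernel approximation"]
            if p.text_data:
                return ["Linear SVC", "Naive Bayes"]
            return ["Linear SVC", "KNeighbors Classifier", "SVC", "Ensemble classifiers"]
        # CLUSTERING
        if p.categories_known:
            if p.n_samples >= 10_000:
                return ["MiniBatch KMeans"]
            return ["KMeans", "Spectral Clustering", "GMM"]
        if p.n_samples >= 10_000:
            return ["Tough luck"]
        return ["MeanShift", "VBGMM"]
    if p.predicting_quantity:  # REGRESSION
        if p.n_samples >= 100_000:
            return ["SGD Regressor"]
        if p.few_features_important:
            return ["Lasso", "ElasticNet"]
        return ["RidgeRegression", "SVR(kernel='linear')",
                "SVR(kernel='rbf')", "Ensemble regressors"]
    if p.just_looking:  # DIMENSIONALITY REDUCTION
        if p.n_samples >= 10_000:
            return ["Randomized PCA", "Kernel approximation"]
        return ["Randomized PCA", "Isomap", "Spectral Embedding", "LLE"]
    return ["Tough luck"]  # predicting structure
```

For example, `suggest(Problem(n_samples=5_000, predicting_category=True, labeled=True, text_data=True))` walks the classification branch and returns `["Linear SVC", "Naive Bayes"]`.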
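Several labels in the chart name estimators that have since been renamed or removed from scikit-learn. The sketch below is a hypothetical lookup (`ESTIMATORS` and `build` are illustrative names) from each chart label to a scikit-learn class, assuming scikit-learn 0.20 or later, where `RandomizedPCA`, `GMM` and `VBGMM` were removed in favour of `PCA(svd_solver='randomized')`, `GaussianMixture` and `BayesianGaussianMixture`. Where the chart names a family rather than a class ("Naive Bayes", the ensembles, kernel approximation), the entry picks one representative; treat those picks as assumptions.

```python
import importlib

# Chart label -> (module, class name, constructor kwargs).
# Assumes scikit-learn >= 0.20. Family entries use one representative class.
ESTIMATORS = {
    "Randomized PCA": ("sklearn.decomposition", "PCA", {"svd_solver": "randomized"}),
    # A transformer; usually followed by a linear model (e.g. SGDClassifier).
    "Kernel approximation": ("sklearn.kernel_approximation", "Nystroem", {}),
    "Isomap": ("sklearn.manifold", "Isomap", {}),
    "Spectral Embedding": ("sklearn.manifold", "SpectralEmbedding", {}),
    "LLE": ("sklearn.manifold", "LocallyLinearEmbedding", {}),
    "SGD Regressor": ("sklearn.linear_model", "SGDRegressor", {}),
    "RidgeRegression": ("sklearn.linear_model", "Ridge", {}),
    "SVR(kernel='linear')": ("sklearn.svm", "SVR", {"kernel": "linear"}),
    "SVR(kernel='rbf')": ("sklearn.svm", "SVR", {"kernel": "rbf"}),
    "Ensemble regressors": ("sklearn.ensemble", "RandomForestRegressor", {}),
    "Lasso": ("sklearn.linear_model", "Lasso", {}),
    "ElasticNet": ("sklearn.linear_model", "ElasticNet", {}),
    "MeanShift": ("sklearn.cluster", "MeanShift", {}),
    "VBGMM": ("sklearn.mixture", "BayesianGaussianMixture", {}),
    "MiniBatch KMeans": ("sklearn.cluster", "MiniBatchKMeans", {}),
    "KMeans": ("sklearn.cluster", "KMeans", {}),
    "Spectral Clustering": ("sklearn.cluster", "SpectralClustering", {}),
    "GMM": ("sklearn.mixture", "GaussianMixture", {}),
    "SGD Classifier": ("sklearn.linear_model", "SGDClassifier", {}),
    "Linear SVC": ("sklearn.svm", "LinearSVC", {}),
    "KNeighbors Classifier": ("sklearn.neighbors", "KNeighborsClassifier", {}),
    "SVC": ("sklearn.svm", "SVC", {}),
    "Ensemble classifiers": ("sklearn.ensemble", "RandomForestClassifier", {}),
    # MultinomialNB is a common choice for word-count features.
    "Naive Bayes": ("sklearn.naive_bayes", "MultinomialNB", {}),
}


def build(label):
    """Return an unfitted estimator for a chart label (requires scikit-learn)."""
    module, class_name, kwargs = ESTIMATORS[label]
    cls = getattr(importlib.import_module(module), class_name)
    return cls(**kwargs)
```

With scikit-learn installed, `build("Linear SVC")` returns an unfitted `LinearSVC()`. Importing lazily keeps the table usable even where scikit-learn is absent.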