vladiant · February 28, 2022 18:56
diff --git a/compare_ml.py b/compare_ml.py
 # Compare Algorithms
 import pandas
 import matplotlib.pyplot as plt
 from sklearn import model_selection
 from sklearn.linear_model import LogisticRegression
 from sklearn.tree import DecisionTreeClassifier
 from sklearn.neighbors import KNeighborsClassifier
 from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
 from sklearn.naive_bayes import GaussianNB
 from sklearn.svm import SVC
 # load dataset
 url = "https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.data.csv"
 names = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
 dataframe = pandas.read_csv(url, names=names)
 array = dataframe.values
 X = array[:,0:8]
 Y = array[:,8]
 # prepare configuration for cross validation test harness
 seed = 7
 # prepare models
 models = []
 models.append(('LR', LogisticRegression(max_iter=1000)))
 models.append(('LDA', LinearDiscriminantAnalysis()))
 models.append(('KNN', KNeighborsClassifier()))
 models.append(('CART', DecisionTreeClassifier()))
 models.append(('NB', GaussianNB()))
 models.append(('SVM', SVC()))
 # evaluate each model in turn
 results = []
 names = []
 scoring = 'accuracy'
 for name, model in models:
 	kfold = model_selection.KFold(n_splits=10, random_state=seed, shuffle=True)
 	cv_results = model_selection.cross_val_score(model, X, Y, cv=kfold, scoring=scoring)
 	results.append(cv_results)
 	names.append(name)
 	msg = "%s: %f (%f)" % (name, cv_results.mean(), cv_results.std())
 	print(msg)
 # boxplot algorithm comparison
 fig = plt.figure()
 fig.suptitle('Algorithm Comparison')
 ax = fig.add_subplot(111)
 plt.boxplot(results)
 ax.set_xticklabels(names)
 plt.show()
diff --git a/compare_ml.txt b/compare_ml.txt
 How To Compare Machine Learning Algorithms in Python with scikit-learn
 By Jason Brownlee on June 1, 2016 in Python Machine Learning 

 Choose The Best Machine Learning Model
 How do you choose the best model for your problem?

 When you work on a machine learning project, you often end up with multiple good models to choose from. Each model will have different performance characteristics.

 Using resampling methods like cross validation, you can get an estimate of how accurate each model may be on unseen data. You need to be able to use these estimates to choose the one or two best models from the suite of models that you have created.

 Compare Machine Learning Models Carefully
 When you have a new dataset, it is a good idea to visualize the data using different techniques in order to look at the data from different perspectives.

 The same idea applies to model selection. You should use a number of different ways of looking at the estimated accuracy of your machine learning algorithms in order to choose the one or two to finalize.

 A way to do this is to use different visualization methods to show the average accuracy, variance and other properties of the distribution of model accuracies.

 In the next section you will discover exactly how you can do that in Python with scikit-learn.

 Compare Machine Learning Algorithms Consistently
 The key to a fair comparison of machine learning algorithms is ensuring that each algorithm is evaluated in the same way on the same data.

 You can achieve this by forcing each algorithm to be evaluated on a consistent test harness.

 In the example below 6 different algorithms are compared:
    1.     Logistic Regression
    2.     Linear Discriminant Analysis
    3.     K-Nearest Neighbors
    4.     Classification and Regression Trees
    5.     Naive Bayes
    6.     Support Vector Machines

 The problem is the Pima Indians onset of diabetes dataset, a standard binary classification problem from the UCI machine learning repository (update: download from here https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.data.csv). The problem has two classes and eight numeric input variables of varying scales.

 The 10-fold cross validation procedure is used to evaluate each algorithm. Importantly, it is configured with the same random seed so that the same splits of the training data are produced and each algorithm is evaluated in precisely the same way.
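
 The effect of the fixed seed can be checked directly. The following is a minimal sketch, not part of the original script: the helper name fold_indices and the 20-row toy array are assumptions made for illustration. It builds two shuffled KFold splitters with the same seed and confirms they yield identical test folds.

```python
import numpy as np
from sklearn.model_selection import KFold

# Toy stand-in for the real data (assumption: 20 samples, 2 features).
X = np.arange(40).reshape(20, 2)


def fold_indices(seed):
    """Return the test indices of each of the 10 folds for a given seed."""
    kfold = KFold(n_splits=10, shuffle=True, random_state=seed)
    return [test_idx.tolist() for _, test_idx in kfold.split(X)]


# Two independent splitters with the same seed produce the same folds.
same_seed_matches = fold_indices(7) == fold_indices(7)
print("same seed, same folds:", same_seed_matches)

# A different seed will normally shuffle the rows differently.
print("different seed, same folds:", fold_indices(7) == fold_indices(8))
```

 Without shuffle=True, KFold splits the rows in their original order and the seed has no effect, so the seed only matters for shuffled splits like the one used here.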

 Each algorithm is given a short name, useful for summarizing results afterward.
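
 As a rough sketch of such a harness, the snippet below evaluates two of the six algorithms against one shared splitter. It uses synthetic data from make_classification so that it runs without a download; the sample count and the results dictionary are assumptions for illustration and are not taken from the original script.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in for the real dataset (assumption: 200 rows, 8 inputs).
X, y = make_classification(n_samples=200, n_features=8, random_state=7)

# One splitter shared by every model, so all of them see the same folds.
kfold = KFold(n_splits=10, shuffle=True, random_state=7)

# Each model is paired with a short name used when reporting results.
models = [
    ("LR", LogisticRegression(max_iter=1000)),
    ("NB", GaussianNB()),
]

results = {}
for name, model in models:
    # cross_val_score returns one accuracy value per fold.
    results[name] = cross_val_score(model, X, y, cv=kfold, scoring="accuracy")
    print("%s: %f (%f)" % (name, results[name].mean(), results[name].std()))
```

 Keeping the short name next to each model means the printed summary and any later plot can be labelled from the same list.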

 Running the example prints each algorithm's short name, followed by its mean accuracy and, in parentheses, the standard deviation of accuracy across the folds.

 LR: 0.769515 (0.048411)
 LDA: 0.773462 (0.051592)
 KNN: 0.726555 (0.061821)
 CART: 0.695232 (0.062517)
 NB: 0.755178 (0.042766)
 SVM: 0.651025 (0.072141)
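
 To make the two numbers in each row concrete, here is a small sketch that reproduces the same output format from a list of per-fold accuracies. The ten scores and the name DEMO are invented for illustration and are not results from the dataset.

```python
import numpy as np

# Hypothetical per-fold accuracies for one model (not real results).
scores = np.array([0.70, 0.80, 0.75, 0.85, 0.65, 0.75, 0.80, 0.70, 0.75, 0.75])

# Same format string as the script: name, mean, then standard deviation.
# NumPy's std() defaults to the population form (ddof=0).
summary = "%s: %f (%f)" % ("DEMO", scores.mean(), scores.std())
print(summary)
```

 The figure in parentheses describes the spread across the ten folds, so a model with a slightly lower mean but a much tighter spread may still be worth a closer look.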

 These results suggest that both logistic regression and linear discriminant analysis are perhaps worthy of further study on this problem.

 Summary
 In this post you discovered how to evaluate multiple different machine learning algorithms on a dataset in Python with scikit-learn.

 You learned how to use the same test harness to evaluate each algorithm and how to summarize the results both numerically and with a box and whisker plot.
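
 For reference, the plotting step can be sketched on its own as follows. The two models named A and B and their fold scores are made up for illustration, and the figure is written to an in-memory buffer with a non-interactive backend (an assumption, so that the sketch runs without a display) rather than shown in a window.

```python
import io

import matplotlib

matplotlib.use("Agg")  # non-interactive backend; assumption for headless runs
import matplotlib.pyplot as plt

# Hypothetical per-fold accuracies for two models (not real results).
names = ["A", "B"]
results = [
    [0.70, 0.75, 0.80, 0.72, 0.78],
    [0.60, 0.66, 0.71, 0.64, 0.69],
]

fig, ax = plt.subplots()
fig.suptitle("Algorithm Comparison")
boxes = ax.boxplot(results)  # one box per list of fold scores
ax.set_xticklabels(names)    # label each box with the model's short name

# Render to PNG bytes in memory instead of opening a window.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
plt.close(fig)
```

 Each box shows the spread of accuracy across the folds for one model, which makes overlapping or clearly separated distributions easier to spot than the mean alone.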

 You can use this recipe as a template for evaluating multiple algorithms on your own problems.

 Do you have any questions about evaluating machine learning algorithms in Python or about this post? Ask your questions in the comments below and I will do my best to answer them.