This section describes each of the services compared in the throwdown and the algorithms/models used.
BigML: Decision trees, both single and bagged.
Google Predict: Unknown black-box model(s).
Prior Knowledge: Veritable API, a nonparametric Bayesian model.
Weka: Ten popular algorithms (five for classification, five for regression), listed below with the parameters used. They were chosen to cover a wide variety of algorithms rather than to optimize performance.
Classification algorithms:
Trees (J48)
  Classifier: weka.classifiers.trees.J48
  Parameters: -C 0.25 -M 2
Boosted Trees (Adaboost Classifier with J48 as weak learner)
  Classifier: weka.classifiers.meta.AdaBoostM1
  Parameters: -P 100 -S 1 -I 10 -W weka.classifiers.trees.J48 -- -C 0.25 -M 2
Naive Bayes
  Classifier: weka.classifiers.bayes.NaiveBayes
  Parameters: none
SVM (with RBF kernel function)
  Classifier: weka.classifiers.functions.LibSVM
  Parameters: -S 0 -K 2 -D 3 -G 0.0 -R 0.0 -N 0.5 -M 40.0 -C 1.0 -E 0.0010 -P 0.1
k-Nearest Neighbor (k = 3)
  Classifier: weka.classifiers.lazy.IBk
  Parameters: -K 3 -W 0 -A "weka.core.neighboursearch.LinearNNSearch -A \"weka.core.EuclideanDistance -R first-last\""
Regression algorithms:
Trees (M5P)
  Classifier: weka.classifiers.trees.M5P
  Parameters: -M 4.0
Additive Regression (with M5P as weak classifier)
  Classifier: weka.classifiers.meta.AdditiveRegression
  Parameters: -S 1.0 -I 10 -W weka.classifiers.trees.M5P -- -M 4.0
Linear Regression
  Classifier: weka.classifiers.functions.LinearRegression
  Parameters: -S 0 -R 1.0E-8
SMOreg (support vector regression)
  Classifier: weka.classifiers.functions.SMOreg
  Parameters: -C 1.0 -N 0 -I "weka.classifiers.functions.supportVector.RegSMOImproved -L 0.0010 -W 1 -P 1.0E-12 -T 0.0010 -V" -K "weka.classifiers.functions.supportVector.PolyKernel -C 250007 -E 1.0"
k-Nearest Neighbor (k = 3)
  Classifier: weka.classifiers.lazy.IBk
  Parameters: -K 3 -W 0 -A "weka.core.neighboursearch.LinearNNSearch -A \"weka.core.EuclideanDistance -R first-last\""
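The classifier names and parameter strings above are in the form Weka accepts on its command line. The write-up does not say how Weka was actually invoked, so the following is only a sketch of one way such a listing could be turned into a command: the jar path and the file names are hypothetical, while `-t` and `-T` are Weka's general options for the training and test files. Nothing is executed; the function only builds the argument list.

```python
import shlex

# Hypothetical location of the Weka jar; not stated in the original write-up.
WEKA_JAR = "weka.jar"


def weka_command(classifier, parameters, train_file, test_file):
    """Build a Weka command line as an argument list (nothing is run here)."""
    return [
        "java", "-cp", WEKA_JAR, classifier,
        "-t", train_file,          # general option: training set
        "-T", test_file,           # general option: test set
        *shlex.split(parameters),  # scheme-specific options, quoting resolved
    ]


# Hypothetical file names, used only for illustration.
j48 = weka_command(
    "weka.classifiers.trees.J48", "-C 0.25 -M 2", "train.arff", "test.arff"
)
ibk = weka_command(
    "weka.classifiers.lazy.IBk",
    r'-K 3 -W 0 -A "weka.core.neighboursearch.LinearNNSearch -A \"weka.core.EuclideanDistance -R first-last\""',
    "train.arff",
    "test.arff",
)

# Print a shell-safe rendering of the J48 command.
print(" ".join(shlex.quote(arg) for arg in j48))
```

Splitting with `shlex` keeps the nested quoting of the IBk search option intact: the whole `LinearNNSearch` specification, including its inner quoted distance function, arrives as a single argument.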
The datasets come from the UCI Machine Learning Repository and are
relatively clean by machine learning standards. They are split into two
categories, classification and regression, based on the type of the field we
are trying to predict.
The links in the tables below point to the description of the original
datasets on the UCI repository. Some of the datasets were downsampled or
modified slightly to get them in a common CSV format for our benchmarking
software. The actual data used for the throwdown can be found
here.
In a classification problem, the field we are trying to predict has one of a
finite number of possible values. Examples include predicting the type of an
iris plant or predicting whether a tumor is malignant or benign.
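The classification tables report accuracy, macro average F1 score, and macro average phi coefficient. The throwdown does not spell out its exact formulas, so the sketch below assumes the common reading: each class is scored one-vs-rest, and the macro average is the unweighted mean over classes. The labels and predictions are made up for illustration.

```python
from math import sqrt


def per_class_counts(actual, predicted, label):
    """One-vs-rest confusion counts (tp, fp, fn, tn) for a single class."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == label and p == label for a, p in pairs)
    fp = sum(a != label and p == label for a, p in pairs)
    fn = sum(a == label and p != label for a, p in pairs)
    tn = len(pairs) - tp - fp - fn
    return tp, fp, fn, tn


def accuracy(actual, predicted):
    """Fraction of instances whose predicted label matches the actual one."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


def f1(tp, fp, fn, tn):
    """Harmonic mean of precision and recall; 0.0 when undefined."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def phi(tp, fp, fn, tn):
    """Phi (Matthews) coefficient; 0.0 when the denominator is zero."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def macro_average(score, actual, predicted):
    """Unweighted mean of a per-class score over all actual classes."""
    labels = sorted(set(actual))
    return sum(
        score(*per_class_counts(actual, predicted, label)) for label in labels
    ) / len(labels)


# Made-up example: six iris plants, one versicolor mistaken for virginica.
actual = ["setosa", "setosa", "versicolor", "versicolor", "virginica", "virginica"]
predicted = ["setosa", "setosa", "versicolor", "virginica", "virginica", "virginica"]

print(round(accuracy(actual, predicted), 2))
print(round(macro_average(f1, actual, predicted), 2))
print(round(macro_average(phi, actual, predicted), 2))
```

Under these definitions, a classifier that always predicts the majority class can reach a moderate accuracy while its phi coefficient is 0 for every class, which is why the three metrics can rank the same algorithm quite differently.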
In a regression problem, the field we are trying to predict has a numeric
value. Examples include predicting the fuel efficiency of a car or predicting
the number of violent crimes in a community.
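The regression tables report mean squared error (lower is better) and the R-squared score (higher is better). As a minimal sketch using the standard definitions, with made-up fuel-efficiency values, the two can be computed as follows. The second set of predictions shows how R-squared becomes negative when a model does worse than always predicting the mean, which is how the negative scores in some of the tables should be read.

```python
def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def r_squared(actual, predicted):
    """1 - SS_res / SS_tot; negative when worse than predicting the mean."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Made-up miles-per-gallon values, for illustration only.
actual = [18.0, 24.0, 30.0]
close = [20.0, 24.0, 28.0]      # small errors
reversed_ = [30.0, 24.0, 18.0]  # systematically wrong

print(mean_squared_error(actual, close), r_squared(actual, close))
print(mean_squared_error(actual, reversed_), r_squared(actual, reversed_))
```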
Dataset: Breast Cancer Wisconsin (Diagnostic) (Classification)

| # | Algorithm | Accuracy |
|---|---|---|
| 1 | Prior Knowledge - Veritable - Classification | 0.97 |
| 2 | Weka k-Nearest Neighbor (k = 3) classifier | 0.97 |
| 3 | Weka Adaboost Classifier with J48 as weak learner | 0.97 |
| 4 | Weka Naive Bayes | 0.96 |
| 5 | Bagged BigML Classification Trees | 0.96 |
| 6 | Google Predict Classifier | 0.96 |
| 7 | BigML Classification Tree | 0.95 |
| 8 | Weka J48 Tree | 0.94 |
| 9 | Weka SVM (with RBF kernel function) | 0.66 |

| # | Algorithm | Macro Average F1 Score |
|---|---|---|
| 1 | Prior Knowledge - Veritable - Classification | 0.96 |
| 2 | Weka Adaboost Classifier with J48 as weak learner | 0.95 |
| 3 | Weka k-Nearest Neighbor (k = 3) classifier | 0.95 |
| 4 | Weka Naive Bayes | 0.94 |
| 5 | Bagged BigML Classification Trees | 0.94 |
| 6 | Google Predict Classifier | 0.94 |
| 7 | BigML Classification Tree | 0.93 |
| 8 | Weka J48 Tree | 0.91 |
| 9 | Weka SVM (with RBF kernel function) | 0.05 |

| # | Algorithm | Macro Average Phi Coefficient |
|---|---|---|
| 1 | Prior Knowledge - Veritable - Classification | 0.94 |
| 2 | Weka Adaboost Classifier with J48 as weak learner | 0.92 |
| 3 | Weka k-Nearest Neighbor (k = 3) classifier | 0.92 |
| 4 | Weka Naive Bayes | 0.92 |
| 5 | Bagged BigML Classification Trees | 0.91 |
| 6 | Google Predict Classifier | 0.90 |
| 7 | BigML Classification Tree | 0.89 |
| 8 | Weka J48 Tree | 0.87 |
| 9 | Weka SVM (with RBF kernel function) | 0.10 |

Dataset: Pima Indians Diabetes (Classification)

| # | Algorithm | Accuracy |
|---|---|---|
| 1 | Google Predict Classifier | 0.76 |
| 2 | Prior Knowledge - Veritable - Classification | 0.76 |
| 3 | Weka Naive Bayes | 0.75 |
| 4 | Bagged BigML Classification Trees | 0.74 |
| 5 | Weka k-Nearest Neighbor (k = 3) classifier | 0.74 |
| 6 | Weka J48 Tree | 0.74 |
| 7 | Weka Adaboost Classifier with J48 as weak learner | 0.73 |
| 8 | BigML Classification Tree | 0.70 |
| 9 | Weka SVM (with RBF kernel function) | 0.65 |

| # | Algorithm | Macro Average F1 Score |
|---|---|---|
| 1 | Bagged BigML Classification Trees | 0.64 |
| 2 | Weka Naive Bayes | 0.62 |
| 3 | Google Predict Classifier | 0.61 |
| 4 | Weka Adaboost Classifier with J48 as weak learner | 0.59 |
| 5 | Weka J48 Tree | 0.59 |
| 6 | Weka k-Nearest Neighbor (k = 3) classifier | 0.59 |
| 7 | Prior Knowledge - Veritable - Classification | 0.58 |
| 8 | BigML Classification Tree | 0.56 |
| 9 | Weka SVM (with RBF kernel function) | 0.00 |

| # | Algorithm | Macro Average Phi Coefficient |
|---|---|---|
| 1 | Google Predict Classifier | 0.45 |
| 2 | Weka Naive Bayes | 0.44 |
| 3 | Bagged BigML Classification Trees | 0.44 |
| 4 | Prior Knowledge - Veritable - Classification | 0.44 |
| 5 | Weka J48 Tree | 0.41 |
| 6 | Weka k-Nearest Neighbor (k = 3) classifier | 0.40 |
| 7 | Weka Adaboost Classifier with J48 as weak learner | 0.40 |
| 8 | BigML Classification Tree | 0.34 |
| 9 | Weka SVM (with RBF kernel function) | 0.00 |

Dataset: Glass Identification (Classification)

| # | Algorithm | Accuracy |
|---|---|---|
| 1 | Bagged BigML Classification Trees | 0.99 |
| 2 | Weka SVM (with RBF kernel function) | 0.98 |
| 3 | Weka J48 Tree | 0.98 |
| 4 | Weka Adaboost Classifier with J48 as weak learner | 0.98 |
| 5 | BigML Classification Tree | 0.97 |
| 6 | Google Predict Classifier | 0.96 |
| 7 | Prior Knowledge - Veritable - Classification | 0.93 |
| 8 | Weka k-Nearest Neighbor (k = 3) classifier | 0.90 |
| 9 | Weka Naive Bayes | 0.83 |

| # | Algorithm | Macro Average F1 Score |
|---|---|---|
| 1 | Bagged BigML Classification Trees | 0.96 |
| 2 | Weka SVM (with RBF kernel function) | 0.95 |
| 3 | Weka J48 Tree | 0.94 |
| 4 | Weka Adaboost Classifier with J48 as weak learner | 0.94 |
| 5 | BigML Classification Tree | 0.93 |
| 6 | Google Predict Classifier | 0.89 |
| 7 | Prior Knowledge - Veritable - Classification | 0.83 |
| 8 | Weka Naive Bayes | 0.80 |
| 9 | Weka k-Nearest Neighbor (k = 3) classifier | 0.79 |

| # | Algorithm | Macro Average Phi Coefficient |
|---|---|---|
| 1 | Bagged BigML Classification Trees | 0.96 |
| 2 | Weka SVM (with RBF kernel function) | 0.95 |
| 3 | Weka J48 Tree | 0.94 |
| 4 | Weka Adaboost Classifier with J48 as weak learner | 0.94 |
| 5 | BigML Classification Tree | 0.93 |
| 6 | Google Predict Classifier | 0.89 |
| 7 | Prior Knowledge - Veritable - Classification | 0.83 |
| 8 | Weka k-Nearest Neighbor (k = 3) classifier | 0.78 |
| 9 | Weka Naive Bayes | 0.77 |

Dataset: Iris (Classification)

| # | Algorithm | Accuracy |
|---|---|---|
| 1 | Google Predict Classifier | 0.97 |
| 2 | Weka SVM (with RBF kernel function) | 0.97 |
| 3 | Weka Naive Bayes | 0.95 |
| 4 | Prior Knowledge - Veritable - Classification | 0.95 |
| 5 | BigML Classification Tree | 0.95 |
| 6 | Bagged BigML Classification Trees | 0.95 |
| 7 | Weka k-Nearest Neighbor (k = 3) classifier | 0.95 |
| 8 | Weka J48 Tree | 0.95 |
| 9 | Weka Adaboost Classifier with J48 as weak learner | 0.93 |

| # | Algorithm | Macro Average F1 Score |
|---|---|---|
| 1 | Google Predict Classifier | 0.97 |
| 2 | Weka SVM (with RBF kernel function) | 0.96 |
| 3 | Bagged BigML Classification Trees | 0.95 |
| 4 | BigML Classification Tree | 0.95 |
| 5 | Weka J48 Tree | 0.94 |
| 6 | Weka k-Nearest Neighbor (k = 3) classifier | 0.92 |
| 7 | Prior Knowledge - Veritable - Classification | 0.92 |
| 8 | Weka Naive Bayes | 0.92 |
| 9 | Weka Adaboost Classifier with J48 as weak learner | 0.89 |

| # | Algorithm | Macro Average Phi Coefficient |
|---|---|---|
| 1 | Google Predict Classifier | 0.96 |
| 2 | Weka SVM (with RBF kernel function) | 0.95 |
| 3 | BigML Classification Tree | 0.93 |
| 4 | Bagged BigML Classification Trees | 0.93 |
| 5 | Weka J48 Tree | 0.92 |
| 6 | Prior Knowledge - Veritable - Classification | 0.90 |
| 7 | Weka Naive Bayes | 0.90 |
| 8 | Weka k-Nearest Neighbor (k = 3) classifier | 0.90 |
| 9 | Weka Adaboost Classifier with J48 as weak learner | 0.86 |

Dataset: Pen-Based Recognition of Handwritten Digits (Classification)

| # | Algorithm | Accuracy |
|---|---|---|
| 1 | Weka k-Nearest Neighbor (k = 3) classifier | 0.99 |
| 2 | Weka Adaboost Classifier with J48 as weak learner | 0.99 |
| 3 | Google Predict Classifier | 0.98 |
| 4 | Bagged BigML Classification Trees | 0.98 |
| 5 | BigML Classification Tree | 0.97 |
| 6 | Weka J48 Tree | 0.96 |
| 7 | Weka Naive Bayes | 0.88 |
| 8 | Weka SVM (with RBF kernel function) | 0.10 |

| # | Algorithm | Macro Average F1 Score |
|---|---|---|
| 1 | Weka k-Nearest Neighbor (k = 3) classifier | 0.99 |
| 2 | Weka Adaboost Classifier with J48 as weak learner | 0.99 |
| 3 | Google Predict Classifier | 0.98 |
| 4 | Bagged BigML Classification Trees | 0.98 |
| 5 | BigML Classification Tree | 0.96 |
| 6 | Weka J48 Tree | 0.96 |
| 7 | Weka Naive Bayes | 0.88 |
| 8 | Weka SVM (with RBF kernel function) | 0.03 |

| # | Algorithm | Macro Average Phi Coefficient |
|---|---|---|
| 1 | Weka k-Nearest Neighbor (k = 3) classifier | 0.99 |
| 2 | Weka Adaboost Classifier with J48 as weak learner | 0.99 |
| 3 | Google Predict Classifier | 0.98 |
| 4 | Bagged BigML Classification Trees | 0.98 |
| 5 | BigML Classification Tree | 0.96 |
| 6 | Weka J48 Tree | 0.96 |
| 7 | Weka Naive Bayes | 0.87 |
| 8 | Weka SVM (with RBF kernel function) | 0.04 |

Dataset: Wine (Classification)

| # | Algorithm | Accuracy |
|---|---|---|
| 1 | Prior Knowledge - Veritable - Classification | 0.97 |
| 2 | Weka Naive Bayes | 0.97 |
| 3 | Weka Adaboost Classifier with J48 as weak learner | 0.97 |
| 4 | Google Predict Classifier | 0.97 |
| 5 | Bagged BigML Classification Trees | 0.96 |
| 6 | Weka k-Nearest Neighbor (k = 3) classifier | 0.96 |
| 7 | Weka J48 Tree | 0.95 |
| 8 | BigML Classification Tree | 0.92 |
| 9 | Weka SVM (with RBF kernel function) | 0.44 |

| # | Algorithm | Macro Average F1 Score |
|---|---|---|
| 1 | Prior Knowledge - Veritable - Classification | 0.98 |
| 2 | Weka Naive Bayes | 0.97 |
| 3 | Google Predict Classifier | 0.97 |
| 4 | Weka k-Nearest Neighbor (k = 3) classifier | 0.96 |
| 5 | Weka Adaboost Classifier with J48 as weak learner | 0.96 |
| 6 | Bagged BigML Classification Trees | 0.95 |
| 7 | Weka J48 Tree | 0.94 |
| 8 | BigML Classification Tree | 0.90 |
| 9 | Weka SVM (with RBF kernel function) | 0.27 |

| # | Algorithm | Macro Average Phi Coefficient |
|---|---|---|
| 1 | Prior Knowledge - Veritable - Classification | 0.96 |
| 2 | Weka Naive Bayes | 0.95 |
| 3 | Google Predict Classifier | 0.95 |
| 4 | Weka Adaboost Classifier with J48 as weak learner | 0.94 |
| 5 | Weka k-Nearest Neighbor (k = 3) classifier | 0.94 |
| 6 | Bagged BigML Classification Trees | 0.94 |
| 7 | Weka J48 Tree | 0.92 |
| 8 | BigML Classification Tree | 0.86 |
| 9 | Weka SVM (with RBF kernel function) | 0.13 |

Dataset: Abalone (Regression)

| # | Algorithm | Mean Squared Error |
|---|---|---|
| 1 | Weka Additive Regression (with M5P as weak classifier) | 4.55 |
| 2 | Weka M5P Tree | 4.55 |
| 3 | Bagged BigML Regression Trees | 4.71 |
| 4 | Weka Linear Regression | 4.91 |
| 5 | Google Predict Regressor | 4.92 |
| 6 | Weka SMOreg (support vector regression) | 5.08 |
| 7 | Prior Knowledge - Veritable - Regression | 5.08 |
| 8 | Weka k-Nearest Neighbor (k = 3) regressor | 5.61 |
| 9 | BigML Regression Tree | 6.97 |

| # | Algorithm | R-Squared Score |
|---|---|---|
| 1 | Weka Additive Regression (with M5P as weak classifier) | 0.56 |
| 2 | Weka M5P Tree | 0.56 |
| 3 | Bagged BigML Regression Trees | 0.54 |
| 4 | Weka Linear Regression | 0.52 |
| 5 | Google Predict Regressor | 0.52 |
| 6 | Prior Knowledge - Veritable - Regression | 0.51 |
| 7 | Weka SMOreg (support vector regression) | 0.51 |
| 8 | Weka k-Nearest Neighbor (k = 3) regressor | 0.46 |
| 9 | BigML Regression Tree | 0.32 |

Dataset: Auto MPG (Regression)

| # | Algorithm | Mean Squared Error |
|---|---|---|
| 1 | Weka M5P Tree | 7.89 |
| 2 | Weka Additive Regression (with M5P as weak classifier) | 7.90 |
| 3 | Bagged BigML Regression Trees | 8.17 |
| 4 | Weka k-Nearest Neighbor (k = 3) regressor | 8.88 |
| 5 | Weka Linear Regression | 11.53 |
| 6 | Weka SMOreg (support vector regression) | 12.12 |
| 7 | BigML Regression Tree | 13.45 |
| 8 | Prior Knowledge - Veritable - Regression | 14.92 |
| 9 | Google Predict Regressor | 85.17 |

| # | Algorithm | R-Squared Score |
|---|---|---|
| 1 | Weka M5P Tree | 0.86 |
| 2 | Weka Additive Regression (with M5P as weak classifier) | 0.86 |
| 3 | Bagged BigML Regression Trees | 0.86 |
| 4 | Weka k-Nearest Neighbor (k = 3) regressor | 0.85 |
| 5 | Weka Linear Regression | 0.80 |
| 6 | Weka SMOreg (support vector regression) | 0.79 |
| 7 | BigML Regression Tree | 0.76 |
| 8 | Prior Knowledge - Veritable - Regression | 0.73 |
| 9 | Google Predict Regressor | -0.52 |

Dataset: Insurance Company Benchmark (COIL 2000) (Regression)

| # | Algorithm | Mean Squared Error |
|---|---|---|
| 1 | Google Predict Regressor | 0.05 |
| 2 | Weka M5P Tree | 0.05 |
| 3 | Weka Linear Regression | 0.05 |
| 4 | Weka Additive Regression (with M5P as weak classifier) | 0.05 |
| 5 | Bagged BigML Regression Trees | 0.06 |
| 6 | Weka SMOreg (support vector regression) | 0.06 |
| 7 | Weka k-Nearest Neighbor (k = 3) regressor | 0.07 |
| 8 | BigML Regression Tree | 0.10 |

| # | Algorithm | R-Squared Score |
|---|---|---|
| 1 | Google Predict Regressor | 0.04 |
| 2 | Weka M5P Tree | 0.04 |
| 3 | Weka Linear Regression | 0.04 |
| 4 | Weka Additive Regression (with M5P as weak classifier) | 0.02 |
| 5 | Bagged BigML Regression Trees | -0.02 |
| 6 | Weka SMOreg (support vector regression) | -0.06 |
| 7 | Weka k-Nearest Neighbor (k = 3) regressor | -0.22 |
| 8 | BigML Regression Tree | -0.78 |

Dataset: Communities and Crime (Regression)

| # | Algorithm | Mean Squared Error |
|---|---|---|
| 1 | Weka M5P Tree | 0.02 |
| 2 | Weka SMOreg (support vector regression) | 0.02 |
| 3 | Weka Linear Regression | 0.02 |
| 4 | Weka Additive Regression (with M5P as weak classifier) | 0.02 |
| 5 | Bagged BigML Regression Trees | 0.02 |
| 6 | BigML Regression Tree | 0.03 |
| 7 | Weka k-Nearest Neighbor (k = 3) regressor | 0.09 |
| 8 | Google Predict Regressor | 0.17 |

| # | Algorithm | R-Squared Score |
|---|---|---|
| 1 | Weka SMOreg (support vector regression) | 0.64 |
| 2 | Weka M5P Tree | 0.64 |
| 3 | Weka Linear Regression | 0.64 |
| 4 | Weka Additive Regression (with M5P as weak classifier) | 0.64 |
| 5 | Bagged BigML Regression Trees | 0.63 |
| 6 | BigML Regression Tree | 0.36 |
| 7 | Weka k-Nearest Neighbor (k = 3) regressor | -0.65 |
| 8 | Google Predict Regressor | -2.14 |

Dataset: Concrete Compressive Strength (Regression)

| # | Algorithm | Mean Squared Error |
|---|---|---|
| 1 | Bagged BigML Regression Trees | 31.96 |
| 2 | Weka Additive Regression (with M5P as weak classifier) | 37.46 |
| 3 | Weka M5P Tree | 38.73 |
| 4 | BigML Regression Tree | 48.41 |
| 5 | Weka k-Nearest Neighbor (k = 3) regressor | 81.20 |
| 6 | Google Predict Regressor | 109.04 |
| 7 | Weka Linear Regression | 109.27 |
| 8 | Prior Knowledge - Veritable - Regression | 116.80 |
| 9 | Weka SMOreg (support vector regression) | 119.75 |

| # | Algorithm | R-Squared Score |
|---|---|---|
| 1 | Bagged BigML Regression Trees | 0.88 |
| 2 | Weka Additive Regression (with M5P as weak classifier) | 0.86 |
| 3 | Weka M5P Tree | 0.86 |
| 4 | BigML Regression Tree | 0.82 |
| 5 | Weka k-Nearest Neighbor (k = 3) regressor | 0.71 |
| 6 | Google Predict Regressor | 0.61 |
| 7 | Weka Linear Regression | 0.60 |
| 8 | Prior Knowledge - Veritable - Regression | 0.58 |
| 9 | Weka SMOreg (support vector regression) | 0.56 |