- input data (features or predictors)
- Example: a student's age, gender, previous grades, etc.
- numpy array or pandas DataFrame
- denoted by the variable X.
- target data (response or labels)
- Example: the student's final grade or pass/fail status.
- numpy array or pandas Series
- denoted by the variable y (lowercase by convention, since it is a single column).
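A minimal sketch of how X and y might be built from the student example above (the column names and values are illustrative assumptions, not a real dataset):

```python
import pandas as pd

# Hypothetical student records (columns are made-up examples)
students = pd.DataFrame({
    "age": [15, 16, 15, 17],
    "previous_grade": [72, 88, 65, 90],
    "passed": [0, 1, 0, 1],  # target: 1 = pass, 0 = fail
})

X = students[["age", "previous_grade"]]  # input features (DataFrame)
y = students["passed"]                   # target labels (Series)
print(X.shape, y.shape)
```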
- LogisticRegression (binary classification, 0 or 1):
- Works well on small datasets
- Fast to train
- Example: whether an email is spam or not
- Decision trees and random forests (DecisionTreeClassifier, RandomForestClassifier)
- Support vector machines (SVC, LinearSVC)
- Naive Bayes (GaussianNB, BernoulliNB, MultinomialNB)
- Nearest neighbors (KNeighborsClassifier)
- Neural networks (MLPClassifier)
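All of these classifiers share the same fit/predict interface, so one can be swapped for another with minimal code changes. A sketch on made-up toy data:

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB

# Toy 2-feature dataset: class 0 points are small, class 1 points are large
X = [[1, 2], [2, 3], [3, 4], [8, 9], [9, 10], [10, 11]]
y = [0, 0, 0, 1, 1, 1]

# Every estimator exposes the same fit/predict API
for model in (DecisionTreeClassifier(random_state=42),
              KNeighborsClassifier(n_neighbors=3),
              GaussianNB()):
    model.fit(X, y)
    print(type(model).__name__, model.predict([[2, 3], [9, 10]]))
```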
Predicting whether a fruit is an apple or an orange based on its weight and size:
- Solution:
- If a fruit weighs less than 150 grams and has a diameter smaller than 7 cm, it's an apple.
- If a fruit weighs more than 150 grams or has a diameter larger than 7 cm, it's an orange.
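The hand-written rule above can be sketched as a plain Python function (the rule does not say what happens at exactly 150 g or 7 cm; this sketch treats the boundary as an orange):

```python
def classify_fruit(weight_g, diameter_cm):
    """Hand-written rule: apple if light AND small, otherwise orange."""
    if weight_g < 150 and diameter_cm < 7:
        return "apple"
    return "orange"

print(classify_fruit(120, 6))   # apple
print(classify_fruit(180, 8))   # orange
```

This is what a classifier learns automatically from labeled examples instead of being coded by hand.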
Example
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
# 10 fruit samples with weight and size (arbitrary toy units).
# The first 4 samples are apples (labeled as 0)
# while the rest are oranges (labeled as 1)
data = pd.DataFrame({
"weight": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
"size": [2, 3, 4, 5, 6, 7, 8, 9, 10, 11],
"class": [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
})
# Split the dataset into training and testing sets
# Target to predict = y, input features = X.
X = data[["weight", "size"]]
y = data["class"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Define the logistic regression classifier and its hyperparameters
clf = LogisticRegression(C=1.0, random_state=42)
# Train the model on the training data
clf.fit(X_train, y_train)
# Evaluate the model on the testing data
y_pred = clf.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")
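Once trained, the same model can classify unseen fruit. A self-contained sketch that refits on the full toy dataset and predicts two new samples (the new weight/size values are made up for illustration):

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

data = pd.DataFrame({
    "weight": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "size": [2, 3, 4, 5, 6, 7, 8, 9, 10, 11],
    "class": [0, 0, 0, 0, 1, 1, 1, 1, 1, 1],
})
clf = LogisticRegression(C=1.0, random_state=42)
clf.fit(data[["weight", "size"]], data["class"])

# Two unseen fruits: one small and light, one large and heavy
new_fruits = pd.DataFrame({"weight": [1.5, 9.5], "size": [2.5, 10.5]})
print(clf.predict(new_fruits))  # expected: [0 1], i.e. apple then orange
```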