- input data (features or predictors)
- Example: a student's age, gender, previous grades, etc.
- numpy array or pandas DataFrame
- denoted by the variable X.
- target data (response or labels)
- Example: the student's final grade or pass/fail status.
- numpy array or pandas Series
- denoted by the variable y (lowercase by convention, since it is a single column).
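A minimal sketch of how X and y might be built from the student example above (the column names and values are illustrative assumptions, not a real dataset):

```python
import pandas as pd

# Hypothetical student records (columns are made-up examples)
students = pd.DataFrame({
    "age": [15, 16, 15, 17],
    "previous_grade": [72, 88, 65, 90],
    "passed": [0, 1, 0, 1],  # target: 1 = pass, 0 = fail
})

X = students[["age", "previous_grade"]]  # input features (DataFrame)
y = students["passed"]                   # target labels (Series)
print(X.shape, y.shape)
```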
- LogisticRegression (binary classification, 0 or 1):
- Works well on small datasets
- Fast to train
- Example: whether an email is spam or not
- Decision trees and random forests (DecisionTreeClassifier, RandomForestClassifier)
- Support vector machines (SVC, LinearSVC)
- Naive Bayes (GaussianNB, BernoulliNB, MultinomialNB)
- Nearest neighbors (KNeighborsClassifier)
- Neural networks (MLPClassifier)
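All of these classifiers share the same fit/predict interface, so one can be swapped for another with minimal code changes. A sketch on made-up toy data:

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB

# Toy 2-feature dataset: class 0 points are small, class 1 points are large
X = [[1, 2], [2, 3], [3, 4], [8, 9], [9, 10], [10, 11]]
y = [0, 0, 0, 1, 1, 1]

# Every estimator exposes the same fit/predict API
for model in (DecisionTreeClassifier(random_state=42),
              KNeighborsClassifier(n_neighbors=3),
              GaussianNB()):
    model.fit(X, y)
    print(type(model).__name__, model.predict([[2, 3], [9, 10]]))
```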
Predicting whether a fruit is an apple or an orange based on its weight and size:
- Solution:
- If a fruit weighs less than 150 grams and has a diameter smaller than 7 cm, it's an apple.
- If a fruit weighs more than 150 grams or has a diameter larger than 7 cm, it's an orange.
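The hand-written rule above can be sketched as a plain Python function (the rule does not say what happens at exactly 150 g or 7 cm; this sketch treats the boundary as an orange):

```python
def classify_fruit(weight_g, diameter_cm):
    """Hand-written rule: apple if light AND small, otherwise orange."""
    if weight_g < 150 and diameter_cm < 7:
        return "apple"
    return "orange"

print(classify_fruit(120, 6))   # apple
print(classify_fruit(180, 8))   # orange
```

This is what a classifier learns automatically from labeled examples instead of being coded by hand.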
Example
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
# 10 fruit samples with weight and size (arbitrary toy units).
# The first 4 samples are apples (labeled as 0)
# while the rest are oranges (labeled as 1)
data = pd.DataFrame({
"weight": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
"size": [2, 3, 4, 5, 6, 7, 8, 9, 10, 11],
"class": [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
})
# Split the dataset into training and testing sets
# Target to predict = y, input features = X.
X = data[["weight", "size"]]
y = data["class"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Define the logistic regression classifier and its hyperparameters
clf = LogisticRegression(C=1.0, random_state=42)
# Train the model on the training data
clf.fit(X_train, y_train)
# Evaluate the model on the testing data
y_pred = clf.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")
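Once trained, the same model can classify unseen fruit. A self-contained sketch that refits on the full toy dataset and predicts two new samples (the new weight/size values are made up for illustration):

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

data = pd.DataFrame({
    "weight": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "size": [2, 3, 4, 5, 6, 7, 8, 9, 10, 11],
    "class": [0, 0, 0, 0, 1, 1, 1, 1, 1, 1],
})
clf = LogisticRegression(C=1.0, random_state=42)
clf.fit(data[["weight", "size"]], data["class"])

# Two unseen fruits: one small and light, one large and heavy
new_fruits = pd.DataFrame({"weight": [1.5, 9.5], "size": [2.5, 10.5]})
print(clf.predict(new_fruits))  # expected: [0 1], i.e. apple then orange
```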