@j-thepac
Last active April 4, 2023 07:38
ML / AI

ML

Dataset

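  • Input data (features or predictors)
    • Example: a student's age, gender, previous grades, etc.
    • Usually a 2-D NumPy array or pandas DataFrame, with one row per sample and one column per feature
    • Denoted by the variable X.
  • Target data (response or labels)
    • Example: a student's final grade or pass/fail status.
    • Usually a 1-D NumPy array or pandas Series, with one value per sample
    • Denoted by the variable y (lowercase by convention, as in the example code).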
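
To make the X / y split concrete, here is a minimal sketch using the student example. The column names and values are made up for illustration and are not from any real dataset.

```python
import pandas as pd

# Hypothetical student records, invented for illustration only.
students = pd.DataFrame({
    "age": [15, 16, 15, 17],
    "previous_grade": [72, 85, 64, 90],
    "passed": [0, 1, 0, 1],  # target: 1 = pass, 0 = fail
})

# Features: a 2-D table, one row per student, one column per feature.
X = students[["age", "previous_grade"]]

# Target: a 1-D column, one label per student.
y = students["passed"]

print(X.shape)  # (4, 2)
print(y.shape)  # (4,)
```

Selecting with a list of column names keeps X as a DataFrame, while selecting a single column name gives a Series for y.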

Algorithm

  • Logistic regression (LogisticRegression), commonly used for binary outcomes (0 or 1):
    • Works well on small datasets
    • Fast to train
    • Example: deciding whether an email is spam or not
  • Decision trees and random forests (DecisionTreeClassifier, RandomForestClassifier)
  • Support vector machines (SVC, LinearSVC)
  • Naive Bayes (GaussianNB, BernoulliNB, MultinomialNB)
  • Nearest neighbors (KNeighborsClassifier)
  • Neural networks (MLPClassifier)
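
All of the scikit-learn classifiers listed above share the same `fit` / `predict` interface, so one can often be swapped for another without changing the surrounding code. The sketch below illustrates this with four of them on a tiny made-up dataset; the data, the hyperparameter choices (`n_neighbors=3`, `random_state=0`), and the test points are assumptions chosen only for the demonstration.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Tiny made-up dataset: two features per sample, two classes.
X = [[1, 2], [2, 3], [3, 4], [8, 9], [9, 10], [10, 11]]
y = [0, 0, 0, 1, 1, 1]

# Different algorithms, same interface.
classifiers = {
    "LogisticRegression": LogisticRegression(),
    "DecisionTreeClassifier": DecisionTreeClassifier(random_state=0),
    "KNeighborsClassifier": KNeighborsClassifier(n_neighbors=3),
    "GaussianNB": GaussianNB(),
}

predictions = {}
for name, clf in classifiers.items():
    clf.fit(X, y)                                       # train
    predictions[name] = clf.predict([[2, 2], [9, 9]])   # predict two new points
    print(name, predictions[name])
```

Which algorithm works best depends on the data, so in practice it is worth comparing a few of them on a held-out test set.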

Problem:

Predicting whether a fruit is an apple or an orange based on its weight and size:

  • Solution (a hand-written rule):
    • If a fruit weighs less than 150 grams and has a diameter smaller than 7 cm, it's an apple.
    • Otherwise (150 grams or more, or a diameter of 7 cm or more), it's an orange.
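
The rule above can be written directly as a small function. This is a sketch of the hand-coded approach that a trained model is meant to replace; the function name `classify_fruit` is hypothetical, and a fruit exactly on a boundary (150 g or 7 cm) falls through to "orange" here.

```python
def classify_fruit(weight_g: float, diameter_cm: float) -> str:
    """Hand-written rule: a light and small fruit is an apple, anything else an orange."""
    if weight_g < 150 and diameter_cm < 7:
        return "apple"
    return "orange"


print(classify_fruit(120, 6.5))  # apple
print(classify_fruit(180, 6.5))  # orange
print(classify_fruit(120, 8.0))  # orange
```

A classifier learns a decision boundary like this from labeled examples instead of having the thresholds typed in by hand.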

Example

    import pandas as pd
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import accuracy_score

    # 10 fruit samples with weight and size (toy values, not real grams or cm).
    # The first 4 samples are apples (labeled as 0),
    # while the remaining 6 are oranges (labeled as 1).
    
    data = pd.DataFrame({
        "weight": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
        "size": [2, 3, 4, 5, 6, 7, 8, 9, 10, 11],
        "class": [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
    })

    # Separate the features (input, X) from the target to predict (y),
    # then split both into training and testing sets.
    
    X = data[["weight", "size"]]
    y = data["class"]
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
    
    
    # Define the logistic regression classifier and its hyperparameters
    clf = LogisticRegression(C=1.0, random_state=42)

    # Train the model on the training data
    clf.fit(X_train, y_train)

    # Evaluate the model on the testing data
    y_pred = clf.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    print(f"Accuracy: {accuracy:.2f}")
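
Once a model is trained, it can also label a fruit it has never seen. The sketch below is self-contained, so it rebuilds the same toy dataset and, for brevity, fits on all 10 rows rather than on a train split; the new sample (weight 2, size 3) is a made-up value for illustration.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Same toy dataset as in the example: 0 = apple, 1 = orange.
data = pd.DataFrame({
    "weight": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "size": [2, 3, 4, 5, 6, 7, 8, 9, 10, 11],
    "class": [0, 0, 0, 0, 1, 1, 1, 1, 1, 1],
})
X = data[["weight", "size"]]
y = data["class"]

# Assumption: fit on every row here, only to keep the sketch short.
clf = LogisticRegression(C=1.0, random_state=42)
clf.fit(X, y)

# A hypothetical new fruit, with the same column names used for training.
new_fruit = pd.DataFrame({"weight": [2], "size": [3]})

label = clf.predict(new_fruit)[0]        # predicted class: 0 or 1
proba = clf.predict_proba(new_fruit)[0]  # probabilities, ordered as clf.classes_

print("Predicted class:", label)
print("Class order:", clf.classes_)
print("Probabilities:", proba)
```

`predict_proba` returns one probability per class in the order given by `clf.classes_`, which helps show how confident the model is rather than only the final label.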