Last active
December 3, 2023 14:02
Using Logistic Regression to identify Customer Retention on an e-commerce site
"""
This Python script demonstrates the use of logistic regression to predict whether customers will make their next purchase on an e-commerce site.

The code performs the following steps:

1. Load and Preprocess Data:
   - Loads an e-commerce dataset containing the customer features 'customer_id', 'time_on_site', 'total_spent', and 'is_returning_customer', plus the target 'will_make_next_purchase'.
   - Splits the data into training (80%) and testing (20%) sets.

2. Model Training:
   - Creates a logistic regression model using scikit-learn.
   - Trains the model on the training set, where 'will_make_next_purchase' is the target variable.

3. Model Evaluation:
   - Predicts the target variable on the testing set and calculates training and test accuracy.
   - Displays the confusion matrix to provide a detailed view of model performance, including true positives, true negatives, false positives, and false negatives.

4. Making Predictions on New Data:
   - Demonstrates how to use the trained model to make predictions on new data.
   - Creates a new DataFrame ('new_data') with hypothetical customer features: 'customer_id', 'time_on_site', 'total_spent', and 'is_returning_customer'.
   - Outputs predictions for whether these new customers will make the next purchase.

Note: The script assumes that 'will_make_next_purchase' is a binary target variable (0 or 1) indicating whether a customer makes the next purchase.
'customer_id' is also included as a feature, on the assumption that it might contribute to purchase behavior; if the IDs are arbitrary, this column carries no generalizable signal and can be dropped.
"""
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix

# Load the dataset
ecommerce_data = pd.read_csv('data/shopping_data.csv')

# Assume 'will_make_next_purchase' is the target variable, and others are features
X = ecommerce_data[['customer_id', 'time_on_site', 'total_spent', 'is_returning_customer']]
y = ecommerce_data['will_make_next_purchase']

# Split the data into training (80%) and testing (20%) sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create a Logistic Regression model
model = LogisticRegression()

# Train the model
model.fit(X_train, y_train)

# Make predictions on the test set
y_pred = model.predict(X_test)

# Calculate accuracy on the test set
accuracy = accuracy_score(y_test, y_pred)

# Compute the confusion matrix (rows: actual class, columns: predicted class)
conf_matrix = confusion_matrix(y_test, y_pred)

# Display results
print(f'Training Accuracy: {model.score(X_train, y_train):.2f}')
print(f'Test Accuracy: {accuracy:.2f}')
print('Confusion Matrix:')
print(conf_matrix)

# Now, let's make predictions on new data
# Assuming 'new_data' is a DataFrame with columns 'customer_id', 'time_on_site', 'total_spent', 'is_returning_customer'
# You should replace this with your actual new data
new_data = pd.DataFrame({
    'customer_id': [9, 10, 11],
    'time_on_site': [12, 8, 15],
    'total_spent': [60, 30, 75],
    'is_returning_customer': [1, 0, 1]
})

# Make predictions on the new data
new_data_predictions = model.predict(new_data)

# Display predictions for the new data
new_data_with_predictions = new_data.copy()
new_data_with_predictions['will_make_next_purchase'] = new_data_predictions
print('Predictions for New Data:')
print(new_data_with_predictions[['customer_id', 'will_make_next_purchase']])
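The 0/1 labels returned by `model.predict` do not show how confident the model is about each new customer. The following is a minimal sketch of one way to see that, using scikit-learn's `predict_proba`. It assumes the eight sample rows that accompany this script stand in for `data/shopping_data.csv`; the inline training frame, the `features` list, `max_iter=1000`, and the `purchase_probability` column name are choices made for this illustration only, not part of the original script.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Assumption: the eight sample rows stand in for data/shopping_data.csv.
train = pd.DataFrame({
    'customer_id': [1, 2, 3, 4, 5, 6, 7, 8],
    'time_on_site': [10, 15, 8, 20, 5, 12, 18, 7],
    'total_spent': [50, 75, 30, 100, 20, 60, 90, 35],
    'is_returning_customer': [1, 0, 1, 1, 0, 1, 1, 0],
    'will_make_next_purchase': [1, 1, 0, 1, 0, 1, 1, 0],
})
features = ['customer_id', 'time_on_site', 'total_spent', 'is_returning_customer']

# Assumption: max_iter is raised only to reduce the chance of a
# convergence warning on these unscaled features.
model = LogisticRegression(max_iter=1000)
model.fit(train[features], train['will_make_next_purchase'])

# Same hypothetical customers as in the script above.
new_data = pd.DataFrame({
    'customer_id': [9, 10, 11],
    'time_on_site': [12, 8, 15],
    'total_spent': [60, 30, 75],
    'is_returning_customer': [1, 0, 1],
})

# predict_proba returns one column per class, ordered as in model.classes_.
# With classes [0, 1], column 1 is the estimated probability of a purchase.
purchase_proba = model.predict_proba(new_data[features])[:, 1]

result = new_data[['customer_id']].assign(
    purchase_probability=purchase_proba.round(3)
)
print(result)
```

By default, `model.predict` for a binary problem corresponds to thresholding this probability at 0.5, so the probabilities give a finer-grained view of the same predictions.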
| customer_id | time_on_site | total_spent | is_returning_customer | will_make_next_purchase |
|---|---|---|---|---|
| 1 | 10 | 50 | 1 | 1 |
| 2 | 15 | 75 | 0 | 1 |
| 3 | 8 | 30 | 1 | 0 |
| 4 | 20 | 100 | 1 | 1 |
| 5 | 5 | 20 | 0 | 0 |
| 6 | 12 | 60 | 1 | 1 |
| 7 | 18 | 90 | 1 | 1 |
| 8 | 7 | 35 | 0 | 0 |
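The script's docstring mentions true positives, true negatives, false positives, and false negatives, but the printed matrix does not label them. The sketch below shows one way to name the four cells, run on the eight rows above. It assumes those rows are the full contents of the CSV file; the inline `csv_text`, `max_iter=1000`, and the explicit `labels=[0, 1]` argument are additions for this illustration.

```python
import io

import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

# Assumption: the eight sample rows, written out as comma-separated text.
csv_text = """customer_id,time_on_site,total_spent,is_returning_customer,will_make_next_purchase
1,10,50,1,1
2,15,75,0,1
3,8,30,1,0
4,20,100,1,1
5,5,20,0,0
6,12,60,1,1
7,18,90,1,1
8,7,35,0,0
"""
data = pd.read_csv(io.StringIO(csv_text))

X = data.drop(columns='will_make_next_purchase')
y = data['will_make_next_purchase']

# Same split settings as the script.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_pred = model.predict(X_test)

# labels=[0, 1] forces a 2x2 matrix even if the small test set happens
# to contain only one class. For binary labels, ravel() yields the cells
# in the order: true negatives, false positives, false negatives, true positives.
tn, fp, fn, tp = confusion_matrix(y_test, y_pred, labels=[0, 1]).ravel()

print(f'TN={tn} FP={fp} FN={fn} TP={tp} (test rows: {len(y_test)})')
```

With only eight rows, the test split is very small, so the accuracy and the confusion matrix computed from it are too coarse to say much about real-world performance; a larger dataset would be needed for a meaningful evaluation.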