Live shopping data for fraud detection
fraud_live_data.csv — unlabelled live transactions to be scored by the script below:

TransactionAmount,TransactionTime,MerchantInfo,LocationInfo,UserInfo,DeviceInfo
68.2,01/03/2023 14:30,LocalShop3,CityA,JohnDoe123,Mobile
95.4,02/03/2023 9:45,LocalShop2,CityB,AliceSmith456,Desktop
110.8,03/03/2023 16:15,OnlineStore1,CityC,BobJohnson789,Tablet
45.6,04/03/2023 12:00,OnlineStore1,CityA,EveWilliams123,Mobile
75.3,05/03/2023 10:30,OnlineStore2,CityB,CharlieBrown456,Desktop
fraud_training_data.csv — labelled training transactions (Label: 1 = fraud, 0 = not fraud):

TransactionAmount,TransactionTime,MerchantInfo,LocationInfo,UserInfo,DeviceInfo,Label
110,04/01/2023 08:45,LocalShop2,CityC,User123,Mobile,0
85.7,06/01/2023 08:15,LocalShop2,CityB,User456,Desktop,1
80.4,08/01/2023 08:30,LocalShop2,CityA,User789,Tablet,0
65.2,10/01/2023 10:15,LocalShop2,CityC,User123,Mobile,0
55.9,12/01/2023 09:15,LocalShop2,CityB,User456,Desktop,1
90.8,14/01/2023 08:45,LocalShop2,CityA,User789,Tablet,0
115.4,16/01/2023 10:15,LocalShop2,CityC,User123,Mobile,0
102.3,18/01/2023 14:30,LocalShop2,CityB,User456,Desktop,1
110.2,19/01/2023 15:00,LocalShop2,CityA,User123,Mobile,0
120.1,21/01/2023 14:15,LocalShop2,CityB,User456,Desktop,1
82.1,23/01/2023 15:30,LocalShop2,CityC,User789,Tablet,0
75.2,25/01/2023 14:45,LocalShop2,CityA,User123,Mobile,0
68.4,27/01/2023 13:45,LocalShop2,CityB,User456,Desktop,1
120.8,29/01/2023 15:45,LocalShop2,CityC,User789,Tablet,0
53.9,31/01/2023 14:00,LocalShop2,CityA,User123,Mobile,0
98.6,02/02/2023 13:45,LocalShop2,CityB,User456,Desktop,1
87.2,04/02/2023 15:45,LocalShop2,CityC,User789,Tablet,0
89.6,06/02/2023 14:00,LocalShop2,CityA,User123,Mobile,0
54.2,08/02/2023 13:45,LocalShop2,CityB,User456,Desktop,1
50.2,01/01/2023 12:45,LocalShop2,CityB,User456,Desktop,1
120.3,02/01/2023 14:00,LocalShop2,CityA,User123,Mobile,0
65.8,03/01/2023 10:30,LocalShop3,CityB,User456,Desktop,1
120.1,05/01/2023 09:00,LocalShop3,CityA,User789,Tablet,0
70.9,07/01/2023 10:00,LocalShop3,CityC,User123,Mobile,0
100.3,09/01/2023 09:45,LocalShop3,CityB,User456,Desktop,1
75.3,11/01/2023 08:00,LocalShop3,CityA,User789,Tablet,0
110.2,13/01/2023 10:30,LocalShop3,CityC,User123,Mobile,0
105.3,15/01/2023 09:00,LocalShop3,CityB,User456,Desktop,1
95.9,17/01/2023 08:30,LocalShop3,CityA,User789,Tablet,0
45.7,20/01/2023 12:45,LocalShop3,CityC,User789,Tablet,0
55.3,22/01/2023 13:00,LocalShop3,CityA,User123,Mobile,0
60.5,24/01/2023 12:15,LocalShop3,CityB,User456,Desktop,1
50.9,26/01/2023 15:15,LocalShop3,CityC,User789,Tablet,0
105.1,28/01/2023 14:15,LocalShop3,CityA,User123,Mobile,0
63.2,30/01/2023 12:30,LocalShop3,CityB,User456,Desktop,1
117.4,01/02/2023 15:15,LocalShop3,CityC,User789,Tablet,0
114.9,03/02/2023 14:15,LocalShop3,CityA,User123,Mobile,0
103.2,05/02/2023 12:30,LocalShop3,CityB,User456,Desktop,1
121.3,07/02/2023 15:15,LocalShop3,CityC,User789,Tablet,0
73.9,09/02/2023 14:15,LocalShop3,CityA,User123,Mobile,0
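Before training, it is worth confirming that the table above parses into the expected columns. This short sanity check is a sketch added here, not part of the original gist; it assumes the labelled data is saved as fraud_training_data.csv next to the script.

import pandas as pd

# Check column dtypes, class balance, and that every timestamp parses as day/month/year
df = pd.read_csv('fraud_training_data.csv')
print(df.dtypes)
print(df['Label'].value_counts())
parsed = pd.to_datetime(df['TransactionTime'], format="%d/%m/%Y %H:%M", errors='coerce')
print(parsed.isna().sum(), "unparseable timestamps")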
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline

print("-- loading training data --")

# Load the training dataset (replace 'fraud_training_data.csv' with your actual training dataset)
training_data = pd.read_csv('fraud_training_data.csv')

# Remove rows with missing values
training_data = training_data.dropna()

# Convert the TransactionTime column to datetime format (dates are day/month/year)
training_data['TransactionTime'] = pd.to_datetime(training_data['TransactionTime'], format="%d/%m/%Y %H:%M", errors='coerce')

# The training dataset has the specified feature columns and a label column
features = ['TransactionAmount', 'MerchantInfo', 'LocationInfo', 'UserInfo', 'DeviceInfo']
label = 'Label'

# Split data into features and label
X = training_data[features]
y = training_data[label]

# Define columns to be one-hot encoded
categorical_features = ['MerchantInfo', 'LocationInfo', 'UserInfo', 'DeviceInfo']

# Create a column transformer
preprocessor = ColumnTransformer(
    transformers=[
        ('cat', OneHotEncoder(handle_unknown='ignore'), categorical_features)
    ],
    remainder='passthrough'
)

# Create a pipeline with preprocessing and model
model = RandomForestClassifier()
pipeline = Pipeline([
    ('preprocessor', preprocessor),
    ('model', model)
])

# Train the model
pipeline.fit(X, y)
print("-- model trained --")

# Load the live data (replace 'fraud_live_data.csv' with the actual live data file)
print("-- loading live data --")
live_data = pd.read_csv('fraud_live_data.csv')

# Remove rows with missing values
live_data = live_data.dropna()

# Convert 'TransactionTime' in the live data to datetime format
live_data['TransactionTime'] = pd.to_datetime(live_data['TransactionTime'], format="%d/%m/%Y %H:%M", errors='coerce')

# Use only the specified features for the live data
X_live = live_data[features]

# Make predictions on the live data
predictions = pipeline.predict(X_live)

# Add predictions to the live data
live_data['Prediction'] = ['Fraud' if pred == 1 else 'Not Fraud' for pred in predictions]

# Output the results
print("-- predictions made --")
print(live_data[['TransactionAmount', 'TransactionTime', 'MerchantInfo', 'LocationInfo', 'UserInfo', 'DeviceInfo', 'Prediction']])
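The script imports train_test_split, accuracy_score, confusion_matrix, and classification_report but never evaluates the model before scoring live data. Below is a minimal, self-contained sketch of how such an evaluation could look, assuming the labelled table above is saved as fraud_training_data.csv; the eval_pipeline name, the 30% hold-out, and the fixed random_state are illustrative choices, not part of the gist.

import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline

# Load the labelled training data and drop rows with missing values
data = pd.read_csv('fraud_training_data.csv').dropna()

features = ['TransactionAmount', 'MerchantInfo', 'LocationInfo', 'UserInfo', 'DeviceInfo']
categorical_features = ['MerchantInfo', 'LocationInfo', 'UserInfo', 'DeviceInfo']
X, y = data[features], data['Label']

# Hold out a stratified test split so both classes appear in the evaluation set
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Fresh preprocessing + model pipeline, mirroring the training script above
eval_pipeline = Pipeline([
    ('preprocessor', ColumnTransformer(
        transformers=[('cat', OneHotEncoder(handle_unknown='ignore'), categorical_features)],
        remainder='passthrough'
    )),
    ('model', RandomForestClassifier(random_state=42))
])
eval_pipeline.fit(X_train, y_train)

# Report accuracy, the confusion matrix, and per-class precision/recall
y_pred = eval_pipeline.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))

With a dataset this small the scores will vary a lot between splits, so treat them as a smoke test rather than a reliable estimate of performance.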