@kashaziz
Last active December 17, 2023 16:20
Live shopping data for fraud detection
TransactionAmount TransactionTime MerchantInfo LocationInfo UserInfo DeviceInfo
68.2 01/03/2023 14:30 LocalShop3 CityA JohnDoe123 Mobile
95.4 02/03/2023 9:45 LocalShop2 CityB AliceSmith456 Desktop
110.8 03/03/2023 16:15 OnlineStore1 CityC BobJohnson789 Tablet
45.6 04/03/2023 12:00 OnlineStore1 CityA EveWilliams123 Mobile
75.3 05/03/2023 10:30 OnlineStore2 CityB CharlieBrown456 Desktop
TransactionAmount TransactionTime MerchantInfo LocationInfo UserInfo DeviceInfo Label
110 04/01/2023 08:45 LocalShop2 CityC User123 Mobile 0
85.7 06/01/2023 08:15 LocalShop2 CityB User456 Desktop 1
80.4 08/01/2023 08:30 LocalShop2 CityA User789 Tablet 0
65.2 10/01/2023 10:15 LocalShop2 CityC User123 Mobile 0
55.9 12/01/2023 09:15 LocalShop2 CityB User456 Desktop 1
90.8 14/01/2023 08:45 LocalShop2 CityA User789 Tablet 0
115.4 16/01/2023 10:15 LocalShop2 CityC User123 Mobile 0
102.3 18/01/2023 14:30 LocalShop2 CityB User456 Desktop 1
110.2 19/01/2023 15:00 LocalShop2 CityA User123 Mobile 0
120.1 21/01/2023 14:15 LocalShop2 CityB User456 Desktop 1
82.1 23/01/2023 15:30 LocalShop2 CityC User789 Tablet 0
75.2 25/01/2023 14:45 LocalShop2 CityA User123 Mobile 0
68.4 27/01/2023 13:45 LocalShop2 CityB User456 Desktop 1
120.8 29/01/2023 15:45 LocalShop2 CityC User789 Tablet 0
53.9 31/01/2023 14:00 LocalShop2 CityA User123 Mobile 0
98.6 02/02/2023 13:45 LocalShop2 CityB User456 Desktop 1
87.2 04/02/2023 15:45 LocalShop2 CityC User789 Tablet 0
89.6 06/02/2023 14:00 LocalShop2 CityA User123 Mobile 0
54.2 08/02/2023 13:45 LocalShop2 CityB User456 Desktop 1
50.2 01/01/2023 12:45 LocalShop2 CityB User456 Desktop 1
120.3 02/01/2023 14:00 LocalShop2 CityA User123 Mobile 0
65.8 03/01/2023 10:30 LocalShop3 CityB User456 Desktop 1
120.1 05/01/2023 09:00 LocalShop3 CityA User789 Tablet 0
70.9 07/01/2023 10:00 LocalShop3 CityC User123 Mobile 0
100.3 09/01/2023 09:45 LocalShop3 CityB User456 Desktop 1
75.3 11/01/2023 08:00 LocalShop3 CityA User789 Tablet 0
110.2 13/01/2023 10:30 LocalShop3 CityC User123 Mobile 0
105.3 15/01/2023 09:00 LocalShop3 CityB User456 Desktop 1
95.9 17/01/2023 08:30 LocalShop3 CityA User789 Tablet 0
45.7 20/01/2023 12:45 LocalShop3 CityC User789 Tablet 0
55.3 22/01/2023 13:00 LocalShop3 CityA User123 Mobile 0
60.5 24/01/2023 12:15 LocalShop3 CityB User456 Desktop 1
50.9 26/01/2023 15:15 LocalShop3 CityC User789 Tablet 0
105.1 28/01/2023 14:15 LocalShop3 CityA User123 Mobile 0
63.2 30/01/2023 12:30 LocalShop3 CityB User456 Desktop 1
117.4 01/02/2023 15:15 LocalShop3 CityC User789 Tablet 0
114.9 03/02/2023 14:15 LocalShop3 CityA User123 Mobile 0
103.2 05/02/2023 12:30 LocalShop3 CityB User456 Desktop 1
121.3 07/02/2023 15:15 LocalShop3 CityC User789 Tablet 0
73.9 09/02/2023 14:15 LocalShop3 CityA User123 Mobile 0
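The rows above contain no commas, so `pd.read_csv` with its default delimiter would likely load each row as a single column. The sketch below assumes the raw files are tab-separated (an assumption; check your own copy) and that timestamps are day-first, as rows such as `14/01/2023 08:45` suggest. `SAMPLE_TSV` and `load_transactions` are hypothetical names used only for illustration.

```python
import io

import pandas as pd

# A few rows copied from the training data above, joined with tabs.
# The tab delimiter is an assumption about the raw files.
SAMPLE_TSV = (
    "TransactionAmount\tTransactionTime\tMerchantInfo\tLocationInfo\tUserInfo\tDeviceInfo\tLabel\n"
    "110\t04/01/2023 08:45\tLocalShop2\tCityC\tUser123\tMobile\t0\n"
    "85.7\t06/01/2023 08:15\tLocalShop2\tCityB\tUser456\tDesktop\t1\n"
    "90.8\t14/01/2023 08:45\tLocalShop2\tCityA\tUser789\tTablet\t0\n"
)


def load_transactions(source):
    """Read tab-separated transactions and parse day-first timestamps."""
    df = pd.read_csv(source, sep="\t")
    # Day-first format: 14/01/2023 is 14 January, not month 14.
    df["TransactionTime"] = pd.to_datetime(
        df["TransactionTime"], format="%d/%m/%Y %H:%M"
    )
    return df


sample = load_transactions(io.StringIO(SAMPLE_TSV))
print(sample.dtypes)
print(sample)
```

If your files use a different delimiter, only the `sep` argument should need to change.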
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
print("-- loading training data --")
# Load the training dataset (replace 'fraud_training_data.csv' with the path to your own file)
training_data = pd.read_csv('fraud_training_data.csv')
# Drop rows that contain missing values
training_data = training_data.dropna()
# Convert TransactionTime to datetime; the sample data is day-first (e.g. 14/01/2023 08:45)
training_data['TransactionTime'] = pd.to_datetime(training_data['TransactionTime'], format="%d/%m/%Y %H:%M", errors='coerce')
# Assume your training dataset has the specified feature columns and a label column
features = ['TransactionAmount', 'MerchantInfo', 'LocationInfo', 'UserInfo', 'DeviceInfo']
label = 'Label'
# Split data into features and label
X = training_data[features]
y = training_data[label]
# Define columns to be one-hot encoded
categorical_features = ['MerchantInfo', 'LocationInfo', 'UserInfo', 'DeviceInfo']
# Create a column transformer
preprocessor = ColumnTransformer(
    transformers=[
        ('cat', OneHotEncoder(handle_unknown='ignore'), categorical_features)
    ],
    remainder='passthrough'
)
# Create a pipeline with preprocessing and model
model = RandomForestClassifier()
pipeline = Pipeline([
    ('preprocessor', preprocessor),
    ('model', model)
])
# Train the model
pipeline.fit(X, y)
print("-- model trained --")
# Load the live data (replace 'fraud_live_data.csv' with the path to your own file)
print("-- loading live data --")
live_data = pd.read_csv('fraud_live_data.csv')
# Drop rows that contain missing values
live_data = live_data.dropna()
# Convert TransactionTime in the live data to datetime; the sample data is day-first (e.g. 01/03/2023 14:30)
live_data['TransactionTime'] = pd.to_datetime(live_data['TransactionTime'], format="%d/%m/%Y %H:%M", errors='coerce')
# Use only the specified features for live data
X_live = live_data[features]
# Make predictions on the live data
predictions = pipeline.predict(X_live)
# Add predictions to the live data
live_data['Prediction'] = ['Fraud' if pred == 1 else 'Not Fraud' for pred in predictions]
# Output the results
print("-- predictions made --")
print(live_data[['TransactionAmount', 'TransactionTime', 'MerchantInfo', 'LocationInfo', 'UserInfo', 'DeviceInfo', 'Prediction']])
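The script imports `train_test_split`, `accuracy_score`, `confusion_matrix`, and `classification_report` but never calls them. One way they might be used is a hold-out evaluation before scoring live data. The sketch below is a minimal, self-contained example under stated assumptions: it embeds twelve rows copied from the training data above, and the 25% test split, the stratification, and `random_state=42` are illustrative choices, not part of the original script.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

columns = ["TransactionAmount", "MerchantInfo", "LocationInfo",
           "UserInfo", "DeviceInfo", "Label"]
# Twelve rows taken from the training data above (timestamps omitted,
# since the model does not use TransactionTime as a feature).
rows = [
    (110.0, "LocalShop2", "CityC", "User123", "Mobile", 0),
    (85.7, "LocalShop2", "CityB", "User456", "Desktop", 1),
    (80.4, "LocalShop2", "CityA", "User789", "Tablet", 0),
    (65.2, "LocalShop2", "CityC", "User123", "Mobile", 0),
    (55.9, "LocalShop2", "CityB", "User456", "Desktop", 1),
    (90.8, "LocalShop2", "CityA", "User789", "Tablet", 0),
    (65.8, "LocalShop3", "CityB", "User456", "Desktop", 1),
    (120.1, "LocalShop3", "CityA", "User789", "Tablet", 0),
    (70.9, "LocalShop3", "CityC", "User123", "Mobile", 0),
    (100.3, "LocalShop3", "CityB", "User456", "Desktop", 1),
    (75.3, "LocalShop3", "CityA", "User789", "Tablet", 0),
    (110.2, "LocalShop3", "CityC", "User123", "Mobile", 0),
]
data = pd.DataFrame(rows, columns=columns)

features = ["TransactionAmount", "MerchantInfo", "LocationInfo",
            "UserInfo", "DeviceInfo"]
categorical_features = ["MerchantInfo", "LocationInfo", "UserInfo", "DeviceInfo"]
X = data[features]
y = data["Label"]

# Hold out a quarter of the rows, keeping the fraud ratio similar in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

pipeline = Pipeline([
    ("preprocessor", ColumnTransformer(
        transformers=[
            ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_features)
        ],
        remainder="passthrough",
    )),
    ("model", RandomForestClassifier(random_state=42)),
])
pipeline.fit(X_train, y_train)

# Score only on rows the model has not seen during training.
y_pred = pipeline.predict(X_test)
acc = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred, labels=[0, 1])
print("accuracy:", acc)
print(cm)
print(classification_report(y_test, y_pred, labels=[0, 1], zero_division=0))
```

In the sample training data every row labelled 1 belongs to User456 on Desktop in CityB, so a high score here mostly shows that the model has learned that one user, not that it generalizes to unseen fraud patterns.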