Audhi Aprilliant audhiaprilliant

audhiaprilliant / threshold_precision_recall_fscore.py
Created December 24, 2020 03:03
How to choose the optimal threshold for imbalanced classification
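# Import module for linear algebra
import numpy as np
# Calculate the f-score (set to 0 where precision + recall is 0, so argmax never sees NaN)
denominator = precision + recall
fscore = np.divide(2 * precision * recall, denominator, out=np.zeros_like(denominator, dtype=float), where=denominator > 0)
# Find the optimal threshold
index = np.argmax(fscore)
thresholdOpt = round(thresholds[index], ndigits = 4)
fscoreOpt = round(fscore[index], ndigits = 4)
recallOpt = round(recall[index], ndigits = 4)
precisionOpt = round(precision[index], ndigits = 4)
print('Best Threshold: {} with F-Score: {}'.format(thresholdOpt, fscoreOpt))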
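The snippet above relies on `precision`, `recall`, and `thresholds` arrays that the preview does not show. As a minimal, dependency-free sketch of the same idea, the following picks the threshold with the highest F-score from three parallel sequences. The function name `best_f1` and the sample values are hypothetical, chosen only for illustration.

```python
def best_f1(precision, recall, thresholds):
    """Return (threshold, f1, precision, recall) at the highest F1."""
    best = None
    # zip stops at the shortest sequence, so a trailing precision/recall
    # point without a matching threshold is simply ignored.
    for p, r, t in zip(precision, recall, thresholds):
        # F1 is the harmonic mean of precision and recall; define it as 0
        # when both are 0 so the division is always safe.
        f1 = 2 * p * r / (p + r) if (p + r) > 0 else 0.0
        # Strict ">" keeps the first maximum, as np.argmax would.
        if best is None or f1 > best[1]:
            best = (t, f1, p, r)
    return best


# Hypothetical values, for illustration only.
precision = [0.50, 0.60, 0.75, 1.00, 0.0]
recall = [1.00, 0.90, 0.60, 0.20, 0.0]
thresholds = [0.1, 0.3, 0.5, 0.7, 0.9]

threshold_opt, f1_opt, precision_opt, recall_opt = best_f1(precision, recall, thresholds)
print('Best Threshold: {} with F-Score: {}'.format(threshold_opt, round(f1_opt, 4)))
```

The last pair (0.0, 0.0) shows why the zero guard matters: without it the division would fail or produce NaN.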
audhiaprilliant / threshold_tuning.py
Created December 24, 2020 03:05
How to choose the optimal threshold for imbalanced classification
# Array for finding the optimal threshold
thresholds = np.arange(0.0, 1.0, 0.0001)
fscore = np.zeros(shape=(len(thresholds)))
print('Length of sequence: {}'.format(len(thresholds)))
# Evaluate each candidate threshold
for index, elem in enumerate(thresholds):
    # Convert predicted probabilities to class labels at this threshold
    y_pred_prob = (y_pred > elem).astype('int')
    # Calculate the f-score
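The preview above is cut off before the F-score is computed inside the loop. The sketch below is one way such a threshold sweep could look, written with the standard library only. It is not the author's remaining code: the helper names `f1_at_threshold` and `sweep_thresholds`, the step size, and the toy labels and probabilities are all assumptions.

```python
def f1_at_threshold(y_true, y_prob, threshold):
    """F1 of the labels obtained by predicting 1 when probability > threshold."""
    y_pred = [1 if p > threshold else 0 for p in y_prob]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    # F1 = 2TP / (2TP + FP + FN); defined as 0 when there are no positives at all.
    return 2 * tp / denom if denom else 0.0


def sweep_thresholds(y_true, y_prob, n_steps=100):
    """Try thresholds 0, 1/n_steps, ..., (n_steps-1)/n_steps and keep the first best."""
    best_threshold, best_f1 = 0.0, -1.0
    for i in range(n_steps):
        threshold = i / n_steps
        f1 = f1_at_threshold(y_true, y_prob, threshold)
        if f1 > best_f1:
            best_threshold, best_f1 = threshold, f1
    return best_threshold, best_f1


# Toy imbalanced data (3 positives out of 10), for illustration only.
y_true = [0, 0, 0, 0, 0, 0, 1, 0, 1, 1]
y_prob = [0.05, 0.10, 0.15, 0.20, 0.30, 0.35, 0.40, 0.55, 0.60, 0.80]

best_threshold, best_f1 = sweep_thresholds(y_true, y_prob)
print('Best Threshold: {} with F-Score: {}'.format(round(best_threshold, 4), round(best_f1, 4)))
```

On this toy data the default cut-off of 0.5 misses the positive scored at 0.40, which is why a lower threshold gives a higher F-score.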
audhiaprilliant / kprototype.ipynb
Last active April 13, 2023 16:37
Clustering Algorithm for Mixed Data Type
audhiaprilliant / fuzzy_optimization.py
Created February 12, 2021 12:18
Fuzzy String Matching
# Import module for data manipulation
import pandas as pd
# Import module for linear algebra
import numpy as np
# Import module for Fuzzy string matching
from fuzzywuzzy import fuzz, process
# Import module for regex
import re
# Import module for iteration
import itertools
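The preview above shows only the imports, so the matching logic itself is not visible. As a rough illustration of fuzzy string matching, the sketch below uses `difflib.SequenceMatcher` from the Python standard library as a stand-in for `fuzzywuzzy`; its scores should not be assumed to equal those of `fuzz.ratio`. The helper names `normalize`, `similarity`, and `best_match`, and the sample strings, are hypothetical.

```python
import re
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    text = re.sub(r'[^a-z0-9\s]', ' ', text.lower())
    return ' '.join(text.split())


def similarity(a, b):
    """Similarity score from 0 to 100 between two normalized strings."""
    ratio = SequenceMatcher(None, normalize(a), normalize(b)).ratio()
    return round(100 * ratio)


def best_match(query, choices):
    """Return the (choice, score) pair with the highest similarity to query."""
    return max(((c, similarity(query, c)) for c in choices), key=lambda pair: pair[1])


# Hypothetical reference list and a messy input string.
choices = ['Jakarta Selatan', 'Jakarta Barat', 'Bandung']
match, score = best_match('jakarta  selatan.', choices)
print('Best match: {} with score: {}'.format(match, score))
```

Normalizing both strings before scoring keeps differences in case, punctuation, and spacing from lowering the score.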
audhiaprilliant / fuzzy_conventional.py
Last active February 12, 2021 12:51
Fuzzy String Matching
# Import module for data manipulation
import pandas as pd
# Import module for linear algebra
import numpy as np
# Import module for Fuzzy string matching
from fuzzywuzzy import fuzz, process
# Import module for binary search
def stringMatching(
    df: pd.DataFrame,
audhiaprilliant / cluster_ensemble_data_manipulation.R
Last active March 7, 2021 08:11
Cluster Ensemble - Data Manipulation
# Install packages for cluster ensemble
install.packages('diceR')
install.packages('treemapify')
library(diceR)
library(dplyr)
library(ggplot2)
library(treemapify)
# Load the order data
df_orders = read.csv(file = '../data/olist_orders_dataset.csv', header = TRUE, sep = ',')
audhiaprilliant / cluster_ensemble_data_visualization.R
Created March 7, 2021 08:22
Cluster Ensemble - Data Visualization
# Group customer by their customer segment
rfm_level_agg = rfm %>%
  group_by(`Customer Segment`) %>%
  summarize(
    Recency = mean(Recency),
    Frequency = mean(Frequency),
    `Monetary Mean` = mean(Monetary),
    `Monetary Count` = n(),
    `Marketing Action` = unique(`Marketing Action`)
  )
audhiaprilliant / kmodes.ipynb
Last active July 2, 2023 23:49
Clustering Algorithm for Categorical Data Type
# ---------- Central Limit Theorem ----------
# Parameters
sample_mean = 100000
sample_size = 20
set.seed(1234)
# 1 Exponential Distribution
x = rexp(n = 4000, rate = 0.1)
hist(x)
mean(x)
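The R preview above stops after drawing from an exponential distribution with rate 0.1 (population mean 10). As a sketch of where a Central Limit Theorem demonstration could go next, the Python code below draws many samples of size 20 and looks at the distribution of their means. It is not a continuation of the author's script: `n_samples = 5000` and the helper name `sample_means` are assumptions, and Python's generator will not reproduce R's `set.seed(1234)` draws.

```python
import random
import statistics


def sample_means(n_samples, sample_size, rate, rng):
    """Mean of each of n_samples exponential samples of the given size."""
    return [
        statistics.fmean(rng.expovariate(rate) for _ in range(sample_size))
        for _ in range(n_samples)
    ]


rng = random.Random(1234)   # fixed seed for repeatability
rate = 0.1                  # exponential rate, so the population mean is 1 / rate = 10
sample_size = 20            # same sample size as in the R snippet
n_samples = 5000            # assumed number of repeated samples

means = sample_means(n_samples, sample_size, rate, rng)
mean_of_means = statistics.fmean(means)
sd_of_means = statistics.stdev(means)

# The CLT predicts a mean near 1 / rate and a standard deviation
# near (1 / rate) / sqrt(sample_size), roughly 2.24 here.
print('Mean of sample means: {:.3f}'.format(mean_of_means))
print('SD of sample means:   {:.3f}'.format(sd_of_means))
```

Although the individual draws are strongly right-skewed, a histogram of `means` should look much closer to a bell curve centred near 10.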