| # Calculate the f-score | |
| fscore = (2 * precision * recall) / (precision + recall) | |
| # Find the optimal threshold (nanargmax skips NaN scores where precision + recall == 0) | |
| index = np.nanargmax(fscore) | |
| thresholdOpt = round(thresholds[index], ndigits = 4) | |
| fscoreOpt = round(fscore[index], ndigits = 4) | |
| recallOpt = round(recall[index], ndigits = 4) | |
| precisionOpt = round(precision[index], ndigits = 4) | |
| print('Best Threshold: {} with F-Score: {}'.format(thresholdOpt, fscoreOpt)) |
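One edge case in the F-score formula above is worth a short illustration: when precision and recall are both zero, the division yields NaN, and `np.argmax` returns the position of that NaN instead of the best score. The sketch below uses made-up `precision`, `recall`, and `thresholds` arrays (assumptions, not values from the source) to compare `np.argmax` with `np.nanargmax`.

```python
import numpy as np

# Hypothetical values for illustration only; the last entry has
# precision = recall = 0, so its F-score is 0 / 0 = NaN.
thresholds = np.array([0.20, 0.50, 0.80, 0.99])
precision = np.array([0.50, 0.75, 1.00, 0.00])
recall = np.array([1.00, 0.75, 0.25, 0.00])

# Same formula as the snippet above; suppress the 0 / 0 runtime warning
with np.errstate(divide='ignore', invalid='ignore'):
    fscore = (2 * precision * recall) / (precision + recall)

# np.argmax propagates NaN and points at the NaN entry
naive_index = int(np.argmax(fscore))
# np.nanargmax ignores NaN entries and finds the real maximum
safe_index = int(np.nanargmax(fscore))

thresholdOpt = round(float(thresholds[safe_index]), ndigits=4)
fscoreOpt = round(float(fscore[safe_index]), ndigits=4)
print('argmax index: {}, nanargmax index: {}'.format(naive_index, safe_index))
print('Best Threshold: {} with F-Score: {}'.format(thresholdOpt, fscoreOpt))
```

Note that `np.nanargmax` raises a `ValueError` if every score is NaN, so an all-NaN input still needs separate handling.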
| # Array for finding the optimal threshold | |
| thresholds = np.arange(0.0, 1.0, 0.0001) | |
| fscore = np.zeros(shape=(len(thresholds))) | |
| print('Length of sequence: {}'.format(len(thresholds))) | |
| # Evaluate every candidate threshold | |
| for index, elem in enumerate(thresholds): | |
|     # Binarize the predicted probabilities at the current threshold | |
|     y_pred_prob = (y_pred > elem).astype('int') | |
|     # Calculate the f-score |
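The loop above is cut off before the F-score is computed. A minimal sketch of how such a sweep could be completed is shown below. The toy `y_true` and `y_pred` arrays and the name `y_pred_label` are assumptions for illustration, and the F-score is computed by hand from true positives, false positives, and false negatives, which may differ from whatever the original gist calls at this point.

```python
import numpy as np

# Hypothetical toy data (not from the source): true labels and predicted probabilities
y_true = np.array([0, 0, 1, 1, 0, 1, 1, 0])
y_pred = np.array([0.10, 0.40, 0.35, 0.80, 0.20, 0.70, 0.55, 0.60])

# Same grid as the snippet above: 10,000 candidate thresholds in [0, 1)
thresholds = np.arange(0.0, 1.0, 0.0001)
fscore = np.zeros(shape=(len(thresholds)))

for index, elem in enumerate(thresholds):
    # Binarize the predicted probabilities at the current threshold
    y_pred_label = (y_pred > elem).astype('int')
    tp = np.sum((y_pred_label == 1) & (y_true == 1))
    fp = np.sum((y_pred_label == 1) & (y_true == 0))
    fn = np.sum((y_pred_label == 0) & (y_true == 1))
    # F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0
    denom = 2 * tp + fp + fn
    fscore[index] = 2 * tp / denom if denom > 0 else 0.0

# np.argmax returns the first threshold that reaches the maximum score
index = np.argmax(fscore)
thresholdOpt = round(float(thresholds[index]), ndigits=4)
fscoreOpt = round(float(fscore[index]), ndigits=4)
print('Best Threshold: {} with F-Score: {}'.format(thresholdOpt, fscoreOpt))
```

Because many neighbouring thresholds give identical predictions, the maximum is usually attained on a whole interval; taking the first index is one arbitrary but reproducible way to break the tie.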
| # Import module for data manipulation | |
| import pandas as pd | |
| # Import module for linear algebra | |
| import numpy as np | |
| # Import module for Fuzzy string matching | |
| from fuzzywuzzy import fuzz, process | |
| # Import module for regex | |
| import re | |
| # Import module for iteration | |
| import itertools |
| # Import module for data manipulation | |
| import pandas as pd | |
| # Import module for linear algebra | |
| import numpy as np | |
| # Import module for Fuzzy string matching | |
| from fuzzywuzzy import fuzz, process | |
| # Import module for binary search | |
| def stringMatching( | |
| df: pd.DataFrame, |
| # Install packages for cluster ensemble and treemap plots | |
| install.packages('diceR') | |
| install.packages('treemapify') | |
| library(diceR) | |
| library(dplyr) | |
| library(ggplot2) | |
| library(treemapify) | |
| # Load the order data | |
| df_orders = read.csv(file = '../data/olist_orders_dataset.csv', header = TRUE, sep = ',') |
| # Group customers by their customer segment | |
| rfm_level_agg = rfm %>% | |
| group_by(`Customer Segment`) %>% | |
| summarize( | |
| Recency = mean(Recency), | |
| Frequency = mean(Frequency), | |
| `Monetary Mean` = mean(Monetary), | |
| `Monetary Count` = n(), | |
| `Marketing Action` = unique(`Marketing Action`) | |
| ) |
| # ---------- Central Limit Theorem ---------- | |
| # Parameters | |
| sample_mean = 100000 | |
| sample_size = 20 | |
| set.seed(1234) | |
| # 1 Exponential Distribution | |
| x = rexp(n = 4000, rate = 0.1) | |
| hist(x) | |
| mean(x) |
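The R snippet above stops after drawing the exponential data, so the sampling step that demonstrates the Central Limit Theorem is not shown. The Python sketch below is one possible continuation, not the author's code. It assumes that `sample_size` is the number of observations per sample, uses a hypothetical `num_samples` for the number of repeated samples, and relies on NumPy's generator, so the numbers will not match R's `set.seed(1234)` output.

```python
import numpy as np

rng = np.random.default_rng(1234)

# Skewed "population": exponential with rate 0.1, i.e. scale = 1 / rate = 10
population = rng.exponential(scale=10.0, size=4000)

sample_size = 20      # observations per sample, as in the snippet above
num_samples = 10000   # hypothetical number of repeated samples

# Draw num_samples samples with replacement and take the mean of each
samples = rng.choice(population, size=(num_samples, sample_size), replace=True)
sample_means = samples.mean(axis=1)

pop_mean = population.mean()
pop_sd = population.std()

# The sample means centre on the population mean, with a spread close to
# pop_sd / sqrt(sample_size), even though the population itself is skewed.
print('Population mean: {:.3f}'.format(pop_mean))
print('Mean of sample means: {:.3f}'.format(sample_means.mean()))
print('SD of sample means: {:.3f}'.format(sample_means.std()))
print('pop_sd / sqrt(n): {:.3f}'.format(pop_sd / np.sqrt(sample_size)))
```

A histogram of `sample_means` would look roughly bell-shaped, in contrast to the strongly right-skewed histogram of the raw exponential draws.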