GitHub Gist: imjakedaniels/24fbb73db9efcc11fb3b97d3f858d063
```{r}
# install.packages("tidyverse")  # run once if the package is not installed
library(tidyverse)  # also attaches readr and stringr
churn <- read_csv(file.choose())  # pick the churn CSV interactively
```
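```{r}
# keep Account Length (col 2) and cols 5-21, assuming the standard column order:
# this drops the identifier columns State, Area Code and Phone.
# data.frame() also turns headers like "Int'l Plan" and "Churn?" into Int.l.Plan and Churn.
ch <- data.frame(churn[, c(2, 5:21)])
```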
```{r}
### categorical data to logical (each column converted in place)
ch$Int.l.Plan <- ch$Int.l.Plan == "yes"
ch$VMail.Plan <- ch$VMail.Plan == "yes"
ch$Churn <- ch$Churn. == "True."  # raw values are "True." / "False."
ch$Churn. <- NULL                  # drop the text copy so the target is not exported twice
# combine day, evening and night usage into local totals
ch$Local.Mins <- ch$Day.Mins + ch$Eve.Mins + ch$Night.Mins
ch$Local.Charge <- ch$Day.Charge + ch$Eve.Charge + ch$Night.Charge
# remove the per-period minute columns (the per-period charges are kept)
ch$Day.Mins <- NULL
ch$Eve.Mins <- NULL
ch$Night.Mins <- NULL
# export for Weka
# install.packages("RWeka")  # run once if the package is not installed
library(RWeka)
write.arff(ch, file = "clusteringresults.arff")
```
After identifying the three archetypes with the highest risk to Churn, I generated an email list to send to call centers so they could offer incentives.
```{r}
# Area Code and Phone were dropped from ch above, so take them from churn
# (both tables share the same row order; assumes the raw header "Area Code")
ch$PhoneNumbers <- paste(churn$`Area Code`, churn$Phone)
# Customer1 - Heavy Mins
heavy_users <- which(ch$Local.Charge > 71.54 & !ch$VMail.Plan)
Customer1 <- ch$PhoneNumbers[heavy_users]
# Customer2 - Moderate, Low-Contact with Intl Plans
# as.logical() accepts both a logical column and "T"/"F" strings
moderate_international_users <- which(
  ch$Local.Charge <= 71.54 &
    ch$CustServ.Calls <= 3 &
    as.logical(ch$Int.l.Plan) &
    (ch$Intl.Calls <= 2 | ch$Intl.Mins > 13.1)
)
Customer2 <- ch$PhoneNumbers[moderate_international_users]
# Customer3 - Light, Frequent-Contact
light_recurring <- which(ch$Local.Charge <= 54.12 & ch$CustServ.Calls > 3)
Customer3 <- ch$PhoneNumbers[light_recurring]
```
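The chunk above stops at three separate vectors of phone numbers. A minimal sketch of one way to hand them to the call centers as a single file, assuming a CSV with one row per number and an archetype label is acceptable; `build_call_list()`, the placeholder numbers, and the file name are hypothetical.

```r
# Hypothetical helper: stack named phone-number vectors into one call list.
build_call_list <- function(lists) {
  out <- utils::stack(lists)                    # columns: values, ind
  names(out) <- c("PhoneNumber", "Archetype")
  out$Archetype <- as.character(out$Archetype)  # factor -> plain text for export
  out <- out[!duplicated(out$PhoneNumber), ]    # keep one row per phone number
  rownames(out) <- NULL
  out
}

# Toy input with placeholder numbers (not project data)
call_list <- build_call_list(list(
  Customer1 = c("415 555-0101", "408 555-0102"),
  Customer3 = "510 555-0103"
))
out_file <- file.path(tempdir(), "churn_call_list.csv")
write.csv(call_list, out_file, row.names = FALSE)
```

In the notebook the input would be `list(Customer1 = Customer1, Customer2 = Customer2, Customer3 = Customer3)`. The three rules as written do not overlap, so the deduplication step is only a safeguard.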
More examples of the archetypes from the cluster analysis, which was my responsibility in the project.
We performed K-Means Clustering with Euclidean distance. The clusters showed us which attributes to investigate, and from them we built customer archetypes. On screen are two examples of the customer types our decision tree revealed.
These are the Customer 1 Archetype, heavy users with no voicemail plan, and the Customer 3 Archetype, light users with many complaints (more than 3 customer service calls).
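The notebook exports `ch` to Weka (`clusteringresults.arff`), so the clustering itself was presumably run there. For readers who want to stay in R, here is a rough equivalent built on base `stats::kmeans()`, which uses Euclidean distance. The min-max scaling, `k`, `nstart`, and the helper name `cluster_customers()` are assumptions, and the toy data frame is made up for illustration.

```r
# Sketch: k-means (Euclidean distance) on min-max scaled numeric columns.
# Assumptions, not from the project: scaling to [0, 1], k, nstart = 25.
cluster_customers <- function(df, k = 3, seed = 42) {
  num <- df[vapply(df, is.numeric, logical(1))]  # numeric attributes only (logicals skipped)
  scaled <- as.data.frame(lapply(num, function(x) {
    rng <- range(x, na.rm = TRUE)
    if (diff(rng) == 0) return(rep(0, length(x)))  # constant column carries no distance
    (x - rng[1]) / diff(rng)                        # rescale so no attribute dominates
  }))
  set.seed(seed)
  fit <- stats::kmeans(scaled, centers = k, nstart = 25)
  list(cluster = fit$cluster, centers = fit$centers, size = fit$size)
}

# Toy data for illustration only: light/high-contact vs. heavy/low-contact users
toy <- data.frame(
  Local.Charge   = c(20, 22, 21, 80, 82, 81),
  CustServ.Calls = c(5, 6, 5, 1, 0, 1),
  Churn          = c(TRUE, TRUE, FALSE, FALSE, FALSE, TRUE)
)
res <- cluster_customers(toy, k = 2)
table(cluster = res$cluster, churn = toy$Churn)
```

Cross-tabulating clusters against `Churn`, as in the last line, is one way to spot clusters with a high share of churners before describing them as archetypes.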
Once these clusters of customers with a high propensity to Churn are exposed, we can improve data collection around them to uncover more attributes that explain why they leave, and adapt our current strategies to better handle sensitive customers, such as those with more than 3 customer service calls.
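To check how much more sensitive the group with more than 3 calls is, a small sketch can compare churn rates on either side of the threshold. `churn_rate_by_calls()` is a hypothetical helper, and the vectors below are toy data, not project results.

```r
# Hypothetical helper: churn rate above vs. at-or-below a customer-service-call threshold.
churn_rate_by_calls <- function(calls, churned, threshold = 3) {
  group <- ifelse(calls > threshold,
                  paste0("more than ", threshold, " calls"),
                  paste0(threshold, " calls or fewer"))
  tapply(churned, group, mean)  # mean of a logical vector = share of TRUE values
}

# Toy vectors; in the notebook: churn_rate_by_calls(ch$CustServ.Calls, ch$Churn)
rates <- churn_rate_by_calls(
  calls   = c(0, 1, 4, 5, 2, 6),
  churned = c(FALSE, FALSE, TRUE, TRUE, FALSE, FALSE)
)
rates
```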
Pruned Decision Tree showing the strongest factors of Churn, with the customer clusters listed below:
Customer 1 (Red): Heavy Users are likely to Churn when they don’t have a voicemail plan. Do our customers want this service but find upgrading their plan unaffordable? Is there a lack of cross-selling when registering new customers? Should we build voicemail into every plan as an included feature to discourage this Churn?
Customer 2 & 2a (Orange): Regular Users with Recurring Contact who lean toward the lighter side of overall usage are likely to Churn. We can infer that their limited use of the service has been unsatisfactory, and that this frustration could lead them to cancel and switch to a competitor.
Customer 3 (Green): Regular Users with Low Contact have only a 3% chance of Churning. The minor exception among these users, classified as 2B, is those with the International Plan: customers who have the plan but make no more than 2 international calls a month, and customers who have the plan and use it heavily, are the leading driver of Churn in our most secure clientele. Are the heavy international users dropping calls? Do the users with 0 international calls have the plan only because it was bundled into a service, leaving them feeling that they pay for something they do not use?
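The pruned tree itself is not reproduced here, and the write-up does not say which tool built it or how it was pruned. As one possible approach, this sketch uses the `rpart` package's cost-complexity pruning, cutting the tree back to the `cp` value with the lowest cross-validated error. The helper name `fit_pruned_tree()`, the `cp = 0.001` starting value, and the simulated data are assumptions for illustration; the project may have used a different tool and pruning rule.

```r
library(rpart)  # recommended package that ships with standard R installs

# Sketch: grow a deliberately large classification tree, then prune it back to the
# complexity parameter (cp) with the lowest cross-validated error (xerror).
fit_pruned_tree <- function(df, target = "Churn") {
  df[[target]] <- factor(df[[target]])  # classification needs a factor response
  tree <- rpart(as.formula(paste(target, "~ .")), data = df, method = "class",
                control = rpart.control(cp = 0.001, xval = 10))
  cp_table <- tree$cptable
  best_cp <- cp_table[which.min(cp_table[, "xerror"]), "CP"]
  prune(tree, cp = best_cp)
}

# Simulated data for illustration only: churn is driven by high local charges
# or by more than three customer service calls.
set.seed(1)
n <- 300
sim <- data.frame(
  Local.Charge   = runif(n, 20, 100),
  CustServ.Calls = sample(0:8, n, replace = TRUE)
)
sim$Churn <- sim$Local.Charge > 70 | sim$CustServ.Calls > 3
pruned <- fit_pruned_tree(sim)
print(pruned)
```

Printing the pruned tree lists each split with its threshold, which is the same kind of rule (for example `Local.Charge > 71.54`) that the archetype filters in the notebook apply.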