Churn Analysis Project with Clustering & Decision Trees
Gist imjakedaniels/24fbb73db9efcc11fb3b97d3f858d063, last active April 15, 2019 16:22
```{r}
# install.packages("tidyverse")  # run once if not already installed
library(tidyverse)
library(stringr)
churn <- read_csv(file.choose())
```
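```{r}
# keep column 2 and columns 5-21 (plans, usage, service calls and churn);
# columns 1, 3 and 4 (State, Area Code, Phone) are dropped.
# data.frame() also makes names syntactic, e.g. "Int'l Plan" -> Int.l.Plan
ch <- data.frame(churn[, c(2, 5:21)])
```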
```{r}
### categorical data to logical
# Int'l Plan: "yes"/"no" -> TRUE/FALSE, stored as Intl.Plan
ch$Intl.Plan <- ch$Int.l.Plan == "yes"
ch$Int.l.Plan <- NULL
# VMail Plan: converted in place
ch$VMail.Plan <- ch$VMail.Plan == "yes"
# Churn?: "True."/"False." -> TRUE/FALSE, stored as Churn
ch$Churn <- ch$Churn. == "True."
ch$Churn. <- NULL
# combine day, evening and night usage into local totals
ch$Local.Mins <- ch$Day.Mins + ch$Eve.Mins + ch$Night.Mins
ch$Local.Charge <- ch$Day.Charge + ch$Eve.Charge + ch$Night.Charge
# remove the per-period minutes
ch$Day.Mins <- NULL
ch$Eve.Mins <- NULL
ch$Night.Mins <- NULL
# export for Weka
# install.packages("RWeka")  # run once if not already installed
library(RWeka)
write.arff(ch, file = "clusteringresults.arff")
```
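Combining the per-period minutes and dropping the originals removes inputs that carry largely the same information, which matters for distance-based clustering because a duplicated signal is effectively counted twice. To check which columns are nearly redundant before exporting, one option is to list every pair of numeric columns whose absolute Pearson correlation exceeds a cutoff. The base-R sketch below does that; `high_cor_pairs()`, the 0.9 cutoff and the 0.17-per-minute rate used in the demo are hypothetical, and the demo frame is synthetic rather than the churn file.

```{r}
# Hypothetical helper: pairs of numeric columns with |Pearson r| > cutoff.
high_cor_pairs <- function(df, cutoff = 0.9) {
  num <- df[vapply(df, is.numeric, logical(1))]   # numeric columns only
  r <- cor(num)
  # upper triangle only, so each pair is reported once
  idx <- which(abs(r) > cutoff & upper.tri(r), arr.ind = TRUE)
  data.frame(var1 = rownames(r)[idx[, 1]],
             var2 = colnames(r)[idx[, 2]],
             r    = r[idx],
             row.names = NULL)
}

# synthetic data; 0.17 per minute is an assumed flat rate
set.seed(1)
toy_usage <- data.frame(Day.Mins  = runif(100, 0, 350),
                        Day.Calls = rpois(100, 100),
                        Intl.Plan = sample(c(TRUE, FALSE), 100, replace = TRUE))
toy_usage$Day.Charge <- round(toy_usage$Day.Mins * 0.17, 2)
high_cor_pairs(toy_usage)   # flags Day.Mins / Day.Charge
```

On the real data it could be called as `high_cor_pairs(ch)` just before `write.arff()`; logical columns such as the plan flags are skipped automatically.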
After identifying the three archetypes with the highest risk of churn, I generated lists of their phone numbers to email to the call centers so they could offer these customers incentives.
```{r}
# install.packages("tidyverse")  # run once if not already installed
library(tidyverse)
# combine area code and phone number; ch no longer has these columns,
# so take them from the raw churn table (same rows, same order)
ch$PhoneNumbers <- paste(churn$`Area Code`, churn$Phone)
# Customer1 - Heavy Mins
heavy_users <- which(ch$Local.Charge > 71.54 & !ch$VMail.Plan)
Customer1 <- ch$PhoneNumbers[heavy_users]
# Customer2 - Moderate, Low-Contact with Intl Plans (Intl.Plan is the logical flag)
moderate_international_users <- which(ch$Local.Charge <= 71.54 &
                                        ch$CustServ.Calls <= 3 &
                                        ch$Intl.Plan &
                                        (ch$Intl.Calls <= 2 | ch$Intl.Mins > 13.1))
Customer2 <- ch$PhoneNumbers[moderate_international_users]
# Customer3 - Light, Frequent-Contact
light_recurring <- which(ch$Local.Charge <= 54.12 & ch$CustServ.Calls > 3)
Customer3 <- ch$PhoneNumbers[light_recurring]
```
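To hand the lists to the call centers, the per-archetype vectors can be stacked into one labelled table and written to a CSV file. The sketch below is one way to do that in base R; `build_call_list()`, the archetype labels and the output path are hypothetical, and the phone numbers are placeholders in the 555 range rather than real customers.

```{r}
# Hypothetical helper: stack named phone-number vectors into one table.
build_call_list <- function(...) {
  groups <- list(...)                                # one vector per archetype
  data.frame(archetype = rep(names(groups), lengths(groups)),
             phone     = unlist(groups, use.names = FALSE),
             stringsAsFactors = FALSE)
}

# placeholder numbers for illustration only
call_list <- build_call_list(
  heavy_no_vmail     = c("415 555-0101", "408 555-0102"),
  moderate_intl      = "510 555-0103",
  light_many_support = character(0)                  # an empty group is allowed
)
out_file <- file.path(tempdir(), "call_list.csv")    # assumed output location
write.csv(call_list, out_file, row.names = FALSE)
```

With the real vectors the call would be `build_call_list(Customer1 = Customer1, Customer2 = Customer2, Customer3 = Customer3)`. Because the three rules are mutually exclusive (Local.Charge above 71.54 versus at or below it, and CustServ.Calls at most 3 versus above 3), no number should appear in more than one group.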
More examples of the archetypes from the cluster analysis, which was my responsibility in the project.

We performed k-means clustering with Euclidean distance. The clusters showed us which attributes were worth investigating, and from them we built customer archetypes. On screen are two examples of the customer types our decision tree revealed: the Customer 1 archetype, heavy users with no voicemail plan, and the Customer 3 archetype, light users with many complaints (more than three customer service calls).
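The clustering itself was run outside this gist (the first chunk exports an ARFF file for Weka). As a rough base-R counterpart, `stats::kmeans()` with its default Hartigan-Wong algorithm minimizes the within-cluster sum of squared Euclidean distances. The sketch below standardizes the features, fits three clusters, and then reports the share of churners in each cluster, one way to spot clusters with a high propensity to churn. The data are synthetic, and k = 3 and `nstart = 25` are illustrative assumptions, not the project's settings.

```{r}
# Sketch only: synthetic data, k = 3 and nstart = 25 are assumptions.
set.seed(42)
toy_clust <- data.frame(
  Local.Charge   = c(rnorm(100, 80, 5), rnorm(100, 60, 5), rnorm(100, 45, 5)),
  CustServ.Calls = c(rpois(100, 1), rpois(100, 1), rpois(100, 5)),
  Intl.Mins      = rnorm(300, 10, 3)
)
# synthetic churn flags, used only to profile clusters (not as a feature)
toy_churned <- runif(300) < rep(c(0.6, 0.1, 0.5), each = 100)

# standardize so no single wide-range column dominates Euclidean distance
X <- scale(toy_clust)
fit <- kmeans(X, centers = 3, nstart = 25)   # Hartigan-Wong by default

fit$size                                     # customers per cluster
churn_by_cluster <- tapply(toy_churned, fit$cluster, mean)
round(churn_by_cluster, 2)                   # share of churners per cluster
```

Keeping churn out of the feature set and using it only afterwards means the clusters are driven by behaviour alone, and the churn rate then tells which behavioural groups are worth profiling.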
Once these clusters of customers with a high propensity to churn are identified, we can improve data collection around them to uncover more of the attributes that explain why they churn, and adapt our current strategies to better handle sensitive customers, such as those with more than three customer service calls.
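Cut points such as "more than three customer service calls" or Local.Charge > 71.54 are the kind of thresholds a decision tree reports; the tree itself is not included in this gist. CART-style trees pick each cut by minimizing the weighted impurity of the two child nodes, and Gini impurity is one common criterion (the tree in this project may have used another). The base-R sketch below searches one numeric feature for its best Gini split; `gini()`, `best_split()` and the synthetic data are hypothetical.

```{r}
# Hypothetical helpers; synthetic data. Gini is one common split criterion.
gini <- function(y) {
  p <- mean(y)                 # share of churners (y is logical)
  2 * p * (1 - p)              # two-class Gini impurity
}

best_split <- function(x, y) {
  v <- sort(unique(x))
  cuts <- (head(v, -1) + tail(v, -1)) / 2   # midpoints between distinct values
  score <- vapply(cuts, function(thr) {
    left  <- y[x <= thr]
    right <- y[x > thr]
    # weighted impurity of the two child nodes
    (length(left) * gini(left) + length(right) * gini(right)) / length(y)
  }, numeric(1))
  list(threshold = cuts[which.min(score)], impurity = min(score))
}

# synthetic customers: churn is more likely above 3 service calls
set.seed(7)
svc_calls   <- rpois(500, 2)
svc_churned <- runif(500) < ifelse(svc_calls > 3, 0.6, 0.1)
svc_split   <- best_split(svc_calls, svc_churned)
svc_split$threshold   # a cut of 3.5 corresponds to "more than 3 calls"
```

A full tree repeats this search over every feature at every node, which is how rules combining several thresholds, like the Customer 2 archetype, arise.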